theislab / graphcompass

GraphCompass: Graph Comparison Tools for Differential Analyses in Spatial Systems
MIT License
32 stars 0 forks source link

filtration curve error: igraph cannot get edge ID #45

Closed mihem closed 8 months ago

mihem commented 8 months ago

@merelkuijs Thank you for this great package, which to my knowledge with the focus on differences in spatial heterogeneity across conditions!

When installing the package I found that it did not work with Python 3.12 (due to Pygeos), whiel Python 3.10 in a conda environment worked fine.

The tutorials are helpful (although it would be convenient if the preprocessed object would be available somewhere; i find this helpful to understand the format the data has to have and would e easier than downloading all raw data and running the preprocessing notebook). However, i could not run gc.tl.filtration_curves.compare_conditions.

Following the tutorial with my own Xenium data, everything works fine until:

gc.tl.filtration_curves.compare_conditions(
    adata=adata,
    library_key=library_key,
    cluster_key=cluster_key,
    condition_key=condition_key,
    compute_spatial_graphs=False
    )

exits with:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mischko/.local/lib/python3.10/site-packages/graphcompass/tl/_filtration_curves.py", line 119, in compare_conditions
    edge_id = graph.get_eid(i, j)
igraph._igraph.InternalError: Error at src/graph/type_indexededgelist.c:1414: Cannot get edge ID, no such edge. -- Invalid value

Thanks for you help!

mkuijs commented 8 months ago

Dear mihem,

Thanks for the issue! To address it, we will:

  1. Update the Python dependency to require >=3.9, <3.12
  2. Upload the pre-processed datasets

Setting compute_spatial_graphs=False implies that you have computed the spatial graph before using Squidpy's squidpy.gr.spatial_neighbors or one of our other compare_conditions functions. Is that correct? How are the spatial coords stored in your adata object?

mihem commented 8 months ago

@mkuijs Thanks!

I closely followed your tutorial https://github.com/theislab/graphcompass/blob/main/notebooks/tutorials/MIBITOF_breast_cancer.ipynb

So ran in advance:

gc.tl.wlkernel.compare_conditions(
   adata=adata,
   library_key=library_key,
   cluster_key=cluster_key,
   compute_spatial_graphs=True,
   kwargs_spatial_neighbors={
        'coord_type': 'generic',
        'delaunay': True,  
  }  
)

My spatial coords are stored in adata.obsm.["spatial"] I have Xenium data. I followed squidpy's Xenium tutorial to create the adata object https://squidpy.readthedocs.io/en/stable/notebooks/tutorials/tutorial_xenium.html

adata = sc.read_10x_h5(filename=f"xenium/raw/{directory}/cell_feature_matrix.h5")
df = pd.read_csv(f"xenium/raw/{directory}/cells.csv.gz")
df.set_index(adata.obs_names, inplace=True)
adata.obs = df.copy()
adata.obsm["spatial"] = adata.obs[["x_centroid", "y_centroid"]].copy().to_numpy()

gc.tl.wlkernel.compare_conditions, gc.pl.wlkernel.compare_conditions , gc.tl.distance.compare_conditions and gc.pl.distance.compare_conditions work fine, only gc.pl.filtration_curves.compare_conditions does not work.

Thanks!

merelkuijs commented 8 months ago

Looks good! If available, could you share the entire stack trace with me? I'm guessing it has something to do with the edge weight computation, but for debugging it would be helpful to understand which of my functions is throwing the error.

EDIT: Never mind, I see the line number. I'll get back to you.

merelkuijs commented 8 months ago

One possible explanation is that the edge weight matrix contains unexpected values, such as NaNs. The way to check this is to run:

from graphcompass.tl._filtration_curves import _compute_edge_weights

# assumes you have computed the spatial graph before
adj_matrix = _compute_edge_weights(gene_expression_matrix=adata.X, adjacency_matrix=adata.obsp['spatial_connectivities'])
weights = adj_matrix.tocoo()

has_nans = np.isnan(weights.data).any()
print(has_nans)

In case this is true, could you tell me the dtype of your adata.X?

merelkuijs commented 8 months ago

An alternative explanation is that some of your edge weights are 0. That happens when there is no difference between the gene expression matrices of neighbouring nodes. This is pretty unlikely, but could happen when we didn't measure any gene expression in two neighbouring cells.

mihem commented 8 months ago

Yes, there are zeros in the dataset.

# assumes you have computed the spatial graph before
adj_matrix = _compute_edge_weights(gene_expression_matrix=adata.X, adjacency_matrix=adata.obsp['spatial_connectivities'])
weights = adj_matrix.tocoo()

# check if there are any nans
has_nans = np.isnan(weights.data).any()
print(has_nans)

False

# check if there are any 0
(weights.data == 0).any()
has_zeros = np.isclose(weights.data, 0).any()
print(has_zeros)

True

Maybe this is related to the fact that Xenium only measures a rather small gene panel (100 in our case and not many transcripts per cell ~10 depnedng on the sample), but at very high resolution. Any way to solve that?

merelkuijs commented 8 months ago

I think it would make sense to remove edges if the edge weight is zero, since I expect this to occur pretty much only if we didn't measure any expression in two neighbouring cells and that sounds to me like a technical artifact rather than a biological result.

Let me know what you think of that solution, then I will push a bug fix.

mihem commented 8 months ago

Hm i am not sure I can judge whether this is biologically the best solution, but it's definitely better than no result. Probably should print a warning that edges will be removed.

But yes, would appreciate a fix. Thanks

merelkuijs commented 8 months ago

I have pushed a fix. We will update the package after the weekend.

mihem commented 8 months ago

Thanks. I installed the new version via github and indeed then the function successfully finishes. However, after running this function I can no longer save the object

adata = ad.read_h5ad("objects/graphcompass_adata.h5ad")

fails with

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_core/anndata.py", line 2017, in write_h5ad
    write_h5ad(
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/h5ad.py", line 111, in write_h5ad
    write_elem(f, "uns", dict(adata.uns), dataset_kwargs=dataset_kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/specs/registry.py", line 359, in write_elem
    Writer(_REGISTRY).write_elem(store, k, elem, dataset_kwargs=dataset_kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/utils.py", line 243, in func_wrapper
    return func(*args, **kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/specs/registry.py", line 309, in write_elem
    return write_func(store, k, elem, dataset_kwargs=dataset_kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/specs/registry.py", line 57, in wrapper
    result = func(g, k, *args, **kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/specs/methods.py", line 312, in write_mapping
    _writer.write_elem(g, sub_k, sub_v, dataset_kwargs=dataset_kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/utils.py", line 243, in func_wrapper
    return func(*args, **kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/specs/registry.py", line 309, in write_elem
    return write_func(store, k, elem, dataset_kwargs=dataset_kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/specs/registry.py", line 57, in wrapper
    result = func(g, k, *args, **kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/specs/methods.py", line 312, in write_mapping
    _writer.write_elem(g, sub_k, sub_v, dataset_kwargs=dataset_kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/utils.py", line 243, in func_wrapper
    return func(*args, **kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/specs/registry.py", line 309, in write_elem
    return write_func(store, k, elem, dataset_kwargs=dataset_kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/specs/registry.py", line 57, in wrapper
    result = func(g, k, *args, **kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/specs/methods.py", line 323, in write_list
    _writer.write_elem(f, k, np.array(elem), dataset_kwargs=dataset_kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/utils.py", line 243, in func_wrapper
    return func(*args, **kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/specs/registry.py", line 309, in write_elem
    return write_func(store, k, elem, dataset_kwargs=dataset_kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/specs/registry.py", line 57, in wrapper
    result = func(g, k, *args, **kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/anndata/_io/specs/methods.py", line 414, in write_vlen_string_array
    f.create_dataset(k, data=elem.astype(str_dtype), dtype=str_dtype, **dataset_kwargs)
  File "/home/mischko/.local/lib/python3.10/site-packages/h5py/_hl/group.py", line 161, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/home/mischko/.local/lib/python3.10/site-packages/h5py/_hl/dataset.py", line 159, in make_new_dset
    dset_id.write(h5s.ALL, h5s.ALL, data)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 232, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 145, in h5py._proxy.dset_rw
  File "h5py/_conv.pyx", line 444, in h5py._conv.str2vlen
  File "h5py/_conv.pyx", line 95, in h5py._conv.generic_converter
  File "h5py/_conv.pyx", line 249, in h5py._conv.conv_str2vlen

I checked again that my object could be saved in the above way before running gc.tl.filtration_curves.compare_conditions.

Also the x axis is not nicely visualized in my plots:

image

Thank you!

merelkuijs commented 8 months ago

Thanks for your commitment to helping us improve the package!

I'm modifying the plotting function so you'll be able to enter an x-axis limit. I'd remove the rightmost value from your plot because the change that happens is very minimal. You will be able to do this by calling:

gc.pl._filtration_curves.compare_conditions(
        adata,
        node_labels=["celltype1", "celltype2"],
        right=50  # change as needed
)

I'll push the bug fix, but we'll probably need to repackage again. I'm still looking into the saving issue.

merelkuijs commented 8 months ago

I was able to replicate the writing issue and resolved it by saving the collection of filtration curve dataframes (one per sample) as a dict rather than a list. I've pushed the bug fix -- it should become available soon.

merelkuijs commented 8 months ago

graphcompass 0.2.2 is now online. Let me know if there are any further issues!

mihem commented 8 months ago

Thanks for your prompt response and fix. I can confirm this fixes the saving issue and the plotting issue.