scverse / anndata

Annotated data.
http://anndata.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add parameter for more resilient `concat_on_disk` #1602

Open DingWB opened 2 months ago

DingWB commented 2 months ago

Report

When I concatenate two AnnData files using the following code:

anndata.experimental.concat_on_disk(adata_path_list, raw_adata_path)

I got an error:

Traceback (most recent call last):
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/bin/pym3c", line 8, in <module>
    sys.exit(main())
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/pym3c/__init__.py", line 47, in main
    fire.Fire({
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/pym3c/adata.py", line 103, in merge_adatas
    anndata.experimental.concat_on_disk(adata_path_list, raw_adata_path)
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/anndata/experimental/merge.py", line 650, in concat_on_disk
    _write_concat_mappings(
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/anndata/experimental/merge.py", line 258, in _write_concat_mappings
    _write_concat_sequence(
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/anndata/experimental/merge.py", line 354, in _write_concat_sequence
    _write_concat_arrays(
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/anndata/experimental/merge.py", line 310, in _write_concat_arrays
    write_concat_dense(
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/anndata/experimental/merge.py", line 176, in write_concat_dense
    res = da.concatenate(
  File "/anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/dask/array/core.py", line 4293, in concatenate
    raise ValueError("Shapes do not align: %s", [x.shape for x in seq2])
ValueError: ('Shapes do not align: %s', [(369626, 50), (135426, 63)])

The error occurs because obsm['X_pca'] has a different shape in the two AnnData objects. I only need to concatenate X; I don't need obsm or varm. Could you please add a parameter that lets me skip obsm (or varm) and concatenate only X?
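
To illustrate why the two arrays cannot be stacked, here is a minimal pure-Python sketch of the shape rule for row-wise concatenation: all dimensions except the first must match. The helper name `rowwise_concat_shape` is made up for this illustration and is not part of anndata or dask; it only mirrors the check that raises in the traceback above.

```python
from typing import Sequence, Tuple


def rowwise_concat_shape(shapes: Sequence[Tuple[int, ...]]) -> Tuple[int, ...]:
    """Return the shape of a row-wise (axis 0) concatenation.

    Hypothetical helper for illustration only: every dimension except
    the first must be identical across all inputs.
    """
    first = shapes[0]
    for shape in shapes[1:]:
        if shape[1:] != first[1:]:
            raise ValueError(f"Shapes do not align: {list(shapes)}")
    # Rows add up; the trailing dimensions are shared.
    return (sum(shape[0] for shape in shapes),) + tuple(first[1:])


# The two obsm['X_pca'] shapes from the traceback: 50 vs. 63 columns.
try:
    rowwise_concat_shape([(369626, 50), (135426, 63)])
except ValueError as err:
    print(err)

# If both embeddings had 50 components, the rows would simply stack.
print(rowwise_concat_shape([(369626, 50), (135426, 50)]))
```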

Versions

0.10.8

flying-sheep commented 2 months ago

Hmm, since this function exists explicitly to handle big files, I think it wouldn’t be fair to say “just make new files without these parts”, so this is a reasonable request.

Regarding the API: we already have the `pairwise` option, but its goal is to discourage merging graphs that are semantically unwise to merge, not to handle arrays that are topologically unmergeable, so we shouldn’t add a parameter for each attr.

Therefore I think we have two options:

[^1]: I’d think signature would be

Parameters (all except `error` are keyword-only):

- `error`: The error instance
- `attr`: `AnnData` attribute we failed to concat
- `key`: String key if `attr` is e.g. `obsm` or `layers`, `None` if `attr` is e.g. `X` or `obs`
- maybe more? e.g. a list of elements to-be-concatenated?

Returns:

- a value if the user wants to default to something/handle something themselves
- `None` if we don’t want to set the thing
- raise an error (e.g. the original one) if the user wants to forward the error.
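
To make the signature described above concrete, here is a minimal sketch of what a user-supplied handler could look like. Everything in it is an assumption for illustration: the name `skip_unalignable` is invented, and `concat_on_disk` does not currently accept such a callback. The handler skips `obsm`/`varm` entries that fail with a `ValueError` and forwards every other error.

```python
from typing import Any, Optional


def skip_unalignable(
    error: Exception,
    *,
    attr: str,
    key: Optional[str] = None,
) -> Any:
    """Hypothetical error handler following the proposed signature.

    Not part of anndata's API; a sketch of the idea only.
    """
    # Returning None means "don't set this element" in the result.
    if attr in {"obsm", "varm"} and isinstance(error, ValueError):
        return None
    # Forward anything else (e.g. a failure while concatenating X).
    raise error


# Simulate the failure from the report: obsm['X_pca'] could not be concatenated.
err = ValueError("Shapes do not align", [(369626, 50), (135426, 63)])
result = skip_unalignable(err, attr="obsm", key="X_pca")
print(result)  # None, so this key would be left out of the output
```

With a handler like this, the request in this issue (concatenate `X`, drop unalignable `obsm`/`varm` entries) would not need a dedicated parameter per attribute.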

DingWB commented 2 months ago

Hi @flying-sheep,

I think you misunderstood me. I have two AnnData files; both have obsm['X_pca'], but with different shapes: (369626, 50) and (135426, 63). That is why I got the error.

Is there a way to skip the concat of obsm?

flying-sheep commented 2 months ago

I understood that perfectly.

No, there isn’t, that’s why I’m brainstorming solutions.

DingWB commented 2 months ago

OK. Thanks.