kaizhang / SnapATAC2

Single-cell epigenomics analysis tools
https://kzhang.org/SnapATAC2/

AnnData subset function: out cannot be None. #251

Closed beyondpie closed 6 months ago

beyondpie commented 6 months ago

Hi Kai,

If I pass None as the out parameter of AnnData.subset, I get an error. Also, what does the backend parameter in that function mean? Can I use "r" to save memory and "r+" to read the subset into memory?

Thanks! Songpeng

kaizhang commented 6 months ago

Currently the only backend is hdf5, so setting this parameter has no effect. subset modifies the AnnData in place, so if your AnnData is in read-only mode, setting out=None will cause an error. Do you want to keep the AnnData subset in memory?

beyondpie commented 6 months ago

Hi Kai,

Thanks for the reply! In my usage, the raw data is quite large, so I typically load it with backed set to 'r'. Then I take a subset of it and load that subset into memory for downstream analysis. I use a pipeline tool to process different parts of the AnnData in parallel.

  1. I think subset could work either in place or not in place. People may want to use part of the data without affecting the raw data.

  2. I mainly use the subset of the data in memory. I may also save that part to another file, but this is less common. Currently, I have to save the subset somewhere and load it into memory later.

  3. I usually get confused about 'backed' and 'backend'. Sorry for this.

Thanks! Songpeng

kaizhang commented 6 months ago

That makes sense! I'll modify subset to add this functionality.

kaizhang commented 6 months ago

Implemented: https://kzhang.org/SnapATAC2/version/dev/api/_autosummary/snapatac2.AnnData.subset.html#snapatac2.AnnData.subset

A nightly release will be automatically built and released tomorrow.

beyondpie commented 6 months ago

@kaizhang Hi Kai,

I updated SnapATAC2 to 2.6.

When I use the subset function, I get the error below.

thread '' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/anndata-0.3.1/src/container/base.rs:606:88:
called Result::unwrap() on an Err value: H5Ldelete(): unable to delete link: no write intent on file
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Here is how I run a typical subset command:

sub_ann = ann_fm.subset(
    obs_indices = cellmeta.exp.isin([ee]).to_list(),
    out = None,
)

Here, ann_fm is loaded using snapatac2.read with backed set to 'r'.

Do you have any suggestions? Thanks! Songpeng

kaizhang commented 6 months ago

You need to add inplace=False. This is necessary as "out" is now used to indicate whether the new AnnData should be backed or not.
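
For reference, a minimal sketch of the earlier call with that flag added. It assumes the nightly build, where subset accepts inplace as described above. The helper name subset_to_memory and the FakeAnnData class are hypothetical; the fake exists only so the snippet runs without SnapATAC2 or a data file, and it is not how the library is implemented.

```python
def subset_to_memory(adata, obs_mask):
    """Return a subset of `adata` without modifying `adata` itself.

    Assumes the nightly `subset` signature discussed in this thread:
    inplace=False requests a new object, and out=None requests that the
    new object is not backed by a file.
    """
    return adata.subset(obs_indices=obs_mask, out=None, inplace=False)


class FakeAnnData:
    """Hypothetical stand-in that only records how subset() was called."""

    def __init__(self, n_obs):
        self.n_obs = n_obs
        self.calls = []

    def subset(self, obs_indices=None, var_indices=None, out=None, inplace=True):
        self.calls.append({"out": out, "inplace": inplace})
        if inplace:
            # Mimics the failure mode of an in-place subset on a read-only file.
            raise RuntimeError("in-place subset needs a writable file")
        return FakeAnnData(sum(obs_indices))


if __name__ == "__main__":
    fake = FakeAnnData(4)
    sub = subset_to_memory(fake, [True, False, True, False])
    print(sub.n_obs, fake.calls)
```

With the real library, adata would be the object returned by snapatac2.read with backed set to 'r', and obs_mask the boolean list built from cellmeta in the post above.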

beyondpie commented 6 months ago

@kaizhang https://kzhang.org/SnapATAC2/api/_autosummary/snapatac2.AnnData.subset.html#snapatac2.AnnData.subset There is no inplace parameter? Also, I notice that some API entries link to the source code (though the link may be broken) and some do not. Songpeng

kaizhang commented 6 months ago

This feature exists only in the nightly version: https://kzhang.org/SnapATAC2/version/dev/api/_autosummary/snapatac2.AnnData.subset.html

beyondpie commented 6 months ago

@kaizhang Oh, I see. I thought it was already in the latest stable version. Thanks, Kai. Songpeng

philmar1 commented 5 months ago

Hi!

I passed out="src/myfolder/myfile.h5ad", but a new file "myfile.h5ad.h5ad" (with ".h5ad" twice) was created in src instead. Has anyone gotten the out argument to work properly?

Thanks a lot
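
One way to pin down where the output actually lands is to compare the directory contents before and after the call. The sketch below uses only the standard library and makes no assumption about how SnapATAC2 resolves out. The helper names h5ad_files and created_by are hypothetical, and the touch() call merely stands in for the real subset call.

```python
import tempfile
from pathlib import Path


def h5ad_files(root):
    """Return the set of files under `root` whose name contains '.h5ad'."""
    return {p for p in Path(root).rglob("*.h5ad*") if p.is_file()}


def created_by(root, action):
    """Run `action()` and return the .h5ad files that appeared under `root`."""
    before = h5ad_files(root)
    action()
    return sorted(h5ad_files(root) - before)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the real call, e.g. lambda: adata.subset(..., out=...)
        def fake_write():
            (Path(tmp) / "myfile.h5ad.h5ad").touch()

        for path in created_by(tmp, fake_write):
            print(path.relative_to(tmp))
```

Running created_by("src", ...) around the real subset call would list the exact path and file name that were written, which is useful detail to include in a bug report.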