theislab / sfaira

data and model repository for single-cell data
https://sfaira.readthedocs.io
BSD 3-Clause "New" or "Revised" License
135 stars, 11 forks

AttributeError with streamline_features #734

Closed. yiqisu closed this issue 1 year ago.

yiqisu commented 1 year ago

Hi, the streamline_features-related code below, from the data_loaders tutorial, worked well when I cloned the main branch:

ds.streamline_features(match_to_release="104", subset_genes_to_type="protein_coding")  # Choose a reference genome and subset to only protein-coding genes
ds.streamline_metadata(schema="sfaira")  # Make sure the metadata annotations of all datasets are in line with the sfaira schema, so they can be cleanly concatenated in the next step
print(ds.adata)  # Use the adata object for your analysis or modelling

While following the store_cart_generator tutorial, I came across KeyError: 'ethnicity' with the following code:

import os
import sfaira

cache_path = os.path.join(".", "data")
dsg = sfaira.data.dataloaders.databases.DatasetSuperGroupDatabases(data_path=cache_path, cache_metadata=True)

To avoid the ethnicity issue, I switched to the dev branch.

However, another error, AttributeError: 'Dataset' object has no attribute 'streamline_features', appeared when I tried to create a store with the following code from the store_cart_generator tutorial:

for k, ds in dsg.datasets.items():
    if ds.adata is None:
        ds.load(load_raw=False, allow_caching=True)
    ds.streamline_features(
        remove_gene_version=True,
        match_to_release={"Homo sapiens": "104", "Mus musculus": "104"},
        subset_genes_to_type="protein_coding"
    )
    ds.streamline_metadata(
        schema="sfaira", clean_obs=True, clean_var=True, clean_uns=True, clean_obs_names=True
    )
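    # store_path (the output directory for the store) must be set before this loop; it is not defined in this snippet
    ds.write_distributed_store(
        dir_cache=store_path,
        store_format='dao',
        dense=True,
        chunks=128,
        compression_kwargs={"compressor": "default", "overwrite": True, "order": "C"}
    )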

Similarly, on the dev branch the data_loaders tutorial code above raised AttributeError: 'Universe' object has no attribute 'streamline_features', an error I did not get with the main branch.

I'd appreciate it if you could help with this issue.

Thanks, Yiqi

le-ander commented 1 year ago

Hi! On the dev branch, please use:

ds.streamline_var(match_to_release={"Homo sapiens": "104", "Mus musculus": "104"}, subset_genes_to_type="protein_coding")
ds.streamline_obs_uns()

yiqisu commented 1 year ago

Thank you!