theislab / sfaira

data and model repository for single-cell data
https://sfaira.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Empty cache folder #736

Closed by yiqisu 11 months ago

yiqisu commented 1 year ago

Hi,

I aim to obtain the h5ad files. However, when I ran the following code, I was only able to get the raw folder. Could you please provide any guidance? Thanks!

Best, Yiqi

import os
import anndata
import sfaira

# Set this path to local sfaira data repository
basedir = '.'
datadir = os.path.join(basedir, 'raw')
metadir = os.path.join(basedir, 'meta')
cachedir = os.path.join(basedir, "data")

ds = sfaira.data.Universe(data_path=datadir, meta_path=metadir, cache_path=cachedir)  # This links all data sets available
ds.subset(key="organism", values=["Mus musculus"])  # subset to mouse datasets
ds.subset(key="organ", values=["lung", "liver"])  # subset further to liver and lung data sets
ds.download()  # Download the selected datasets to your local sfaira data repository

davidsebfischer commented 1 year ago

Hey Yiqi, is the problem that you didn't find the raw folder after downloading, or that the downloads did not execute? Looking at your other post, internet connectivity might be a problem?

yiqisu commented 1 year ago

Hi David, thanks so much for your rapid response as always! I found both the raw and cache folders after downloading. My problem is that the cache folder is empty while I expect the h5ad files there.

davidsebfischer commented 1 year ago

Ok, so assuming you only executed this snippet, I think this is what is going on: download() puts files into the raw/ folder. cache/ (your cachedir) is used to cache loading from your hard disk into memory. It is populated with intermediate files (h5ads) created from raw/ files that can be very slow to load, so you would only see files in cache/ once you have loaded datasets into memory! When you download cellxgene data with sfaira, the files are already h5ads, but because h5ad is the raw format in that case, they will be in raw/!
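The split between raw/ and the cache folder described above can be illustrated with a small toy model. This is only a sketch of the general "cache on first load" pattern using the Python standard library, not sfaira's implementation: the functions `download`, `load`, `list_files`, and `demo`, the CSV raw format, and the JSON cache format are all hypothetical stand-ins (sfaira caches h5ad files).

```python
import json
import tempfile
from pathlib import Path


def download(raw_dir: Path, name: str) -> Path:
    """Hypothetical download step: writes a raw file and never touches the cache."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    raw_file = raw_dir / f"{name}.csv"
    raw_file.write_text("gene,count\nA,1\nB,2\n")
    return raw_file


def load(raw_dir: Path, cache_dir: Path, name: str) -> dict:
    """Hypothetical load step: reuse the cached file if present, else parse raw and cache."""
    cache_file = cache_dir / f"{name}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    # Slow path: parse the raw file, skipping the header row.
    rows = (raw_dir / f"{name}.csv").read_text().splitlines()[1:]
    data = {gene: int(count) for gene, count in (row.split(",") for row in rows)}
    # Only now is the cache folder created and populated.
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data))
    return data


def list_files(directory: Path) -> list:
    """Return sorted file names in a directory, or an empty list if it does not exist."""
    return sorted(p.name for p in directory.iterdir()) if directory.is_dir() else []


def demo() -> tuple:
    """Run download, then load, and report the cache contents after each step."""
    with tempfile.TemporaryDirectory() as tmp:
        raw_dir, cache_dir = Path(tmp) / "raw", Path(tmp) / "data"
        download(raw_dir, "toy")
        after_download = list_files(cache_dir)
        data = load(raw_dir, cache_dir, "toy")
        after_load = list_files(cache_dir)
    return after_download, after_load, data


if __name__ == "__main__":
    print(demo())
```

In this toy model the cache folder has no files after the download step and only gains a file once the dataset has been loaded, which matches the behavior reported in this thread.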

yiqisu commented 1 year ago

Yes, it's true that most files I had in the raw folder are already h5ads. But sometimes I got compressed files and acc.cgi files. Maybe I can try to load these datasets completely into RAM. I will keep you posted! Thanks!
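To see which raw formats a download actually produced (h5ad, compressed archives, acc.cgi files, and so on), a short inventory script may help. This is a minimal sketch using only the standard library; `summarize_raw_files` is a hypothetical helper, not part of sfaira, and the path "raw" assumes the directory layout from the snippet at the top of this thread.

```python
from collections import Counter
from pathlib import Path


def summarize_raw_files(raw_dir) -> dict:
    """Count files under raw_dir (recursively) by their final extension."""
    root = Path(raw_dir)
    counts = Counter()
    if not root.is_dir():
        return {}
    for path in root.rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "<none>".
            counts[path.suffix.lower() or "<none>"] += 1
    return dict(counts)


if __name__ == "__main__":
    # "raw" is the data_path used in the snippet above (assumption).
    for suffix, n in sorted(summarize_raw_files("raw").items()):
        print(f"{suffix}: {n}")
```

Only the last extension is counted, so an archive such as `x.tar.gz` is reported under `.gz`.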