snap-stanford / UCE

UCE is a zero-shot foundation model for single-cell gene expression data
MIT License

IMA sample issue #24

Closed v-mahughes closed 4 months ago

v-mahughes commented 4 months ago

I was able to download both files onto my system (green_monkey.h5ad and IMA_sample.h5ad). I could successfully read in the green monkey data with scanpy's read_h5ad function. However, when I try to read in the IMA sample data with scanpy.read_h5ad, I get the following error: OSError: Unable to open file (file signature not found)

I have checked that the file size matches the file size on Google Drive.

Yanay1 commented 4 months ago

That error might indicate that the file did not finish downloading properly.
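A quick way to test this before calling scanpy is to look at the first bytes of the file. The sketch below is only an illustration: the helper names `looks_like_hdf5` and `describe_download` are hypothetical and not part of UCE or scanpy. It assumes the HDF5 signature sits at byte offset 0, which is the usual case for .h5ad files, and it also flags the case where an HTML page was saved in place of the data.

```python
# Hypothetical helpers for checking a downloaded .h5ad file; not part of UCE or scanpy.

# The standard 8-byte HDF5 format signature (assumed to be at offset 0).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature."""
    with open(path, "rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def describe_download(path):
    """Return a rough label for the file contents: 'hdf5', 'html', or 'unknown'."""
    if looks_like_hdf5(path):
        return "hdf5"
    with open(path, "rb") as fh:
        # Inspect the start of the file, ignoring leading whitespace and case.
        head = fh.read(512).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

For example, `describe_download("IMA_sample.h5ad")` returning anything other than `"hdf5"` would suggest re-downloading the file, even when the reported size looks right.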

v-mahughes commented 4 months ago

I was able to download it fully. In Figure 2d, I notice that not all coarse cell type annotations in the IMA lymph sample are included in the figure. Were all coarse cell types of the lymph data included in training/testing, or was the dataset restricted to those shown in the figure?

v-mahughes commented 4 months ago

Also, the IMA sample data does not contain the UCE embeddings (no 'X_uce' layer), but the green monkey data does.

v-mahughes commented 4 months ago

Or is adata.X the UCE embeddings? It seems to be of the correct shape (1280).

Yanay1 commented 4 months ago

For that file, .X is the UCE embeddings.
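Since the two sample files store the embeddings in different places, a small accessor can cover both. This is a sketch under assumptions: `get_uce_embeddings` is a hypothetical helper, the `"X_uce"` key is assumed to live in `adata.obsm` for the green monkey file, and 1280 is the embedding width quoted in this thread. It works on any AnnData-like object that exposes `.obsm` and `.X`.

```python
# Hypothetical helper; not part of the UCE codebase.

UCE_DIM = 1280  # embedding width mentioned in this thread


def get_uce_embeddings(adata, key="X_uce", dim=UCE_DIM):
    """Return the UCE embeddings from an AnnData-like object.

    Prefers adata.obsm[key] (assumed layout of the green monkey file) and
    falls back to adata.X when its second dimension equals `dim`
    (the layout described for the IMA sample file).
    """
    if key in adata.obsm:
        return adata.obsm[key]
    if len(adata.X.shape) == 2 and adata.X.shape[1] == dim:
        return adata.X
    raise ValueError(
        f"No '{key}' entry in obsm and X has shape {adata.X.shape}, "
        f"so the embeddings could not be located."
    )
```

With scanpy installed, usage would look like `emb = get_uce_embeddings(scanpy.read_h5ad("IMA_sample.h5ad"))`. The width check is only a heuristic, since an expression matrix with exactly 1280 genes would also pass it.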

v-mahughes commented 4 months ago

Would it be possible for you to provide the filtered gene expression matrix for this dataset as well?

Yanay1 commented 4 months ago

Unfortunately we don't have that for this file. Since it's collected from so many datasets, and from many different species, it would be difficult to create a corresponding gene expression dataset.