mims-harvard / PINNACLE

Contextual AI models for single-cell protein biology
https://zitniklab.hms.harvard.edu/projects/PINNACLE
MIT License

Cell type order file #1

Open RemyLau opened 1 year ago

RemyLau commented 1 year ago

Hi @michellemli! Love your work. Is there any way to find out which cell type each of the provided embeddings on figshare corresponds to? Or is it safe to assume that they follow the alphabetically sorted list of cell types?

RemyLau commented 1 year ago

After some investigation, this is my current belief: the embeddings follow the "default" file ordering of the filesystem (not sorted).

In the reader function, glob was used to iterate over the contexts; this order gets propagated into the ppi_layers dictionary. Meanwhile, glob returns file names in unsorted default order (here).

Finally, since the GAT modules are constructed in the order of the default ppi order (here), and the ppi data follows the same order as ppi_layers (here), we can conclude that the generated embeddings follow the "default" system ordering of the context ppi files.
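To make the chain above concrete, here is a minimal sketch of how glob order ends up as the embedding order. It is not the actual PINNACLE reader: `load_context_layers`, `context_index`, and the `*.txt` pattern are hypothetical names chosen for illustration. It relies only on the fact that Python dicts (3.7+) preserve insertion order.

```python
import glob
import os


def load_context_layers(directory, pattern="*.txt"):
    """Hypothetical reader: one dict entry per context file, in glob order."""
    layers = {}
    # glob.glob() yields paths in whatever order the OS lists them (unsorted).
    for path in glob.glob(os.path.join(directory, pattern)):
        context = os.path.splitext(os.path.basename(path))[0]
        with open(path) as handle:
            layers[context] = handle.read().splitlines()
    return layers


def context_index(layers):
    """Map each position to its context name, following dict insertion order."""
    return dict(enumerate(layers))
```

If something like `context_index(layers)` had been saved next to the embeddings at training time, index `i` could be matched to its cell type without guessing the filesystem order.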

I'm not fully confident that this ordering is consistent across systems, e.g., will it be the same if I download the files on my own machine? For now, I'm going to assume the answer is yes.

That said, in the same StackOverflow thread, someone pointed out that the ordering is generally not guaranteed:

glob.glob() is a wrapper around os.listdir() so the underlaying OS is in charge for delivering the data. In general: you can not make an assumption on the ordering here. The basic assumption is: no ordering. If you need some sorting: sort on the application level.
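For future runs, "sort on the application level" could look like the sketch below. `sorted_context_files` is a hypothetical helper, not part of the PINNACLE code. Note that sorting only makes new runs reproducible; it does not recover the order of embeddings that were already generated from an unsorted listing.

```python
import glob
import os


def sorted_context_files(directory, pattern="*"):
    """Return matching paths in a reproducible, name-sorted order."""
    # sorted() removes the dependence on the OS-specific listing order.
    return sorted(glob.glob(os.path.join(directory, pattern)))
```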

Sophon-0 commented 3 months ago

Is it possible to provide a script/function that returns the embedding for a given cell type and tissue?