Closed bschilder closed 3 weeks ago
The IMA has data that is not from cell x gene so that wouldn't be possible.
For the cell x gene data, I would recommend using their models API: https://cellxgene.cziscience.com/census-models which lets you query UCE embeddings and cell x gene metadata for all cells on cell x gene (a superset of the UCE human and mouse training data)*
The API lets you make really nice and complex queries.
*There are a few additional human and mouse datasets in the training data that might not be from cell x gene.
> The IMA has data that is not from cell x gene so that wouldn't be possible.
I see. Even so, could you not just provide the metadata for the CELLxGENE subset and set NaN for the non-CELLxGENE datasets? My impression from the preprint was that the vast majority of the IMA was from CELLxGENE.
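A minimal sketch of that idea: left-join ontology fields onto the IMA cells and fill NaN where a cell has no CELLxGENE record. The cell IDs, the `source` column, and the `annotate` helper are hypothetical; the two field names follow the CELLxGENE schema, and the term values are only illustrative. It is written in plain Python so it runs without dependencies; with pandas the same join would be `DataFrame.merge(..., how="left")`.

```python
import math

# Hypothetical per-cell records standing in for the IMA obs table.
ima_cells = [
    {"cell_id": "cell_a", "source": "cellxgene"},
    {"cell_id": "cell_b", "source": "cellxgene"},
    {"cell_id": "cell_c", "source": "other"},  # not in CELLxGENE
]

# Hypothetical mapping file contents: cell ID -> ontology term IDs.
census_metadata = {
    "cell_a": {
        "cell_type_ontology_term_id": "CL:0000084",
        "tissue_ontology_term_id": "UBERON:0000178",
    },
    "cell_b": {
        "cell_type_ontology_term_id": "CL:0000236",
        "tissue_ontology_term_id": "UBERON:0002048",
    },
}

ONTOLOGY_FIELDS = ("cell_type_ontology_term_id", "tissue_ontology_term_id")


def annotate(cells, metadata_by_id, fields=ONTOLOGY_FIELDS):
    """Left-join ontology fields onto cells; unmatched cells get NaN."""
    annotated = []
    for cell in cells:
        meta = metadata_by_id.get(cell["cell_id"], {})
        row = dict(cell)  # copy so the input records are not modified
        for field in fields:
            row[field] = meta.get(field, math.nan)
        annotated.append(row)
    return annotated


annotated_cells = annotate(ima_cells, census_metadata)
for row in annotated_cells:
    print(row)
```

Every cell keeps its row, so downstream code can filter on the NaN entries instead of losing the non-CELLxGENE datasets.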
> For the cell x gene data, I would recommend using their models API: https://cellxgene.cziscience.com/census-models which lets you query UCE embeddings and cell x gene metadata for all cells on cell x gene (a superset of the UCE human and mouse training data)*
> The API lets you make really nice and complex queries.
> *There are a few additional human and mouse datasets in the training data that might not be from cell x gene.
This is great to know, thanks! I'll check this out in tandem.
@Yanay1 would you mind reopening this issue? I don't think the fact that IMA has a few non-CELLxGENE Census datasets is a reason to not include the ontology IDs for any of the cells.
Thanks!
One of the biggest advantages of CELLxGENE Census is that they've mapped all of the cell types, tissues, species, etc. to common ontology terms (e.g., Cell Ontology, UBERON). This is super helpful for systematic evaluations, for example, of ontology-based distances vs embedding-based distances. It also makes it much easier to compare with new single-cell datasets.
However, I've noticed that the subsampled IMA dataset doesn't seem to have these IDs.
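To illustrate the kind of evaluation that ontology IDs would enable, here is a rough sketch comparing an ontology-based distance with an embedding-based distance. Everything in it is a toy assumption: the `TOY:` term IDs and hierarchy are placeholders rather than real Cell Ontology terms, the vectors are made up rather than UCE embeddings, and shortest-path length is just one of several possible ontology distances.

```python
import math
from collections import deque

# Toy is_a hierarchy (child -> parents). Placeholder IDs, not real ontology terms.
TOY_PARENTS = {
    "TOY:t_cell": ["TOY:lymphocyte"],
    "TOY:b_cell": ["TOY:lymphocyte"],
    "TOY:lymphocyte": ["TOY:cell"],
    "TOY:neuron": ["TOY:cell"],
    "TOY:cell": [],
}

# Made-up per-cell-type embeddings (e.g. a centroid per cell type).
TOY_EMBEDDINGS = {
    "TOY:t_cell": [1.0, 0.1, 0.0],
    "TOY:b_cell": [0.9, 0.2, 0.0],
    "TOY:neuron": [0.0, 0.1, 1.0],
}


def ontology_distance(a, b, parents=TOY_PARENTS):
    """Shortest path length between two terms, treating is_a edges as undirected."""
    neighbours = {term: set(ps) for term, ps in parents.items()}
    for term, ps in parents.items():
        for parent in ps:
            neighbours.setdefault(parent, set()).add(term)
    queue = deque([(a, 0)])
    seen = {a}
    while queue:  # breadth-first search
        term, dist = queue.popleft()
        if term == b:
            return dist
        for nxt in neighbours[term]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return math.inf  # terms are not connected


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


pairs = [("TOY:t_cell", "TOY:b_cell"), ("TOY:t_cell", "TOY:neuron")]
comparison = {
    pair: (
        ontology_distance(*pair),
        cosine_distance(TOY_EMBEDDINGS[pair[0]], TOY_EMBEDDINGS[pair[1]]),
    )
    for pair in pairs
}
for pair, (onto_d, emb_d) in comparison.items():
    print(pair, onto_d, round(emb_d, 3))
```

With real data, the pairwise distances from both sides could be correlated across many cell-type pairs, which is only possible if each cell carries its ontology term ID.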
Reprex
Requests
The essentials
Would it be possible for someone to add these to the IMA object, or at least provide a mapping file for the following fields:
The nice-to-haves
CELLxGENE provides many more fields, but I think these are some of the more essential ones. Examples of other fields that you may want to consider adding:
(Randomly selected dataset: https://cellxgene.cziscience.com/collections/d2684035-a36e-458e-96af-8e37930bfdf6)