Open Egiob opened 5 months ago
@Egiob Hello, have you solved this problem now?
Could it be that the mapping is obtained like this (so that token 10723 is ENSG00000000003)?
import scanpy as sc
h5ad = sc.read_h5ad("nicheformer/data/model_means/model.h5ad")
h5ad.X
(0, 10723) 1.0
(0, 12184) 4.0
(0, 5297) 1.0
(0, 17537) 1.0
(0, 6145) 1.0
(0, 13799) 1.0
(0, 3204) 1.0
(0, 19265) 1.0
h5ad.X.shape
(1, 20310)
h5ad.var
Empty DataFrame
Columns: []
Index
[20310 rows x 0 columns]
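If the idea above holds, the lookup table is just the position of each gene in the index of `h5ad.var`. Here is a minimal sketch of that, not the authors' confirmed method: `build_gene_to_index` and the toy gene list are hypothetical names for illustration, and whether the column position equals the model's token ID (or is shifted by an offset for special tokens) is an assumption that would need to be checked against the tokenizer code.

```python
def build_gene_to_index(var_names):
    """Map each gene identifier to its column position (0-based).

    Assumption: the column order of model.h5ad defines the vocabulary order.
    Any offset for special tokens is NOT handled here.
    """
    return {gene: idx for idx, gene in enumerate(var_names)}


# Toy stand-in for `h5ad.var_names`; the real file has 20310 entries.
toy_var_names = ["ENSG00000000003", "ENSG00000000005", "ENSG00000000419"]

gene_to_index = build_gene_to_index(toy_var_names)
print(gene_to_index["ENSG00000000419"])
```

With the real file, the call would be `build_gene_to_index(h5ad.var_names)`, since `var_names` is the index shown in the `h5ad.var` output above.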
Oh, based on the notebooks (.ipynb) it looks even simpler: we can just take the gene ordering from model.h5ad:
#Loading model with right gene ordering
model = sc.read_h5ad(
f"{BASE_PATH}/model.h5ad"
)
...
# Concatenation
# Next we concatenate the model and the dissociated object to ensure they are in the same order.
# This ensures we have the same gene ordering in the object.
adata = ad.concat([model, dissociated], join='outer', axis=0)
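After a concatenation like the one above, it may be worth checking that the combined object really starts with the model's gene order before trusting column positions as token indices. The following is only a sanity-check sketch: `same_leading_order` is a hypothetical helper, and the gene lists are toy data rather than values from the repo.

```python
def same_leading_order(reference_genes, combined_genes):
    """Return True if `combined_genes` starts with `reference_genes` in the same order."""
    reference = list(reference_genes)
    combined = list(combined_genes)
    return combined[: len(reference)] == reference


# Toy data standing in for model.var_names and adata.var_names.
model_genes = ["ENSG00000000003", "ENSG00000000005", "ENSG00000000419"]
combined_ok = model_genes + ["ENSG00000000457"]  # extra gene appended at the end
combined_bad = ["ENSG00000000005", "ENSG00000000003", "ENSG00000000419"]

order_ok = same_leading_order(model_genes, combined_ok)
order_bad = same_leading_order(model_genes, combined_bad)
print(order_ok, order_bad)
```

On the real objects this would be called as `same_leading_order(model.var_names, adata.var_names)`.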
Hello, I understand that Nicheformer operates on a vocabulary of 20,310 genes. However, I can't find in this repo the map that would convert, say, an Ensembl ID or a gene name to an ID (i.e. a token) in your vocabulary.
Could you provide this gene map please? Or indicate how you constructed it?
Thank you so much.