Open samwindels opened 5 years ago
@samwindels, just a comment on how the parsing is working: ENSG00000285114 maps to 56169. If it mapped to 561690, you'd get a float that looks like 561690.0.
Here's an example:
In [5]: dataset.query(attributes=['ensembl_gene_id', 'entrezgene'], filters={'link_ensembl_gene_id':
...: ['ENSG00000099725','ENSG00000185115', 'ENSG00000285363']})
Out[5]:
    Gene stable ID  NCBI gene ID
0  ENSG00000099725        5616.0
1  ENSG00000185115       56160.0
2  ENSG00000285363           NaN
As a workaround, you should be able to get the correct identifiers (as strings) with:
result["entrezgene"].apply(lambda x: "{:.0f}".format(x))
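If the missing entry matters too, here is a minimal sketch of two alternatives using plain pandas. It assumes the result is an ordinary DataFrame and that the column is labelled "NCBI gene ID" as in the printed output; the frame below is built by hand to mirror that output rather than fetched from BioMart, and the names `as_int` and `as_str` are only illustrative.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the query result shown in this thread
# (assumption: no BioMart call is made here).
result = pd.DataFrame({
    "Gene stable ID": ["ENSG00000285114", "ENSG00000285363"],
    "NCBI gene ID": [56169.0, np.nan],
})

# Option 1: pandas' nullable integer dtype stores whole numbers
# and represents the missing mapping as <NA> instead of NaN.
as_int = result["NCBI gene ID"].astype("Int64")

# Option 2: strings without the trailing ".0"; the missing mapping
# stays missing instead of becoming the literal text "nan".
as_str = result["NCBI gene ID"].map(
    lambda x: None if pd.isna(x) else "{:.0f}".format(x)
)

print(as_int)
print(as_str)
```

Note that formatting a NaN with "{:.0f}" yields the string "nan", so the second option checks for missing values before formatting.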
Hi,
I perform the following call to map two genes to entrez gene ids.
dataset = (server.marts['ENSEMBL_MART_ENSEMBL'].datasets['hsapiens_gene_ensembl'])
dataset.query(attributes=['ensembl_gene_id', 'entrezgene'], filters={'link_ensembl_gene_id': ['ENSG00000285363','ENSG00000285114']})
As a result I get:
    Gene stable ID  NCBI gene ID
0  ENSG00000285114       56169.0
1  ENSG00000285363           NaN
What happens is that, because ENSG00000285363 does not have a known mapping in NCBI, the entire column is returned as floats. This is troublesome because I can no longer tell whether ENSG00000285114 maps to 56169 or 561690.
Regards,
Sam