monarch-initiative / monarch-gene-mapping

Code for mapping source namespaces to preferred namespacing

switch to sssom converter #36

Closed glass-ships closed 9 months ago

glass-ships commented 9 months ago

Closes #35

glass-ships commented 9 months ago

There are in fact still some subject_id values missing, and potentially other values I may have missed (it's almost a million rows).
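With close to a million rows, counting empty cells per column is more reliable than scanning by eye. The following is a minimal sketch using only the standard library. It assumes the mapping file is a tab-separated SSSOM file whose metadata lines start with `#`; the function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_empty_cells(tsv_file, columns):
    """Return (row count, {column: number of rows where it is empty})."""
    # Skip '#' metadata lines so the first remaining line is the header.
    data_lines = (line for line in tsv_file if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    empty = Counter()
    total = 0
    for row in reader:
        total += 1
        for col in columns:
            # Treat missing, empty, and whitespace-only cells as empty.
            if not (row.get(col) or "").strip():
                empty[col] += 1
    return total, dict(empty)


# Tiny made-up sample standing in for the real mapping file.
sample = io.StringIO(
    "subject_id\tpredicate_id\tobject_id\n"
    "HGNC:5\tskos:exactMatch\tNCBIGene:1\n"
    "\tskos:exactMatch\tNCBIGene:2\n"
)
total, empty = count_empty_cells(sample, ["subject_id", "object_id"])
print(total, empty)
```

For the real file, pass an open file handle (for example `open(path, newline="")`) instead of the `StringIO` sample.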

matentzn commented 9 months ago

@glass-ships can you prioritise finishing this, because right now the gene mappings on data.mi.org are totally broken;

I think we should not upload anything to data.mi.org unless it passes first through mapping commons..

glass-ships commented 9 months ago

> can you prioritise finishing this, because right now the gene mappings on data.mi.org are totally broken;

unfortunately I have no idea how to fix this without knowing why there are still empty column values using the newly suggested converter

> I think we should not upload anything to data.mi.org unless it passes first through mapping commons..

mapping commons gets the original file from data.mi.org. this can eventually be changed once gene mapping gets migrated to mapping commons, but creating the gene mapping file currently requires downloading an 11 GB file from UniProt and subsetting it. i currently have no estimated timeline for how long it will take to migrate that code into mapping commons or come up with a solution for the UniProt problem (possibly using their API, but i will require @kevinschaper's help for that)

matentzn commented 9 months ago

> unfortunately I have no idea how to fix this without knowing why there are still empty column values using the newly suggested converter

I provided a skeleton for the solution in my last commit. All you need to do is add the missing prefixes to the prefix_map!
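To see which prefixes still need to be added, one option is to collect every CURIE prefix that appears in the data and compare it with the keys of the prefix map. This is only a sketch: the helper names are hypothetical, the `prefix_map` here is a plain dict rather than whatever object the converter actually uses, and the example URL is a placeholder.

```python
def curie_prefix(curie):
    """Return the part before the first colon, or None if there is no colon."""
    prefix, sep, _ = curie.partition(":")
    return prefix if sep else None


def missing_prefixes(curies, prefix_map):
    """Return the sorted prefixes used by the CURIEs but absent from prefix_map."""
    seen = {curie_prefix(c) for c in curies}
    seen.discard(None)  # ignore values that are not CURIEs at all
    return sorted(seen - set(prefix_map))


# Illustrative values only; the real map and identifiers will differ.
prefix_map = {"HGNC": "https://example.org/hgnc/"}
curies = ["HGNC:5", "NCBIGene:1", "ENSEMBL:ENSG00000121410", "HGNC:7"]
print(missing_prefixes(curies, prefix_map))  # ['ENSEMBL', 'NCBIGene']
```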

glass-ships commented 9 months ago

i'm not sure what the urls on the other side of each mapping should be - suggestions for where to look?

matentzn commented 9 months ago

For now, just to make it work: use something Monarch-like, such as data.mi.org. We will correct this later. So https://data.monarchinitiative.org/sources/X/, where X is the prefix.
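The placeholder scheme described here could be applied along these lines. This is a sketch under assumptions: the function name is hypothetical, the prefix map is treated as a plain dict, and the base URL is the temporary one suggested in this thread, not a resolvable identifier namespace.

```python
# Temporary base suggested in this thread; expected to be corrected later.
PLACEHOLDER_BASE = "https://data.monarchinitiative.org/sources/"


def with_placeholders(prefix_map, prefixes, base=PLACEHOLDER_BASE):
    """Return a copy of prefix_map with a placeholder URL for each unmapped prefix."""
    filled = dict(prefix_map)
    for prefix in prefixes:
        # setdefault keeps any URL that is already defined for the prefix.
        filled.setdefault(prefix, f"{base}{prefix}/")
    return filled


# Illustrative input; the existing entry is left untouched.
existing = {"HGNC": "https://example.org/hgnc/"}
filled = with_placeholders(existing, ["HGNC", "NCBIGene"])
print(filled["NCBIGene"])  # https://data.monarchinitiative.org/sources/NCBIGene/
```

Returning a copy rather than mutating the input makes it easy to diff the placeholder entries against the original map when the real URLs are filled in later.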