linnarsson-lab / loompy

Python implementation of the Loom file format - http://loompy.org
BSD 2-Clause "Simplified" License

Missing 'Vega translation ID' #168

Open juhjeong opened 2 years ago

juhjeong commented 2 years ago

Hello, I am trying to build the mouse kallisto index following the tutorial in the loompy GitHub repository. Most parts have worked well so far, but I got stuck on the Mouse GRCm38 annotation download step. The tutorial states: "...The file should contain the following columns in the header: Gene stable ID Gene stable ID version Transcript stable ID Transcript stable ID version UCSC Stable ID Vega translation ID CCDS ID.." However, the BioMart page no longer seems to provide the Vega translation ID. How can I solve this problem?

Thank you so much, Juhee

Matthew1309 commented 1 year ago

I also ran into this issue. How did you overcome it?

mughetta commented 1 year ago

I was also struggling with this issue, but I solved it by downloading the file without the Vega translation ID and then manually editing the mouse_build.py file on my computer. The goal of the edits is to remove the places where the Vega IDs are incorporated into other files. I made the following changes:

On line 71: I removed the "VegaID" being added to the gencode.vM23.metadata.tab.

On line 113: I removed the following, which is adding information to the gencode.vM23.primary_assembly.annotation.gtf: enseID2mart[enseID][5] if enseID in enseID2mart else "", # VEGA id

On line 116: I changed the index for the CCDS IDs being added from 6 to 5 as follows since the Vega IDs are no longer being added before it: enseID2mart[enseID][5] if enseID in enseID2mart else "", # CCDS id
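The index change above follows from the column order: once the Vega column is gone, every column after it moves one position to the left. Below is a minimal sketch of that shift, not the actual mouse_build.py code. The sample row values, the load_mart and ccds_for helpers, and keying the lookup by the transcript stable ID are all assumptions made for illustration. It also shows one way to avoid hard-coded positions by looking the column up by its header name.

```python
import csv
import io

# Header the tutorial expects (with the Vega column) and the header that
# BioMart exports once that column is no longer offered.
OLD_HEADER = [
    "Gene stable ID", "Gene stable ID version", "Transcript stable ID",
    "Transcript stable ID version", "UCSC Stable ID",
    "Vega translation ID", "CCDS ID",
]
NEW_HEADER = [name for name in OLD_HEADER if name != "Vega translation ID"]

# Hypothetical one-row export without the Vega column; the values are
# placeholders, not real identifiers.
MART_TSV = "\t".join(NEW_HEADER) + "\n" + "\t".join(
    ["GENE1", "GENE1.1", "TX1", "TX1.1", "uc_example", "CCDS_example"]
) + "\n"


def load_mart(handle):
    """Return (header, rows keyed by transcript stable ID).

    Keying by the third column is an assumption for this sketch.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = {row[2]: row for row in reader}
    return header, rows


header, ense_id_to_mart = load_mart(io.StringIO(MART_TSV))

# Look the column up by name instead of hard-coding 5 or 6.
ccds_index = header.index("CCDS ID")


def ccds_for(ense_id):
    """CCDS ID for a transcript, or an empty string if it is not in the export."""
    row = ense_id_to_mart.get(ense_id)
    return row[ccds_index] if row is not None else ""


print(OLD_HEADER.index("CCDS ID"), ccds_index)
print(repr(ccds_for("TX1")), repr(ccds_for("not_in_file")))
```

With the old header the CCDS column sits at index 6, and with the new one at index 5, which matches the manual edit described for line 116.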

Hope this helps anyone else having this issue.