salzman-lab / SICILIAN

GNU General Public License v2.0

create annotator from 10x references genes.gtf.gz #21

Open leezx opened 1 year ago

leezx commented 1 year ago

I encountered an error when creating the annotator from the 10x GTF file.

Modifying the get_exon_number function works for me.

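def get_exon_number(row):
  if "exon_number" in row["attribute"]:
    # Original line, which fails when the exon_number value is not quoted:
    # return int(row["attribute"].split("exon_number")[-1].split("\"")[1].split(";")[0])
    # Workaround: take the 9th ";"-separated attribute and parse its last token.
    # This relies on exon_number sitting at that position in the file.
    return int(row["attribute"].split(";")[8].split(" ")[-1])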

wook2014 commented 1 year ago

I encountered the same error when creating the annotator from a GTF file:

ValueError: invalid literal for int() with base 10: 'ENSE00002234944.1'

The function is meant to extract "exon_number", but it actually picks up the "exon_id" value. The reason is that the "exon_number" field in my GTF file has no quotation marks around its value, so splitting on the quotation mark lands on the next quoted attribute. Here is one solution:

return int(row["attribute"].split("exon_number")[-1].split(";")[0].strip())

This returns the "exon_number" instead of the "exon_id".
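
Both fixes above depend on how a particular GTF file formats the field: one assumes a fixed attribute position, the other assumes an unquoted value. A possible variant that accepts both the quoted and unquoted forms is sketched below. It is only a sketch, not the project's code: the regular expression, the `_EXON_NUMBER_RE` name, and returning `None` when the attribute is absent are all assumptions made for illustration.

```python
import re

# Assumed pattern: "exon_number", whitespace, then an integer that may or
# may not be wrapped in double quotes, e.g.  exon_number "3";  or  exon_number 3;
_EXON_NUMBER_RE = re.compile(r'exon_number\s+"?(\d+)"?')


def get_exon_number(row):
    """Return the exon number parsed from row["attribute"], or None if absent.

    Returning None for a missing attribute is an assumption of this sketch;
    the original function may use a different fallback.
    """
    match = _EXON_NUMBER_RE.search(row["attribute"])
    if match is None:
        return None
    return int(match.group(1))


if __name__ == "__main__":
    # Hypothetical attribute strings in the two styles discussed in this thread.
    unquoted = {"attribute": 'gene_id "G1"; exon_number 1; exon_id "ENSE00002234944.1";'}
    quoted = {"attribute": 'gene_id "G1"; exon_number "12"; exon_id "ENSE00002234944.1";'}
    print(get_exon_number(unquoted))
    print(get_exon_number(quoted))
```

Because the match is anchored on the `exon_number` key itself, the position of the attribute within the line and the presence of quotes should not matter.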