mortazavilab / TALON

Technology agnostic long read analysis pipeline for transcriptomes
MIT License
134 stars, 31 forks

Annotation name not found #112

Open: rkyger-git opened this issue 1 year ago

rkyger-git commented 1 year ago

Hello fairliereese,

I ran into the same issue described in this closed thread (https://github.com/mortazavilab/TALON/issues/91) when trying to run either talon_filter_transcripts or talon_abundance. Neither my .db file nor the output files from the previous step ("Running TALON") are empty. Do you know what might be causing the problem?

Originally posted by @rkyger-git in https://github.com/mortazavilab/TALON/issues/91#issuecomment-1306407923

fairliereese commented 1 year ago

Could you please provide the calls you made to TALON and the filtering / abundance utilities that caused this error? Thanks!

rkyger-git commented 1 year ago

Yes, the code used was:

talon_initialize_database --f A100.gtf --g a100 --a a100_rna --o A100

talon --t 64 --f config.talon --db A100.db --build a100 --o A100_out

talon_filter_transcripts --db A100.db -a a100_rna --o A100_talon_filt_trans

fairliereese commented 1 year ago

Sorry about the wait. Could you please run this Python code on your TALON database and tell me what it prints? I'd love to get to the bottom of this bug, which seems to be common:

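import sqlite3

# Path to your TALON database (here, the A100.db created by talon_initialize_database)
database = "A100.db"

conn = sqlite3.connect(database)
cursor = conn.cursor()

# List every distinct annotation name stored in the database
cursor.execute("SELECT DISTINCT annot_name FROM gene_annotations")
annotations = [str(x[0]) for x in cursor.fetchall()]
conn.close()

print(annotations)
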
rkyger-git commented 1 year ago

OK, I ran the code; the output is: ['TALON']

fairliereese commented 1 year ago

OK, so I ran your talon_initialize_database command with a GTF I had lying around, and directly after running it, the only annotation in the database was ['a100_rna']. This leads me to believe something went wrong with the database initialization. Can you re-run the talon_initialize_database command and, before running any reads through TALON, execute that Python snippet again?
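To make that check repeatable, the snippet above can be wrapped into a small helper that reports whether the name you pass with -a (here a100_rna) is actually stored in the database. This is a minimal sketch, not part of TALON: the function names annotation_names and check_annotation are hypothetical, and it only relies on the gene_annotations.annot_name column queried above.

```python
import sqlite3
from typing import List


def annotation_names(db_path: str) -> List[str]:
    """Return the distinct annotation names in a TALON database (sorted).

    Uses the same query as the snippet above; assumes the database already
    has a gene_annotations table with an annot_name column.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT DISTINCT annot_name FROM gene_annotations"
        ).fetchall()
    finally:
        conn.close()
    return sorted(str(row[0]) for row in rows)


def check_annotation(db_path: str, annot: str) -> bool:
    """Hypothetical helper: is `annot` (the value given to -a) in the database?"""
    names = annotation_names(db_path)
    if not names:
        # An empty list right after initialization suggests nothing was loaded
        print(f"{db_path}: no annotation names found")
        return False
    if annot not in names:
        print(f"{db_path}: '{annot}' not found; available: {names}")
        return False
    return True
```

For example, check_annotation("A100.db", "a100_rna") should return True on a correctly initialized database; an output of [] or ['TALON'] from annotation_names points back at the initialization step rather than at the filtering or abundance utilities.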

rkyger-git commented 1 year ago

I re-ran the talon_initialize_database command and then ran the Python snippet; this time the output is: [].

fairliereese commented 1 year ago

Yeah, so that's definitely unexpected and likely what's causing the issue. Would you be able to reinstall TALON from the latest commits on GitHub, rerun the talon_initialize_database command, and check whether that changes the annotations in your db?

If that doesn't work you're welcome to send the GTF you're trying to initialize from to me (freese {at} uci.edu) and I will try to dig into what's going wrong in the source code.

fairliereese commented 1 year ago

Oh, actually, based on what I found out in this issue, I think your GTF might be formatted in a way that's incompatible with TALON. Check out this wiki entry for more details.
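One quick way to inspect a GTF before re-initializing is to count the feature types in column 3 and flag lines that are not 9-column, tab-separated records. This is a hedged sketch, not TALON code: it does not reproduce the wiki entry's requirements, and the assumption that gene, transcript, and exon rows should all be present is labeled as such in the code. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def feature_counts(gtf_lines: Iterable[str]) -> Dict[str, int]:
    """Count GTF feature types (column 3), skipping comments and blank lines."""
    counts: Counter = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            # Not a 9-column, tab-separated GTF record (e.g. space-separated)
            counts["<malformed>"] += 1
            continue
        counts[fields[2]] += 1
    return dict(counts)


def missing_features(
    counts: Dict[str, int],
    required: Sequence[str] = ("gene", "transcript", "exon"),
) -> List[str]:
    """Return required feature types that never appear.

    ASSUMPTION: the default `required` list is a guess at what an annotation
    loader expects; check the TALON wiki for the actual requirements.
    """
    return [f for f in required if counts.get(f, 0) == 0]
```

Usage: with open("A100.gtf") as fh: counts = feature_counts(fh), then print(counts, missing_features(counts)). A non-zero "<malformed>" count or a missing feature type is a reasonable first thing to fix before rerunning talon_initialize_database.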

rkyger-git commented 1 year ago

Thanks, I reformatted the GTF and was able to get talon_abundance to work. However, talon_filter_transcripts produces an empty output file and no error messages.

fairliereese commented 1 year ago

Did you rerun talon_initialize_database and talon itself after reformatting your GTF? Just checking.

rkyger-git commented 1 year ago

Yes