Open Kiliankleemann opened 11 months ago
I made sure the reformatting of GTF is correct:
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/rmsk.txt.gz
gzip -d *.gz
talon_reformat_gtf -g reference/GRCh38_GENCODE_rmsk_TE.gtf
talon_initialize_database --f reference/GRCh38_GENCODE_rmsk_TE_reformatted.gtf \
--g hg38_rmsk_ucsd \
--a hg38 \
--o hg38
Would you be able to share the GTF that you're using with me? I will try running it on my end and see if I can pinpoint the issue.
You should be able to download the file and unzip it with the first two commands above; that's the one I tried.
This one? https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/rmsk.txt.gz This does not look like a GTF to me. For example, the strand should be in the 6th column (0-indexed), but looks like it's in the 9th column of your file.
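A quick way to check whether a file is GTF-shaped before passing it to TALON is sketched below. This is only a rough sanity check, not TALON's own parser: it assumes a GTF record has exactly 9 tab-separated fields, integer start and end in fields 3 and 4, and a strand of `+`, `-`, or `.` in field 6 (0-indexed). The function name `looks_like_gtf` is hypothetical.

```python
VALID_STRANDS = {"+", "-", "."}


def looks_like_gtf(lines, max_records=100):
    """Return True if the first records look like GTF lines.

    Checks only the overall shape: 9 tab-separated fields, integer
    start/end coordinates, and a strand symbol in field 6 (0-indexed).
    """
    checked = 0
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            return False
        if not (fields[3].isdigit() and fields[4].isdigit()):
            return False
        if fields[6] not in VALID_STRANDS:
            return False
        checked += 1
        if checked >= max_records:
            break
    # An input with no records at all is not treated as a GTF.
    return checked > 0
```

Calling it as `looks_like_gtf(open("rmsk.txt"))` should return `False` for the UCSC table dump, since those rows have more than 9 fields and the strand sits in a different column.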
Which gtf did you use for hg38 repeatmasker?
For me to best help you, you should send all the commands that you used to download / format your GTF. I think I'm missing some information from your side.
I'm having a similar issue and I'm not really sure why. I've also tried using the gtf formatter with no luck.
It took 0:00:00.01 to process chromosome NW_023397527.1
Traceback (most recent call last):
File "/users/aademilu/.local/bin/talon_initialize_database", line 8, in
I've attached an example of the file. The full file can be found here. gtf_example.txt
Can you please send me the exact command you tried for talon_initialize_database, as well as the version number of TALON that you're using?
talon_initialize_database --f ../../reference/GCF_004126475.2_mPhyDis1.pri.v3_genomic.gtf --a discolor_annot --g discolor --o discolor
Where can I find version information?
I don't think there's a nice way to access the version info now, but if you haven't updated TALON in a long time it might be worth pulling and installing the latest commits. On my machine, I am able to run your init command with gtf_example.txt
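One possible workaround for finding the installed version is to ask Python's package metadata directly. This is a sketch under an assumption: that TALON was installed with pip under the distribution name `talon`. That name is a guess and may not match your install (or may match an unrelated package with the same name). The helper `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="talon"):
    """Return the installed version string for a distribution, or None.

    The default name "talon" is an assumption about how TALON was
    installed; pass a different name if your install differs.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

`print(installed_version())` prints the version string if the metadata is found, and `None` otherwise. `importlib.metadata` is in the standard library from Python 3.8 onward.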
No problem. Did you also verify that you're having an issue with the small file too?
Yes. That one does run for me as well (it doesn't include NW_023397527.1), but I cannot get other cuts of the file to work; they fail with the following error:
genes, transcripts, exons = read_gtf_file(gtf_file)
File "/users/aademilu/.local/lib/python3.8/site-packages/talon/initialize_talon_database.py", line 495, in read_gtf_file
entry_type = tab_fields[2]
I noticed that the gene is the only one on that scaffold. Could that be the issue? I have provided the full file, which will run until it reaches the scaffold in question. The program will run if I remove the gene from the GTF.
The problem is that this gene does not have any transcripts annotated to it. If you look, the file goes from one gene entry (the one on your NW_023397527.1 chromosome) straight to the next, without any transcript or exon entries in between. I would advise removing this entry from your GTF and moving on with your analysis.
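For anyone who wants to find such entries before running the database initialization, here is a minimal sketch that lists gene IDs with no transcript record. It assumes standard GTF attributes of the form `gene_id "..."` in the 9th column and feature types named `gene` and `transcript`. The function name `genes_without_transcripts` is hypothetical and not part of TALON.

```python
import re

# Matches the gene_id attribute in the 9th GTF column.
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def genes_without_transcripts(lines):
    """Return gene IDs that have a 'gene' record but no 'transcript' record."""
    genes = []                 # gene IDs in file order
    with_transcript = set()    # gene IDs seen on a transcript record
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID.search(fields[8])
        if match is None:
            continue
        gene_id = match.group(1)
        if fields[2] == "gene":
            genes.append(gene_id)
        elif fields[2] == "transcript":
            with_transcript.add(gene_id)
    return [g for g in genes if g not in with_transcript]
```

Running it over the open GTF file, for example `genes_without_transcripts(open("annotation.gtf"))` with your own path, returns the IDs of the gene entries to remove or inspect.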
You're right. Strange. I've removed it and it works just fine. I was able to fully process everything. Thanks.
Tried to run talon_initialize_database but got an error: