tkzeng / Pangolin

Pangolin is a deep-learning method for predicting splice site strengths.
GNU General Public License v3.0

WARNING, skipping variant: Variant not contained in a gene body. Do GTF/FASTA chromosome names match? #25

Open GEsti124 opened 2 months ago


Dear Authors,

Thank you for the great tool.

I get the following warnings when I try to run Pangolin on the example CSV input (brca.csv):

Using CPU
[Line 1] WARNING, skipping variant: Variant not contained in a gene body. Do GTF/FASTA chromosome names match?
[Line 2] WARNING, skipping variant: Variant not contained in a gene body. Do GTF/FASTA chromosome names match?
[Line 3] WARNING, skipping variant: Variant not contained in a gene body. Do GTF/FASTA chromosome names match?
[Line 4] WARNING, skipping variant: Variant not contained in a gene body. Do GTF/FASTA chromosome names match?
[Line 5] WARNING, skipping variant: Variant not contained in a gene body. Do GTF/FASTA chromosome names match?
[Line 6] WARNING, skipping variant: Variant not contained in a gene body. Do GTF/FASTA chromosome names match?
...

The input files are GRCh37 (hg19) based, and the annotation database and reference genome are the ones suggested. The commands I used are the following:

wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_39/GRCh37_mapping/GRCh37.primary_assembly.genome.fa.gz
wget https://www.dropbox.com/sh/6zo0aegoalvgd9f/AAA9Q90Pi1UqSzX99R_NM803a/gencode.v38lift37.annotation.db

pangolin examples/brca.csv  \
    Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz \
    gencode.v38lift37.annotation.db \
    output.pangolin

The output file looks like this:

BRCA1,17,41276135,T,G,
BRCA1,17,41276135,T,C,
BRCA1,17,41276135,T,A,
BRCA1,17,41276134,T,G,
BRCA1,17,41276134,T,C,
BRCA1,17,41276134,T,A,
BRCA1,17,41276133,C,T,
BRCA1,17,41276133,C,G,

Could you help me with this please?
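
The warning asks whether chromosome names match, so one way to narrow this down is to compare the chromosome names used in the variant file against the sequence names in the reference FASTA. The following is a minimal sketch using only the Python standard library. It is not part of Pangolin, and the helper names (`fasta_contig_names`, `open_text`, `compare_names`) are hypothetical. It only inspects the FASTA side; the annotation database is not checked here.

```python
import gzip
import io


def fasta_contig_names(handle):
    """Return the set of sequence names found on FASTA header lines."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            # The name is the first whitespace-delimited token after '>'.
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def open_text(path):
    """Open a plain or gzip-compressed text file for reading (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def compare_names(variant_chroms, fasta_names):
    """Classify each variant chromosome name against the FASTA sequence names."""
    report = {}
    for chrom in sorted(set(variant_chroms)):
        if chrom in fasta_names:
            report[chrom] = "match"
        elif "chr" + chrom in fasta_names:
            report[chrom] = "FASTA uses 'chr' prefix"
        elif chrom.startswith("chr") and chrom[3:] in fasta_names:
            report[chrom] = "FASTA lacks 'chr' prefix"
        else:
            report[chrom] = "not found"
    return report


if __name__ == "__main__":
    # Tiny in-memory FASTA standing in for a real reference file.
    demo = io.StringIO(">chr17 17\nACGT\n>chrX X\nACGT\n")
    names = fasta_contig_names(demo)
    for chrom, status in compare_names(["17", "chrX", "MT"], names).items():
        print(chrom, status)
```

For a real run, the in-memory demo could be replaced with `open_text(...)` on the FASTA file actually passed to the pangolin command, and the chromosome column of the variant CSV passed as `variant_chroms`. It is also worth confirming that the FASTA filename in the command is the same file that was downloaded.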