lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

Backlocate transcript not found #162

Closed by esamorodnitsky 4 years ago

esamorodnitsky commented 4 years ago

Subject of the issue

Backlocate can't seem to find any of the gene names in the GTF I give it.

Your environment

Steps to reproduce

echo -e 'BRAF\tV600E\n' | java -jar backlocate.jar --gtf hg19.refGene.gtf -R hg19_chr.fasta

Expected behaviour

I expect a table with the possible list of mutations for BRAF V600E.

Actual behaviour

User.Gene AA1 petide.pos.1 AA2 transcript.name transcript.id transcript.strand transcript.AA index0.in.rna wild.codon potential.var.codons base.in.rna chromosome index0.in.genomic exon messages extra.user.data

[WARN][BackLocate]no transcript found for BRAF

I also performed "grep" for BRAF in the GTF. It is in there.

lindenb commented 4 years ago

Hi,

I also performed "grep" for BRAF in the GTF. It is in there.

can you please show me the lines for:

grep BRAF hg19.refGene.gtf

esamorodnitsky commented 4 years ago

BRAF.txt

lindenb commented 4 years ago

There is no line whose 3rd column is "gene":

awk '$3=="gene"' BRAF.txt

You'd be better off using a GTF from Ensembl:

$ wget -O - -q "ftp://ftp.ensembl.org/pub/grch37/current/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.chr.gtf.gz" | gunzip -c | grep -w BRAF | awk '$3=="gene"'
7   ensembl_havana  gene    140419127   140624564   .   -   .   gene_id "ENSG00000157764"; gene_version "8"; gene_name "BRAF"; gene_source "ensembl_havana"; gene_biotype "protein_coding";
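
The same check as the awk one-liner above can be sketched in Python: tally the feature types (3rd column) on the lines that mention a gene, so it is obvious at a glance whether any "gene" record exists. This is only an illustrative sketch. The function name `count_features` is made up here, the first sample row is a shortened copy of the Ensembl line shown above, and the second sample row is an invented exon-only record standing in for a GTF that has no "gene" features.

```python
from collections import Counter


def count_features(gtf_lines, gene_name):
    """Count GTF feature types (column 3) on lines whose attributes mention gene_name."""
    counts = Counter()
    needle = f'"{gene_name}"'  # match the quoted name in any attribute (gene_id, gene_name, ...)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GTF record
        if needle in fields[8]:
            counts[fields[2]] += 1
    return counts


# Row 1: shortened from the Ensembl output above. Row 2: invented for illustration.
sample = [
    '7\tensembl_havana\tgene\t140419127\t140624564\t.\t-\t.\tgene_id "ENSG00000157764"; gene_name "BRAF";\n',
    'chr7\texample\texon\t100\t200\t.\t-\t.\tgene_id "BRAF"; transcript_id "TX1";\n',
]

print(count_features(sample, "BRAF"))
print(count_features(sample[1:], "BRAF")["gene"])  # 0 when no "gene" record is present
```

On a real file, the lines could come from `open("hg19.refGene.gtf")`; a count of zero for "gene" would correspond to the empty awk result described above.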
esamorodnitsky commented 4 years ago

Excellent, downloading the new GTF file worked. The GTF that I was using came from UCSC. Another thing, is Backlocate 0-based? I ran it on BRAF V600E and got these numbers:

BRAF Val 600 Glu BRAF ENST00000288602 - V 1797 GTG GAG G chr7 140453136 ENST00000288602.Exon15 . .
BRAF Val 600 Glu BRAF ENST00000288602 - V 1798 GTG GAG T chr7 140453135 ENST00000288602.Exon15 . .
BRAF Val 600 Glu BRAF ENST00000288602 - V 1799 GTG GAG G chr7 140453134 ENST00000288602.Exon15 . .

But, on UCSC, the positions are between chr7:140,453,135-140,453,137.
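
As a side note on the index0.in.rna column in the rows above: the values 1797, 1798 and 1799 are what you get by treating residue 600 as the 600th codon and counting bases from 0. The sketch below only reproduces that arithmetic. The helper name `codon_rna_indices` is hypothetical, and the assumption that the index counts from the first base of the coding sequence is inferred from these numbers, not taken from the Backlocate documentation.

```python
def codon_rna_indices(aa_pos_1based):
    """Return the three 0-based base indices of the codon for a 1-based residue position.

    Assumes index 0 is the first base of the coding sequence (an inference
    from the output above, not a documented guarantee).
    """
    start = (aa_pos_1based - 1) * 3
    return [start, start + 1, start + 2]


print(codon_rna_indices(600))  # [1797, 1798, 1799]
```

Because the transcript is on the minus strand, the matching genomic indices in the output run in the opposite direction (140453136 down to 140453134).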

lindenb commented 4 years ago

Another thing, is Backlocate 0-based?

Yes
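
To make the 0-based answer concrete, here is a small sketch that adds 1 to each index0.in.genomic value from the output above and prints the resulting 1-based range, which lines up with the UCSC positions quoted earlier. The helper name `to_one_based` is made up for illustration.

```python
def to_one_based(index0):
    """Convert a 0-based position to the 1-based coordinate shown in genome browsers."""
    return index0 + 1


# index0.in.genomic values from the three BRAF V600E rows above
backlocate_index0 = [140453136, 140453135, 140453134]

one_based = sorted(to_one_based(i) for i in backlocate_index0)
print(one_based)                                  # [140453135, 140453136, 140453137]
print(f"chr7:{one_based[0]}-{one_based[-1]}")     # chr7:140453135-140453137
```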

esamorodnitsky commented 4 years ago

Ok, great! Thanks a lot!