lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

backlocate does not work #170

Closed pawarad closed 3 years ago

pawarad commented 3 years ago

Verify

I followed the installation instructions at http://lindenb.github.io/jvarkit/BackLocate.html exactly. My Java version is 11 (openjdk 11 2018-09-25, loaded in a guix environment).

Subject of the issue

Backlocate works for the example gene (NOTCH2) mentioned in the description (http://lindenb.github.io/jvarkit/BackLocate.html). I used the Homo_sapiens.GRCh37.87.gtf file from https://github.com/lindenb/jvarkit/tree/master/src/test/resources and the hg19.fa file from http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/.

But when I try to get the output for another gene, backlocate doesn't work. Because the GTF file mentioned above is small, I tried bigger ones from http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/, both 'hg19.ensGene.gtf.gz' and 'hg19.ncbiRefSeq.gtf.gz', but backlocate still shows no output.

Steps to reproduce

-- Example which works (I indexed the FASTA reference file with samtools faidx and with picard CreateSequenceDictionary; the indexes sit along with the FASTA file in a folder 'reference')

echo -e "NOTCH2\tP1090M" | java -jar dist/backlocate.jar -R reference/hg19.fa --gtf Homo_sapiens.GRCh37.87.gtf

[Screenshot of the output]

-- Output for another gene, which does not work (used hg19.ensGene.gtf.gz as the GTF file, from http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/)

echo -e "BRCA2\tL1026X" | java -jar dist/backlocate.jar -R reference/hg19.fa --gtf hg19.ensGene.gtf

[Screenshot of the output]

-- Output for another gene, which does not work (used hg19.ncbiRefSeq.gtf.gz as the GTF file, from http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/)

echo -e "BRCA2\tL1026X" | java -jar dist/backlocate.jar -R reference/hg19.fa --gtf hg19.ncbiRefSeq.gtf

[Screenshot of the output]

I have also tried other examples, but apart from the example gene, backlocate does not work for me.

lindenb commented 3 years ago

Can you please use this GTF: ftp://ftp.ensembl.org/pub/grch37/current/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.gtf.gz

lindenb commented 3 years ago

The UCSC file doesn't contain the names of the genes:

$ wget -q -O - "http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.knownGene.gtf.gz" | gunzip -c | grep BRCA2
$
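
The same check can be run against a GTF that has already been downloaded. This is a minimal sketch, not part of jvarkit: `count_gene_records` is a hypothetical helper name, and it assumes the gene symbol is stored in a `gene_name "..."` attribute, as in Ensembl GTF files. It counts the lines that carry that attribute for the given gene.

```shell
# Hypothetical helper (not part of jvarkit): count GTF lines whose
# attribute column contains gene_name "<GENE>";
# Assumption: the gene symbol lives in the gene_name attribute (Ensembl style).
# Usage: count_gene_records FILE.gtf[.gz] GENE
count_gene_records() {
    gtf=$1
    gene=$2
    # Decompress on the fly when the file name ends in .gz
    case "$gtf" in
        *.gz) gunzip -c "$gtf" ;;
        *)    cat "$gtf" ;;
    esac | grep -c -F "gene_name \"$gene\";" || true
    # grep -c still prints 0 when nothing matches; "|| true" only keeps
    # the exit status at zero in that case
}
```

For example, `count_gene_records Homo_sapiens.GRCh37.87.gtf.gz BRCA2` prints the number of matching records; a count of 0 would suggest the file cannot be searched by that gene name.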

pawarad commented 3 years ago

Thank you @lindenb. This GTF file works. I much appreciate your quick response.