Closed giuseppedelnapalle closed 5 years ago
Hi, I haven't used this software since I wrote it for biostars. I'm going to have a look....
I think your problem is that your bam uses a chromosome notation '1','2',... that is not the same as your gtf file 'chr1','chr2',... I added a method to automatically convert the chromosomes' names.
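For readers who would rather harmonize the names themselves, here is a minimal sketch of the idea. It is not the conversion method added to biostar103303; the function name `strip_chr_prefix` is hypothetical, and it only handles the 'chr' prefix (differences such as 'chrM' versus 'MT' are not covered).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of jvarkit: rewrite column 1 of a GTF stream
# from UCSC-style names ('chr1') to Ensembl-style names ('1').
# Header/comment lines starting with '#' pass through untouched.
strip_chr_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/  { print; next }
             { sub(/^chr/, "", $1); print }'
}

# Possible usage (file names are placeholders):
#   zcat annotation.gtf.gz | strip_chr_prefix > annotation.nochr.gtf
```

Rewriting the GTF rather than the BAM is usually the cheaper direction, since the BAM would need both its header and its records changed.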
The chromosome naming in the gtf file is not the cause. I used the same Ensembl gtf file both for building the STAR index and for biostar103303.jar, so the names should not be inconsistent. I checked the sam (created from the bam with samtools), and it confirms this: the chromosomes in the sam are named 1, 2, 6, etc. Here is an example record from the sam I checked.
XXX9743853.50878191 419 1 11594 1 100M = 11643 148 CTGTATCCCACCAGCAATGTCTAGGAATACCTGTTTCTCCACAAAGTGTTTACTTTTGGATTTTTGCCAGTCTAACAGGTGAAGCCCTGGAGATTCTTAT BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIIFIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFFFFFFFFBFBFFFBFFF NH:i:3 HI:i:2 AS:i:195 nM:i:1
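A single record only shows one contig name. To compare every name at once, the BAM header can be set against column 1 of the GTF. The sketch below assumes samtools is installed for the usage lines; the function names and file names are hypothetical and not part of jvarkit.

```shell
#!/usr/bin/env bash
# Print the contig names declared in a SAM header read from stdin
# (the SN: tag of each @SQ line), sorted and de-duplicated.
sam_contigs() {
  awk -F'\t' '$1 == "@SQ" {
                for (i = 2; i <= NF; i++)
                  if ($i ~ /^SN:/) print substr($i, 4)
              }' | sort -u
}

# Print the distinct values of column 1 of a GTF read from stdin,
# skipping '#' header lines.
gtf_contigs() {
  grep -v '^#' | cut -f1 | sort -u
}

# Print the names present in both sorted lists (files given as $1 and $2).
shared_contigs() {
  comm -12 "$1" "$2"
}

# Possible usage (file names are placeholders):
#   samtools view -H sample.bam | sam_contigs > bam.contigs
#   zcat annotation.gtf.gz | gtf_contigs > gtf.contigs
#   shared_contigs bam.contigs gtf.contigs
```

If `shared_contigs` prints nothing, the two files use different naming schemes and no read can be matched to an exon.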
But on my side, I do find some exons with a random bam and the gtf you specified (using http instead of ftp):
java -jar dist/biostar103303.jar -g 'http://ftp.ensembl.org/pub/release-93/gtf/homo_sapiens/Homo_sapiens.GRCh38.93.gtf.gz' src/test/resources/HG02260.transloc.chr9.14.bam 2>&1 | grep -v "Cannot find contig" | head
[INFO][Biostar103303]Reading http://ftp.ensembl.org/pub/release-93/gtf/homo_sapiens/Homo_sapiens.GRCh38.93.gtf.gz
[INFO][Biostar103303]End Reading http://ftp.ensembl.org/pub/release-93/gtf/homo_sapiens/Homo_sapiens.GRCh38.93.gtf.gz N=1237092
[INFO][Biostar103303]. Completed. N=477. That took:0 second
#chrom exon.start exon.end exon.exon_id exon.index5_3 transcript_id gene_name gene_id exon.count_prev_and_next exon.count_prev_and_curr exon.count_curr_and_next exon.count_curr_only exon.count_others
22 10736171 10736283 ENSE00003736336 1/1 ENST00000615943 RF00004 ENSG00000277248 0 0 0 0 0
22 10939388 10939423 ENSE00003790077 1/9 ENST00000635667 FRG1FP ENSG00000283047 0 0 0 0 0
22 10940597 10940707 ENSE00003791492 2/9 ENST00000635667 FRG1FP ENSG00000283047 0 0 0 0 0
22 10941691 10941780 ENSE00003787209 3/9 ENST00000635667 FRG1FP ENSG00000283047 0 0 0 0 0
22 10944967 10945053 ENSE00003785466 4/9 ENST00000635667 FRG1FP ENSG00000283047 0 0 0 0 0
There seems to be something wrong with my biostar103303.jar. I got an error when testing the example you gave (the last command in the biostar103303 documentation). The full output is shown below.
[giuseppe@localhost dist]$ curl -s "http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/wgEncodeCshlLongRnaSeqA549CellLongnonpolyaAlnRep1.bam" | java -jar biostar103303.jar -g "http://atgu.mgh.harvard.edu/plinkseq/dist/aux/gencodeBasicV11-hg19.gtf.gz" > result.tsv
[INFO][Biostar103303]Reading sfomr stdin
[INFO][Biostar103303]Reading http://atgu.mgh.harvard.edu/plinkseq/dist/aux/gencodeBasicV11-hg19.gtf.gz
[SEVERE][Biostar103303]http://atgu.mgh.harvard.edu/plinkseq/dist/aux/gencodeBasicV11-hg19.gtf.gz
java.io.FileNotFoundException: http://atgu.mgh.harvard.edu/plinkseq/dist/aux/gencodeBasicV11-hg19.gtf.gz
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1890)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
    at java.net.URL.openStream(URL.java:1045)
    at com.github.lindenb.jvarkit.io.IOUtils.openURIForReading(IOUtils.java:272)
    at com.github.lindenb.jvarkit.io.IOUtils.openURIForLineReader(IOUtils.java:400)
    at com.github.lindenb.jvarkit.io.IOUtils.openURIForLineIterator(IOUtils.java:405)
    at com.github.lindenb.jvarkit.tools.biostar.Biostar103303.readGTF(Biostar103303.java:169)
    at com.github.lindenb.jvarkit.tools.biostar.Biostar103303.doWork(Biostar103303.java:328)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1208)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1366)
    at com.github.lindenb.jvarkit.tools.biostar.Biostar103303.main(Biostar103303.java:514)
[INFO][Launcher]biostar103303 Exited with failure (-1)
Environment:
version of jvarkit: jvarkit 2018.04.05
version of java: java version "1.8.0_192"
the value of ${JAVA_HOME}: /usr/java/jdk1.8.0_192-amd64
OS: CentOS 7
Would you please offer some advice?
Cheers.
The URL for your GTF is wrong.
$ wget -O - "http://atgu.mgh.harvard.edu/plinkseq/dist/aux/gencodeBasicV11-hg19.gtf.gz" > /dev/null
--2018-11-29 14:11:15-- http://atgu.mgh.harvard.edu/plinkseq/dist/aux/gencodeBasicV11-hg19.gtf.gz
(...)
Proxy request sent, awaiting response... 404 Not Found
2018-11-29 14:11:16 ERROR 404: Not Found.
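The same check can be scripted before launching a long run. The following is a minimal sketch using curl; the function names are hypothetical, and it assumes the server answers a HEAD request the same way it would answer a GET.

```shell
#!/usr/bin/env bash
# Return success only when $1 is a 2xx HTTP status code.
is_http_success() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *)           return 1 ;;
  esac
}

# Print the final HTTP status code for a URL.
# -I sends a HEAD request, -L follows redirects, -o /dev/null discards
# the response headers, -w '%{http_code}' prints the status code.
http_status() {
  curl -s -I -L -o /dev/null -w '%{http_code}' "$1"
}

# Hypothetical wrapper: report whether the GTF URL can be fetched.
check_gtf_url() {
  local status
  status=$(http_status "$1")
  if is_http_success "$status"; then
    echo "OK ($status): $1"
  else
    echo "Cannot fetch GTF ($status): $1" >&2
    return 1
  fi
}

# Possible usage:
#   check_gtf_url "http://ftp.ensembl.org/pub/release-93/gtf/homo_sapiens/Homo_sapiens.GRCh38.93.gtf.gz"
```

A 404 here means the tool would fail with the same `FileNotFoundException` when it tries to open the URL.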
OK, the URL in the example is not valid. I think the problem was caused by something related to Java. When I ran the command
java -jar biostar103303.jar path/to/sample.bam -g path/to/Homo_sapiens.GRCh38.93.gtf > result.tsv
The following message indicating illegal number of arguments was returned:
[SEVERE][Biostar103303]Illegal number of arguments.
[INFO][Launcher]biostar103303 Exited with failure (-1)
Any suggestion on this issue?
This was fixed this morning when I updated the code; please pull the new code.
Otherwise, try:
cat path/to/sample.bam | java -jar biostar103303.jar -g path/to/Homo_sapiens.GRCh38.93.gtf > result.tsv
Thank you, the program seems to be running well. I'll see if it can output reasonable results.
Hi,
I had a problem when running the biostar103303 program. The process failed giving the message "no exon found".
The command was:
More error messages are shown below.
The bam file was created by STAR, and the parameters for STAR were:
Additionally, the gtf file was downloaded here (ftp://ftp.ensembl.org/pub/release-93/gtf/homo_sapiens/Homo_sapiens.GRCh38.93.gtf.gz).
Your environment
${JAVA_HOME}: /usr/java/jdk1.8.0_192-amd64
Do you have any idea what went wrong?
Thank you.