Damtagor opened 5 years ago
Thanks for using our tool! GFusion must use a Bowtie 1 index. However, I see that you used a Bowtie2 index in your command: NCBI/GRCh38/Sequence/Bowtie2Index/genome. I think you can replace it and try again. If you don't have a Bowtie 1 index, you should first use the bowtie-build command to generate the corresponding files, following the Bowtie manual. If you have any questions, please do not hesitate to contact me.
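To check which kind of index a basename points at before launching a run, something like the following sketch may help. It relies only on file extensions: Bowtie 1 index files end in .ebwt (or .ebwtl for large indexes) and Bowtie2 index files end in .bt2 (or .bt2l). The function name index_kind is a hypothetical helper, not part of GFusion, TopHat, or Bowtie.

```shell
# Hypothetical helper: report which aligner an index basename belongs to,
# judging only by the index files found next to it.
index_kind() {
    base="$1"
    if [ -e "${base}.1.ebwt" ] || [ -e "${base}.1.ebwtl" ]; then
        # Bowtie 1 index (small or large)
        echo "bowtie1"
    elif [ -e "${base}.1.bt2" ] || [ -e "${base}.1.bt2l" ]; then
        # Bowtie2 index (small or large)
        echo "bowtie2"
    else
        echo "none"
    fi
}
```

For example, index_kind /path/to/BowtieIndex/genome (a placeholder path) prints bowtie1 only if genome.1.ebwt or genome.1.ebwtl exists there. If it prints none or bowtie2, a Bowtie 1 index can be built with bowtie-build, whose arguments are the reference FASTA followed by the output basename.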
Thanks a lot for answering. I had already done that; sorry. I now see that I posted an incomplete explanation. When I use Bowtie 1 indexes, the script throws an error because Bowtie doesn't recognize the option --reorder, which is specific to Bowtie2. I will post the full error report in a few hours.
Excuse me. I checked my comment and found that the command I posted was the wrong one. I will clarify now.
I used this command:
perl GFusion.pl.txt -o output1 -r 0 -p 12 -i /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/BowtieIndex/genome -g /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf -1 test_1.fastq -2 test_2.fastq
Then an error appeared because Bowtie didn't recognize the option --reorder:
[Tue Dec 4 18:57:08 2018]
[2018-12-04 18:57:08] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2018-12-04 18:57:08] Checking for Bowtie
Bowtie version: 1.1.2.0
[2018-12-04 18:57:10] Checking for Bowtie index files (genome)..
[2018-12-04 18:57:10] Checking for reference FASTA file
[2018-12-04 18:57:10] Generating SAM header for /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/BowtieIndex/genome
[2018-12-04 18:57:31] Preparing reads
left reads: min. length=50, max. length=50, 84131 kept reads (113 discarded)
right reads: min. length=50, max. length=50, 83725 kept reads (519 discarded)
[2018-12-04 18:57:33] Mapping left_kept_reads to genome genome with Bowtie
[FAILED]
Error running bowtie:
bowtie: unrecognized option '--reorder'
Usage:
bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]
<m1> Comma-separated list of files containing upstream mates (or the
sequences themselves, if -c is set) paired with mates in <m2>
<m2> Comma-separated list of files containing downstream mates (or the
sequences themselves if -c is set) paired with mates in <m1>
<r> Comma-separated list of files containing Crossbow-style reads. Can be
a mixture of paired and unpaired. Specify "-" for stdin.
<s> Comma-separated list of files containing unpaired reads, or the
sequences themselves, if -c is set. Specify "-" for stdin.
<hit> File to write hits to (default: stdout)
Input:
-q query input files are FASTQ .fq/.fastq (default)
-f query input files are (multi-)FASTA .fa/.mfa
-r query input files are raw one-sequence-per-line
-c query sequences given on cmd line (as <mates>, <singles>)
-C reads and index are in colorspace
-Q/--quals <file> QV file(s) corresponding to CSFASTA inputs; use with -f -C
--Q1/--Q2 <file> same as -Q, but for mate files 1 and 2 respectively
-s/--skip <int> skip the first <int> reads/pairs in the input
-u/--qupto <int> stop after first <int> reads/pairs (excl. skipped reads)
-5/--trim5 <int> trim <int> bases from 5' (left) end of reads
-3/--trim3 <int> trim <int> bases from 3' (right) end of reads
--phred33-quals input quals are Phred+33 (default)
--phred64-quals input quals are Phred+64 (same as --solexa1.3-quals)
--solexa-quals input quals are from GA Pipeline ver. < 1.3
--solexa1.3-quals input quals are from GA Pipeline ver. >= 1.3
--integer-quals qualities are given as space-separated integers (not ASCII)
--large-index force usage of a 'large' index, even if a small one is present
Alignment:
-v <int> report end-to-end hits w/ <=v mismatches; ignore qualities
or
-n/--seedmms <int> max mismatches in seed (can be 0-3, default: -n 2)
-e/--maqerr <int> max sum of mismatch quals across alignment for -n (def: 70)
-l/--seedlen <int> seed length for -n (default: 28)
--nomaqround disable Maq-like quality rounding for -n (nearest 10 <= 30)
-I/--minins <int> minimum insert size for paired-end alignment (default: 0)
-X/--maxins <int> maximum insert size for paired-end alignment (default: 250)
--fr/--rf/--ff -1, -2 mates align fw/rev, rev/fw, fw/fw (default: --fr)
--nofw/--norc do not align to forward/reverse-complement reference strand
--maxbts <int> max # backtracks for -n 2/3 (default: 125, 800 for --best)
--pairtries <int> max # attempts to find mate for anchor hit (default: 100)
-y/--tryhard try hard to find valid alignments, at the expense of speed
--chunkmbs <int> max megabytes of RAM for best-first search frames (def: 64)
Reporting:
-k <int> report up to <int> good alignments per read (default: 1)
-a/--all report all alignments per read (much slower than low -k)
-m <int> suppress all alignments if > <int> exist (def: no limit)
-M <int> like -m, but reports 1 random hit (MAPQ=0); requires --best
--best hits guaranteed best stratum; ties broken by quality
--strata hits in sub-optimal strata aren't reported (requires --best)
Output:
-t/--time print wall-clock time taken by search phases
-B/--offbase <int> leftmost ref offset = <int> in bowtie output (default: 0)
--quiet print nothing but the alignments
--refout write alignments to files refXXXXX.map, 1 map per reference
--refidx refer to ref. seqs by 0-based index rather than name
--al <fname> write aligned reads/pairs to file(s) <fname>
--un <fname> write unaligned reads/pairs to file(s) <fname>
--max <fname> write reads/pairs over -m limit to file(s) <fname>
--suppress <cols> suppresses given columns (comma-delim'ed) in default output
--fullref write entire ref name (default: only up to 1st space)
Colorspace:
--snpphred <int> Phred penalty for SNP when decoding colorspace (def: 30)
or
--snpfrac <dec> approx. fraction of SNP bases (e.g. 0.001); sets --snpphred
--col-cseq print aligned colorspace seqs as colors, not decoded bases
--col-cqual print original colorspace quals, not decoded quals
--col-keepends keep nucleotides at extreme ends of decoded alignment
SAM:
-S/--sam write hits in SAM format
--mapq <int> default mapping quality (MAPQ) to print for SAM alignments
--sam-nohead supppress header lines (starting with @) for SAM output
--sam-nosq supppress @SQ header lines for SAM output
--sam-RG <text> add <text> (usually "lab=value") to @RG line of SAM header
Performance:
-o/--offrate <int> override offrate of index; must be >= index's offrate
-p/--threads <int> number of alignment threads to launch (default: 1)
--mm use memory-mapped I/O for index; many 'bowtie's can share
--shmem use shared mem for index; many 'bowtie's can share
Other:
--seed <int> seed for random number generator
--verbose verbose output (for debugging)
--version print version information and quit
-h/--help print this usage message
Command: bowtie --wrapper basic-0 -v 2 -k 20 -m 20 -S -p 12 --reorder --sam-nohead --max /dev/null /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/BowtieIndex/genome -
open: No such file or directory
[main_samview] fail to open "output1/accepted_hits.bam" for reading.
open: No such file or directory
[main_samview] fail to open "output1/unmapped.bam" for reading.
[Tue Dec 4 18:57:33 2018]
Warning: Could not find any reads in "output1/un.fastq"
# reads processed: 0
# reads with at least one reported alignment: 0 (0.00%)
# reads that failed to align: 0 (0.00%)
No alignments
[samopen] SAM header is present: 195 sequences.
[sam_read1] reference 'ID:Bowtie VN:1.1.2 CL:"bowtie --wrapper basic-0 -p 12 /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/BowtieIndex/genome output1/un.fastq -S output1/fusion_out/un.sam"
r3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ !' is recognized as '*'.
[main_samview] truncated file.
[samopen] SAM header is present: 195 sequences.
[sam_read1] reference 'ID:Bowtie VN:1.1.2 CL:"bowtie --wrapper basic-0 -p 12 /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/BowtieIndex/genome output1/un.fastq -S output1/fusion_out/un.sam"
hr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ!' is recognized as '*'.
[main_samview] truncated file.
open: No such file or directory
[main_samview] fail to open "output1/accepted_hits.bam" for reading.
open: No such file or directory
[main_samview] fail to open "output1/accepted_hits.bam" for reading.
[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!
[bam_header_read] EOF marker is absent. The input is probably truncated.
[Tue Dec 4 18:57:37 2018]
Result: No Fusion Genes! The time elapsed: about 0 hours.
After this, I used Bowtie2 indexes:
perl GFusion.pl.txt -o output1 -r 0 -p 12 -i /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/genome -g /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf -1 test_1.fastq -2 test_2.fastq
But the script doesn't accept that type of index:
[Tue Dec 4 19:01:33 2018]
[2018-12-04 19:01:33] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2018-12-04 19:01:33] Checking for Bowtie
Bowtie version: 1.1.2.0
[2018-12-04 19:01:33] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie index files (/mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/genome.*.ebwt)
open: No such file or directory
[main_samview] fail to open "output1/accepted_hits.bam" for reading.
open: No such file or directory
[main_samview] fail to open "output1/unmapped.bam" for reading.
[Tue Dec 4 19:01:33 2018]
Could not locate a Bowtie index corresponding to basename "/mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/genome"
Command: bowtie --wrapper basic-0 -p 12 -S /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/genome output1/un.fastq output1/fusion_out/un.sam
[samopen] SAM header is present: 195 sequences.
[sam_read1] reference 'ID:Bowtie VN:1.1.2 CL:"bowtie --wrapper basic-0 -p 12 /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/BowtieIndex/genome output1/un.fastq -S output1/fusion_out/un.sam"
r3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ !' is recognized as '*'.
[main_samview] truncated file.
[samopen] SAM header is present: 195 sequences.
[sam_read1] reference 'ID:Bowtie VN:1.1.2 CL:"bowtie --wrapper basic-0 -p 12 /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/BowtieIndex/genome output1/un.fastq -S output1/fusion_out/un.sam"
hr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ!' is recognized as '*'.
[main_samview] truncated file.
open: No such file or directory
[main_samview] fail to open "output1/accepted_hits.bam" for reading.
open: No such file or directory
[main_samview] fail to open "output1/accepted_hits.bam" for reading.
[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!
[bam_header_read] EOF marker is absent. The input is probably truncated.
[Tue Dec 4 19:01:35 2018]
Result: No Fusion Genes! The time elapsed: about 0 hours.
I don't know exactly how to solve this. Thanks once again, and sorry for the inconvenience.
I'm afraid this is beyond my ability, because GFusion ran fine when I tested it with TopHat (v2.1.0) and Bowtie (v1.1.2).
[Fri Dec 7 15:00:41 2018]
[2018-12-07 15:00:45] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2018-12-07 15:00:45] Checking for Bowtie
Bowtie version: 1.1.2.0
[2018-12-07 15:00:48] Checking for Bowtie index files (genome)..
[2018-12-07 15:00:48] Checking for reference FASTA file
[2018-12-07 15:00:48] Generating SAM header for bowtie1_hg19/hg19
[2018-12-07 15:08:02] Preparing reads
left reads: min. length=50, max. length=50, 84131 kept reads (113 discarded)
right reads: min. length=50, max. length=50, 83725 kept reads (519 discarded)
[2018-12-07 15:08:09] Mapping left_kept_reads to genome hg19 with Bowtie
[2018-12-07 15:08:21] Mapping left_kept_reads_seg1 to genome hg19 with Bowtie (1/2)
[2018-12-07 15:08:28] Mapping left_kept_reads_seg2 to genome hg19 with Bowtie (2/2)
[2018-12-07 15:08:34] Mapping right_kept_reads to genome hg19 with Bowtie
[2018-12-07 15:08:47] Mapping right_kept_reads_seg1 to genome hg19 with Bowtie (1/2)
[2018-12-07 15:08:52] Mapping right_kept_reads_seg2 to genome hg19 with Bowtie (2/2)
[2018-12-07 15:08:58] Searching for junctions via segment mapping
[2018-12-07 15:22:18] Retrieving sequences for splices
[2018-12-07 15:26:28] Indexing splices
[2018-12-07 15:26:37] Mapping left_kept_reads_seg1 to genome segment_juncs with Bowtie (1/2)
[2018-12-07 15:26:39] Mapping left_kept_reads_seg2 to genome segment_juncs with Bowtie (2/2)
[2018-12-07 15:26:41] Joining segment hits
[2018-12-07 15:29:36] Mapping right_kept_reads_seg1 to genome segment_juncs with Bowtie (1/2)
[2018-12-07 15:29:39] Mapping right_kept_reads_seg2 to genome segment_juncs with Bowtie (2/2)
[2018-12-07 15:29:41] Joining segment hits
[2018-12-07 15:32:12] Reporting output tracks
-----------------------------------------------
[2018-12-07 15:35:25] A summary of the alignment counts can be found in outfile/align_summary.txt
[2018-12-07 15:35:25] Run complete: 00:34:40 elapsed
[Fri Dec 7 15:36:14 2018]
# reads processed: 33598
# reads with at least one reported alignment: 31889 (94.91%)
# reads that failed to align: 1709 (5.09%)
Reported 31889 alignments to 1 output stream(s)
[2018-12-07 15:42:18] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2018-12-07 15:42:18] Checking for Bowtie
Bowtie version: 1.1.2.0
[2018-12-07 15:42:19] Checking for Bowtie index files (genome)..
[2018-12-07 15:42:19] Checking for reference FASTA file
[2018-12-07 15:42:19] Generating SAM header for outfile/fusion_out/ref/index/re
[2018-12-07 15:42:20] Preparing reads
left reads: min. length=50, max. length=50, 148 kept reads (0 discarded)
right reads: min. length=50, max. length=50, 148 kept reads (0 discarded)
[2018-12-07 15:42:21] Mapping left_kept_reads to genome re with Bowtie
[2018-12-07 15:42:22] Mapping right_kept_reads to genome re with Bowtie
Warning: junction database is empty!
[2018-12-07 15:42:24] Reporting output tracks
-----------------------------------------------
[2018-12-07 15:42:25] A summary of the alignment counts can be found in outfile/fusion_out/final/align_summary.txt
[2018-12-07 15:42:25] Run complete: 00:00:07 elapsed
[Fri Dec 7 15:42:26 2018] Completed successfully! The time elapsed: about 0.69 hours.
The option '--reorder' does not appear in the GFusion code. I found that this error occurs when running the following command:
tophat -o out_file --bowtie1 -p 12 -r 0 -I100000 --no-coverage-search /path/to/bowtie1_index PE_reads_1.fastq -2 PE_reads_2.fastq
You can run the above command; if you get the same error message, then the error comes from that step. I also searched Google for 'bowtie --reorder' and 'bowtie2 --reorder': the option '--reorder' belongs to bowtie2, not bowtie1. So I think your TopHat is treating bowtie1 as bowtie2. Sorry for my limited ability; I think you can report this problem to the TopHat developers.
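To compare the error from a standalone TopHat run with the one in the GFusion log, the rejected option can be pulled out of a saved log file. This is a minimal sketch that assumes the log was captured to a file and that the error line has the same shape as above ("bowtie: unrecognized option '--reorder'"). The function name rejected_options is hypothetical.

```shell
# Hypothetical helper: list each option that the aligner reported as
# unrecognized in a saved log file, one per line, without duplicates.
rejected_options() {
    logfile="$1"
    # Keep only the text between the quotes on "unrecognized option" lines.
    sed -n "s/.*unrecognized option '\([^']*\)'.*/\1/p" "$logfile" | sort -u
}
```

If both the GFusion log and the standalone TopHat log yield --reorder, the error is produced by the TopHat step itself rather than by anything GFusion adds to the command line.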
The command that I am using:
Midway through execution, it throws an error because bowtie is called with the unrecognized option '--reorder' (a bowtie2 option). When I try to use bowtie2 indexes, tophat doesn't recognize them because it only searches for bowtie indexes. How have you solved this?