Closed tobsecret closed 6 years ago
Update on this - I did not know this and it's not mentioned in the docs but before doing any computation, MIPgen calls bwa and samtools without arguments so seeing the usage message of those is intentional.
The versions that ended up working for me with MIPgen are listed here:
More detail below: My trouble was that I was getting only nan values for the logistic score column, as mentioned in #4 but none of the fixes seemed to have any impact. So I tried different versions of BWA and it turns out using an old version (0.5.9) was what caused the errors (marked in bold) in the log of my MIPgen run:
```shell [user@c35-09 testrun]$ mipgen -project_name testrun -bwa_genome_index ../Genome.fasta -regions_to_scan ../genes.bed -min_capture_size 162 -max_capture_size 162 Program: bwa (alignment via Burrows-Wheeler transformation) Version: 0.5.9-r16 Contact: Heng LiUsage: bwa [options] Command: index index sequences in the FASTA format aln gapped/ungapped alignment samse generate alignment (single ended) sampe generate alignment (paired ended) bwasw BWA-SW for long queries fa2pac convert FASTA to PAC format pac2bwt generate BWT from PAC pac2bwtgen alternative algorithm for generating BWT bwtupdate update .bwt to the new format pac_rev generate reverse PAC bwt2sa generate SA from BWT and Occ pac2cspac convert PAC to color-space PAC stdsw standard SW/NW alignment [mipgen] success on opening region file [mipgen] features loaded; retrieving chromosomal sequence Program: samtools (Tools for alignments in the SAM format) Version: 1.6 (using htslib 1.6) Usage: samtools [options] Commands: -- Indexing dict create a sequence dictionary file faidx index/extract FASTA index index alignment -- Editing calmd recalculate MD/NM tags and '=' bases fixmate fix mate information reheader replace BAM header rmdup remove PCR duplicates targetcut cut fosmid regions (for fosmid pool only) addreplacerg adds or replaces RG tags markdup mark duplicates -- File operations collate shuffle and group alignments by name cat concatenate BAMs merge merge sorted alignments mpileup multi-way pileup sort sort alignment file split splits a file by read group quickcheck quickly check if SAM/BAM/CRAM file appears intact fastq converts a BAM to a FASTQ fasta converts a BAM to a FASTA -- Statistics bedcov read depth per BED region depth compute the depth flagstat simple stats idxstats BAM index stats phase phase heterozygotes stats generate stats (former bamcheck) -- Viewing flags explain BAM flags tview text alignment viewer view SAM<->BAM<->CRAM conversion depad convert padded BAM to unpadded BAM [mipgen] first line of reference fasta reads: >Org_Strn_chr01 | organism=MyOrganism | version=2015-06-18 | length=830522 | SO=chromosome [mipgen] no masked feature file found [mipgen] regions ready; accessing snp file [mipgen] all 0 snps loaded; generating files for bwa [bwa_aln] 17bp reads: max_diff = 2 [bwa_aln] 38bp reads: max_diff = 3 [bwa_aln] 64bp reads: max_diff = 4 [bwa_aln] 93bp reads: max_diff = 5 [bwa_aln] 124bp reads: max_diff = 6 [bwa_aln] 157bp reads: max_diff = 7 [bwa_aln] 190bp reads: max_diff = 8 [bwa_aln] 225bp reads: max_diff = 9 [bwt_restore_bwt] fail to open file '../Genome.fasta.rbwt'. Abort! sh: line 1: 39781 Aborted (core dumped) bwa aln -t 1 ../Genome.fasta testrun.all_sequences.fq > testrun.all_sequences.sai [bam_header_read] invalid BAM binary header (this is not a BAM file). [mipgen] sam file opened [bwa_aln] 17bp reads: max_diff = 2 [bwa_aln] 38bp reads: max_diff = 3 [bwa_aln] 64bp reads: max_diff = 4 [bwa_aln] 93bp reads: max_diff = 5 [bwa_aln] 124bp reads: max_diff = 6 [bwa_aln] 157bp reads: max_diff = 7 [bwa_aln] 190bp reads: max_diff = 8 [bwa_aln] 225bp reads: max_diff = 9 [bwt_restore_bwt] fail to open file '../Genome.fasta.rbwt'. Abort! sh: line 1: 39786 Aborted (core dumped) bwa aln -t 1 ../Genome.fasta testrun.oligo_copy_count.fq > testrun.oligo_copy_count.sai [bam_header_read] invalid BAM binary header (this is not a BAM file). [mipgen] checking oligo copy [mipgen] bwa copy number analysis finished [mipgen] file of all mips ready for write: testrun.all_mips.txt [mipgen] file of collapsed mips ready for write: testrun.collapsed_mips.txt [mipgen] feature #1 [mipgen] feature #2 [mipgen] feature #3 [mipgen] feature #4 [mipgen] feature #5 [mipgen] feature #6 [mipgen] mip picking complete: testrun.picked_mips.txt and testrun.snp_mips.txt ```
Below the output of the same run with a newer version of bwa (0.7.15)
[schrat01@c35-09 testrun]$ mipgen -project_name testrun -bwa_genome_index ../Genome.fasta -regions_to_scan ../pf_resistance_genes.bed -min_capture_size 162 -max_capture_size 162
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.15-r1140
Contact: Heng Li <lh3@sanger.ac.uk>
Usage: bwa <command> [options]
Command: index index sequences in the FASTA format
mem BWA-MEM algorithm
fastmap identify super-maximal exact matches
pemerge merge overlapping paired ends (EXPERIMENTAL)
aln gapped/ungapped alignment
samse generate alignment (single ended)
sampe generate alignment (paired ended)
bwasw BWA-SW for long queries
shm manage indices in shared memory
fa2pac convert FASTA to PAC format
pac2bwt generate BWT from PAC
pac2bwtgen alternative algorithm for generating BWT
bwtupdate update .bwt to the new format
bwt2sa generate SA from BWT and Occ
Note: To use BWA, you need to first index the genome with `bwa index'.
There are three alignment algorithms in BWA: `mem', `bwasw', and
`aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
first. Please `man ./bwa.1' for the manual.
[mipgen] success on opening region file
[mipgen] features loaded; retrieving chromosomal sequence
Program: samtools (Tools for alignments in the SAM format)
Version: 1.6 (using htslib 1.6)
Usage: samtools <command> [options]
Commands:
-- Indexing
dict create a sequence dictionary file
faidx index/extract FASTA
index index alignment
-- Editing
calmd recalculate MD/NM tags and '=' bases
fixmate fix mate information
reheader replace BAM header
rmdup remove PCR duplicates
targetcut cut fosmid regions (for fosmid pool only)
addreplacerg adds or replaces RG tags
markdup mark duplicates
-- File operations
collate shuffle and group alignments by name
cat concatenate BAMs
merge merge sorted alignments
mpileup multi-way pileup
sort sort alignment file
split splits a file by read group
quickcheck quickly check if SAM/BAM/CRAM file appears intact
fastq converts a BAM to a FASTQ
fasta converts a BAM to a FASTA
-- Statistics
bedcov read depth per BED region
depth compute the depth
flagstat simple stats
idxstats BAM index stats
phase phase heterozygotes
stats generate stats (former bamcheck)
-- Viewing
flags explain BAM flags
tview text alignment viewer
view SAM<->BAM<->CRAM conversion
depad convert padded BAM to unpadded BAM
[mipgen] first line of reference fasta reads: >Org_Strn_chr01 | organism=MyOrganism | version=2015-06-18 | length=830522 | SO=chromosome
[mipgen] no masked feature file found
[mipgen] regions ready; accessing snp file
[mipgen] all 0 snps loaded; generating files for bwa
[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_aln_core] calculate SA coordinate... 0.92 sec
[bwa_aln_core] write to the disk... 0.00 sec
[bwa_aln_core] 21712 sequences have been processed.
[main] Version: 0.7.15-r1140
[main] CMD: bwa aln -t 1 ../Genome.fasta testrun.all_sequences.fq
[main] Real time: 1.030 sec; CPU: 0.977 sec
[bwa_aln_core] convert to sequence coordinate... 0.08 sec
[bwa_aln_core] refine gapped alignments... 0.01 sec
[bwa_aln_core] print alignments... 0.04 sec
[bwa_aln_core] 21712 sequences have been processed.
[main] Version: 0.7.15-r1140
[main] CMD: bwa samse ../Genome.fasta testrun.all_sequences.sai testrun.all_sequences.fq
[main] Real time: 0.196 sec; CPU: 0.168 sec
[mipgen] sam file opened
[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_aln_core] calculate SA coordinate... 6.04 sec
[bwa_aln_core] write to the disk... 0.02 sec
[bwa_aln_core] 262144 sequences have been processed.
[bwa_aln_core] calculate SA coordinate... 1.27 sec
[bwa_aln_core] write to the disk... 0.00 sec
[bwa_aln_core] 316946 sequences have been processed.
[main] Version: 0.7.15-r1140
[main] CMD: bwa aln -t 1 ../Genome.fasta testrun.oligo_copy_count.fq
[main] Real time: 7.731 sec; CPU: 7.567 sec
[bwa_aln_core] convert to sequence coordinate... 0.52 sec
[bwa_aln_core] refine gapped alignments... 0.08 sec
[bwa_aln_core] print alignments... 0.40 sec
[bwa_aln_core] 262144 sequences have been processed.
[bwa_aln_core] convert to sequence coordinate... 0.11 sec
[bwa_aln_core] refine gapped alignments... 0.02 sec
[bwa_aln_core] print alignments... 0.09 sec
[bwa_aln_core] 316946 sequences have been processed.
[main] Version: 0.7.15-r1140
[main] CMD: bwa samse ../Genome.fasta testrun.oligo_copy_count.sai testrun.oligo_copy_count.fq
[main] Real time: 1.585 sec; CPU: 1.473 sec
[mipgen] checking oligo copy
[mipgen] bwa copy number analysis finished
[mipgen] file of all mips ready for write: testrun.all_mips.txt
[mipgen] file of collapsed mips ready for write: testrun.collapsed_mips.txt
[mipgen] feature #1
[mipgen] feature #2
[mipgen] feature #3
[mipgen] feature #4
[mipgen] feature #5
[mipgen] feature #6
[mipgen] mip picking complete:
testrun.picked_mips.txt
and
testrun.snp_mips.txt
Thanks for the detective work!
Evan
On May 30, 2018 at 9:41:37 AM, tobsecret (notifications@github.com) wrote:
Update on this - I did not know this and it's not mentioned in the docs but before doing any computation, MIPgen calls bwa and samtools without arguments so seeing the usage message of those is intentional.
The versions that ended up working for me with MIPgen are listed here:
More detail below: My trouble was that I was getting only nan values for the logistic score column, as mentioned in #4 https://github.com/shendurelab/MIPGEN/issues/4 but none of the fixes seemed to have any impact. So I tried different versions of BWA and it turns out using an old version (0.5.9) was what caused the errors (marked in bold) in the log of my MIPgen run:
[user@c35-09 testrun]$ mipgen -project_name testrun -bwa_genome_index ../Genome.fasta -regions_to_scan ../genes.bed -min_capture_size 162 -max_capture_size 162
Program: bwa (alignment via Burrows-Wheeler transformation) Version: 0.5.9-r16 Contact: Heng Li lh3@sanger.ac.uk
Usage: bwa
Command: index index sequences in the FASTA format aln gapped/ungapped alignment samse generate alignment (single ended) sampe generate alignment (paired ended) bwasw BWA-SW for long queries
fa2pac convert FASTA to PAC format
pac2bwt generate BWT from PAC
pac2bwtgen alternative algorithm for generating BWT
bwtupdate update .bwt to the new format
pac_rev generate reverse PAC
bwt2sa generate SA from BWT and Occ
pac2cspac convert PAC to color-space PAC
stdsw standard SW/NW alignment
[mipgen] success on opening region file [mipgen] features loaded; retrieving chromosomal sequence
Program: samtools (Tools for alignments in the SAM format) Version: 1.6 (using htslib 1.6)
Usage: samtools
Commands: -- Indexing dict create a sequence dictionary file faidx index/extract FASTA index index alignment
-- Editing calmd recalculate MD/NM tags and '=' bases fixmate fix mate information reheader replace BAM header rmdup remove PCR duplicates targetcut cut fosmid regions (for fosmid pool only) addreplacerg adds or replaces RG tags markdup mark duplicates
-- File operations collate shuffle and group alignments by name cat concatenate BAMs merge merge sorted alignments mpileup multi-way pileup sort sort alignment file split splits a file by read group quickcheck quickly check if SAM/BAM/CRAM file appears intact fastq converts a BAM to a FASTQ fasta converts a BAM to a FASTA
-- Statistics bedcov read depth per BED region depth compute the depth flagstat simple stats idxstats BAM index stats phase phase heterozygotes stats generate stats (former bamcheck)
-- Viewing flags explain BAM flags tview text alignment viewer view SAM<->BAM<->CRAM conversion depad convert padded BAM to unpadded BAM
[mipgen] first line of reference fasta reads: >Org_Strn_chr01 | organism=MyOrganism | version=2015-06-18 | length=830522 | SO=chromosome [mipgen] no masked feature file found [mipgen] regions ready; accessing snp file [mipgen] all 0 snps loaded; generating files for bwa [bwa_aln] 17bp reads: max_diff = 2 [bwa_aln] 38bp reads: max_diff = 3 [bwa_aln] 64bp reads: max_diff = 4 [bwa_aln] 93bp reads: max_diff = 5 [bwa_aln] 124bp reads: max_diff = 6 [bwa_aln] 157bp reads: max_diff = 7 [bwa_aln] 190bp reads: max_diff = 8 [bwa_aln] 225bp reads: max_diff = 9[bwt_restore_bwt] fail to open file '../Genome.fasta.rbwt'. Abort! sh: line 1: 39781 Aborted (core dumped) bwa aln -t 1 ../Genome.fasta testrun.all_sequences.fq > testrun.all_sequences.sai [bam_header_read] invalid BAM binary header (this is not a BAM file). [mipgen] sam file opened [bwa_aln] 17bp reads: max_diff = 2 [bwa_aln] 38bp reads: max_diff = 3 [bwa_aln] 64bp reads: max_diff = 4 [bwa_aln] 93bp reads: max_diff = 5 [bwa_aln] 124bp reads: max_diff = 6 [bwa_aln] 157bp reads: max_diff = 7 [bwa_aln] 190bp reads: max_diff = 8 [bwa_aln] 225bp reads: max_diff = 9[bwt_restore_bwt] fail to open file '../Genome.fasta.rbwt'. Abort! sh: line 1: 39786 Aborted (core dumped) bwa aln -t 1 ../Genome.fasta testrun.oligo_copy_count.fq > testrun.oligo_copy_count.sai [bam_header_read] invalid BAM binary header (this is not a BAM file). [mipgen] checking oligo copy [mipgen] bwa copy number analysis finished [mipgen] file of all mips ready for write: testrun.all_mips.txt [mipgen] file of collapsed mips ready for write: testrun.collapsed_mips.txt [mipgen] feature #1 [mipgen] feature #2 [mipgen] feature #3 [mipgen] feature #4 [mipgen] feature #5 [mipgen] feature #6 [mipgen] mip picking complete: testrun.picked_mips.txt and testrun.snp_mips.txt
Below the output of the same run with a newer version of bwa (0.7.15)
[schrat01@c35-09 testrun]$ mipgen -project_name testrun -bwa_genome_index ../Genome.fasta -regions_to_scan ../pf_resistance_genes.bed -min_capture_size 162 -max_capture_size 162
Program: bwa (alignment via Burrows-Wheeler transformation) Version: 0.7.15-r1140 Contact: Heng Li lh3@sanger.ac.uk
Usage: bwa
Command: index index sequences in the FASTA format mem BWA-MEM algorithm fastmap identify super-maximal exact matches pemerge merge overlapping paired ends (EXPERIMENTAL) aln gapped/ungapped alignment samse generate alignment (single ended) sampe generate alignment (paired ended) bwasw BWA-SW for long queries
shm manage indices in shared memory
fa2pac convert FASTA to PAC format
pac2bwt generate BWT from PAC
pac2bwtgen alternative algorithm for generating BWT
bwtupdate update .bwt to the new format
bwt2sa generate SA from BWT and Occ
Note: To use BWA, you need to first index the genome with bwa index'. There are three alignment algorithms in BWA:
mem', bwasw', and
aln/samse/sampe'. If you are not sure which to use, try bwa mem' first. Please
man ./bwa.1' for the manual.
[mipgen] success on opening region file[mipgen] features loaded;
retrieving chromosomal sequence
Program: samtools (Tools for alignments in the SAM format)Version: 1.6
(using htslib 1.6)
Usage: samtools
feature #6[mipgen] mip picking complete:testrun.picked_mips.txt and testrun.snp_mips.txt
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/shendurelab/MIPGEN/issues/33#issuecomment-393230893, or mute the thread https://github.com/notifications/unsubscribe-auth/AF01p6sFDrDeNPxpSlGcsSs3ejAjS8rmks5t3svBgaJpZM4UMaRT .
You got it - Thanks for writing & maintaining the software!
Hello, I had the same BWA issue. Changing the "-project-name" parameter resolved this problem, it seems like MIPgen did not want to generate a new folder. Providing a folder already existing fixed it.
What version of BWA does MIPgen depend on? My script is below:
When I run that, I get the following output:
Which suggests to me that either I am not feeding in the right files or the syntax has changed between bwa versions.