shendurelab / MIPGEN

One stop MIP design and analysis
Other
22 stars 19 forks source link

No logistic scores, only -nan #42

Closed biokatrin closed 1 year ago

biokatrin commented 4 years ago

Hi, I have run the mipgen_practice with the example commands but I don't get any logistic scores, only -nan values. I have read the previous posts regarding log scores /nan values, but the error messages do not apply to me. I also have problems loading the snp vcf file. There is no error message, but no snps are loaded, and not snp mips produced. Can someone help, please?

This is my output:

~$ MIPGEN/mipgen -regions_to_scan mipgen_practice/practice_genes.bed -project_name practice_design -min_capture_size 162 -max_capture_size 162 -snp_file ./practice_design.local_snp_data.vcf -bwa_genome_index ./human_g1k_v37.fasta

Program: bwa (alignment via Burrows-Wheeler transformation) Version: 0.7.17-r1188 Contact: Heng Li lh3@sanger.ac.uk

Usage: bwa [options]

Command: index index sequences in the FASTA format mem BWA-MEM algorithm fastmap identify super-maximal exact matches pemerge merge overlapping paired ends (EXPERIMENTAL) aln gapped/ungapped alignment samse generate alignment (single ended) sampe generate alignment (paired ended) bwasw BWA-SW for long queries

     shm           manage indices in shared memory
     fa2pac        convert FASTA to PAC format
     pac2bwt       generate BWT from PAC
     pac2bwtgen    alternative algorithm for generating BWT
     bwtupdate     update .bwt to the new format
     bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with bwa index'. There are three alignment algorithms in BWA:mem', bwasw', and aln/samse/sampe'. If you are not sure which to use, try bwa mem' first. Pleaseman ./bwa.1' for the manual.

[mipgen] success on opening region file [mipgen] features loaded; retrieving chromosomal sequence

Program: samtools (Tools for alignments in the SAM format) Version: 1.10 (using htslib 1.10.2-3)

Usage: samtools [options]

Commands: -- Indexing dict create a sequence dictionary file faidx index/extract FASTA fqidx index/extract FASTQ index index alignment

-- Editing calmd recalculate MD/NM tags and '=' bases fixmate fix mate information reheader replace BAM header targetcut cut fosmid regions (for fosmid pool only) addreplacerg adds or replaces RG tags markdup mark duplicates

-- File operations collate shuffle and group alignments by name cat concatenate BAMs merge merge sorted alignments mpileup multi-way pileup sort sort alignment file split splits a file by read group quickcheck quickly check if SAM/BAM/CRAM file appears intact fastq converts a BAM to a FASTQ fasta converts a BAM to a FASTA

-- Statistics bedcov read depth per BED region coverage alignment depth and percent coverage depth compute the depth flagstat simple stats idxstats BAM index stats phase phase heterozygotes stats generate stats (former bamcheck)

-- Viewing flags explain BAM flags tview text alignment viewer view SAM<->BAM<->CRAM conversion depad convert padded BAM to unpadded BAM

[mipgen] first line of reference fasta reads: >1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 [mipgen] no masked feature file found [mipgen] regions ready; accessing snp file

Version: 1.10.2-3 Usage: tabix [OPTIONS] [FILE] [REGION [...]]

Indexing Options: -0, --zero-based coordinates are zero-based -b, --begin INT column number for region start [4] -c, --comment CHAR skip comment lines starting with CHAR [null] -C, --csi generate CSI index for VCF (default is TBI) -e, --end INT column number for region end (if no end, set INT to -b) [5] -f, --force overwrite existing index without asking -m, --min-shift INT set minimal interval size for CSI indices to 2^INT [14] -p, --preset STR gff, bed, sam, vcf -s, --sequence INT column number for sequence names (suppressed by -p) [1] -S, --skip-lines INT skip first INT lines [0]

Querying and other options: -h, --print-header print also the header lines -H, --only-header print only the header lines -l, --list-chroms list chromosome names -r, --reheader FILE replace the header with the content of FILE -R, --regions FILE restrict to regions listed in the file -T, --targets FILE similar to -R but streams rather than index-jumps -D do not download the index file

[mipgen] all 0 snps loaded; generating files for bwa [bwa_aln] 17bp reads: max_diff = 2 [bwa_aln] 38bp reads: max_diff = 3 [bwa_aln] 64bp reads: max_diff = 4 [bwa_aln] 93bp reads: max_diff = 5 [bwa_aln] 124bp reads: max_diff = 6 [bwa_aln] 157bp reads: max_diff = 7 [bwa_aln] 190bp reads: max_diff = 8 [bwa_aln] 225bp reads: max_diff = 9 [bwa_aln_core] calculate SA coordinate... 0.87 sec [bwa_aln_core] write to the disk... 0.00 sec [bwa_aln_core] 19893 sequences have been processed. [main] Version: 0.7.17-r1188 [main] CMD: bwa aln -t 1 ./human_g1k_v37.fasta practice_design.all_sequences.fq [main] Real time: 2.113 sec; CPU: 1.683 sec [bwa_aln_core] convert to sequence coordinate... 1.25 sec [bwa_aln_core] refine gapped alignments... 0.25 sec [bwa_aln_core] print alignments... 0.05 sec [bwa_aln_core] 19893 sequences have been processed. [main] Version: 0.7.17-r1188 [main] CMD: bwa samse ./human_g1k_v37.fasta practice_design.all_sequences.sai practice_design.all_sequences.fq [main] Real time: 1.820 sec; CPU: 1.567 sec [mipgen] sam file opened [bwa_aln] 17bp reads: max_diff = 2 [bwa_aln] 38bp reads: max_diff = 3 [bwa_aln] 64bp reads: max_diff = 4 [bwa_aln] 93bp reads: max_diff = 5 [bwa_aln] 124bp reads: max_diff = 6 [bwa_aln] 157bp reads: max_diff = 7 [bwa_aln] 190bp reads: max_diff = 8 [bwa_aln] 225bp reads: max_diff = 9 [bwa_aln_core] calculate SA coordinate... 7.95 sec [bwa_aln_core] write to the disk... 0.05 sec [bwa_aln_core] 262144 sequences have been processed. [bwa_aln_core] calculate SA coordinate... 4.95 sec [bwa_aln_core] write to the disk... 0.03 sec [bwa_aln_core] 416934 sequences have been processed. [main] Version: 0.7.17-r1188 [main] CMD: bwa aln -t 1 ./human_g1k_v37.fasta practice_design.oligo_copy_count.fq [main] Real time: 14.671 sec; CPU: 13.982 sec [bwa_aln_core] convert to sequence coordinate... 1.69 sec [bwa_aln_core] refine gapped alignments... 0.32 sec [bwa_aln_core] print alignments... 0.58 sec [bwa_aln_core] 262144 sequences have been processed. [bwa_aln_core] convert to sequence coordinate... 1.52 sec [bwa_aln_core] refine gapped alignments... 0.28 sec [bwa_aln_core] print alignments... 0.35 sec [bwa_aln_core] 416934 sequences have been processed. [main] Version: 0.7.17-r1188 [main] CMD: bwa samse ./human_g1k_v37.fasta practice_design.oligo_copy_count.sai practice_design.oligo_copy_count.fq [main] Real time: 6.006 sec; CPU: 4.947 sec [mipgen] checking oligo copy [mipgen] bwa copy number analysis finished [mipgen] file of all mips ready for write: practice_design.all_mips.txt [mipgen] file of collapsed mips ready for write: practice_design.collapsed_mips.txt

Thanks,

Katrin

augustboyle commented 4 years ago

Hi Katrin,

I’ve been thinking about what could be the problem. Have you used your installed BWA and tabix and downloaded reference previously? I haven’t seen many issues running the practice commands reported, so it would be good to check that nothing was awry with the dependencies, such as a partially downloaded genome.

Evan

On May 18, 2020 at 8:11:17 AM, biokatrin (notifications@github.com) wrote:

Hi, I have run the mipgen_practice with the example commands but I don't get any logistic scores, only -nan values. I have read the previous posts regarding log scores /nan values, but the error messages do not apply to me. I also have problems loading the snp vcf file. There is no error message, but no snps are loaded, and not snp mips produced. Can someone help, please?

This is my output:

~$ MIPGEN/mipgen -regions_to_scan mipgen_practice/practice_genes.bed -project_name practice_design -min_capture_size 162 -max_capture_size 162 -snp_file ./practice_design.local_snp_data.vcf -bwa_genome_index ./human_g1k_v37.fasta

Program: bwa (alignment via Burrows-Wheeler transformation) Version: 0.7.17-r1188 Contact: Heng Li lh3@sanger.ac.uk

Usage: bwa [options]

Command: index index sequences in the FASTA format mem BWA-MEM algorithm fastmap identify super-maximal exact matches pemerge merge overlapping paired ends (EXPERIMENTAL) aln gapped/ungapped alignment samse generate alignment (single ended) sampe generate alignment (paired ended) bwasw BWA-SW for long queries

 shm           manage indices in shared memory
 fa2pac        convert FASTA to PAC format
 pac2bwt       generate BWT from PAC
 pac2bwtgen    alternative algorithm for generating BWT
 bwtupdate     update .bwt to the new format
 bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with bwa index'. There are three alignment algorithms in BWA: mem', bwasw', and aln/samse/sampe'. If you are not sure which to use, try bwa mem' first. Please man ./bwa.1' for the manual.

[mipgen] success on opening region file [mipgen] features loaded; retrieving chromosomal sequence

Program: samtools (Tools for alignments in the SAM format) Version: 1.10 (using htslib 1.10.2-3)

Usage: samtools [options]

Commands: -- Indexing dict create a sequence dictionary file faidx index/extract FASTA fqidx index/extract FASTQ index index alignment

-- Editing calmd recalculate MD/NM tags and '=' bases fixmate fix mate information reheader replace BAM header targetcut cut fosmid regions (for fosmid pool only) addreplacerg adds or replaces RG tags markdup mark duplicates

-- File operations collate shuffle and group alignments by name cat concatenate BAMs merge merge sorted alignments mpileup multi-way pileup sort sort alignment file split splits a file by read group quickcheck quickly check if SAM/BAM/CRAM file appears intact fastq converts a BAM to a FASTQ fasta converts a BAM to a FASTA

-- Statistics bedcov read depth per BED region coverage alignment depth and percent coverage depth compute the depth flagstat simple stats idxstats BAM index stats phase phase heterozygotes stats generate stats (former bamcheck)

-- Viewing flags explain BAM flags tview text alignment viewer view SAM<->BAM<->CRAM conversion depad convert padded BAM to unpadded BAM

[mipgen] first line of reference fasta reads: >1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 [mipgen] no masked feature file found [mipgen] regions ready; accessing snp file

Version: 1.10.2-3 Usage: tabix [OPTIONS] [FILE] [REGION [...]]

Indexing Options: -0, --zero-based coordinates are zero-based -b, --begin INT column number for region start [4] -c, --comment CHAR skip comment lines starting with CHAR [null] -C, --csi generate CSI index for VCF (default is TBI) -e, --end INT column number for region end (if no end, set INT to -b) [5] -f, --force overwrite existing index without asking -m, --min-shift INT set minimal interval size for CSI indices to 2^INT [14] -p, --preset STR gff, bed, sam, vcf -s, --sequence INT column number for sequence names (suppressed by -p) [1] -S, --skip-lines INT skip first INT lines [0]

Querying and other options: -h, --print-header print also the header lines -H, --only-header print only the header lines -l, --list-chroms list chromosome names -r, --reheader FILE replace the header with the content of FILE -R, --regions FILE restrict to regions listed in the file -T, --targets FILE similar to -R but streams rather than index-jumps -D do not download the index file

[mipgen] all 0 snps loaded; generating files for bwa [bwa_aln] 17bp reads: max_diff = 2 [bwa_aln] 38bp reads: max_diff = 3 [bwa_aln] 64bp reads: max_diff = 4 [bwa_aln] 93bp reads: max_diff = 5 [bwa_aln] 124bp reads: max_diff = 6 [bwa_aln] 157bp reads: max_diff = 7 [bwa_aln] 190bp reads: max_diff = 8 [bwa_aln] 225bp reads: max_diff = 9 [bwa_aln_core] calculate SA coordinate... 0.87 sec [bwa_aln_core] write to the disk... 0.00 sec [bwa_aln_core] 19893 sequences have been processed. [main] Version: 0.7.17-r1188 [main] CMD: bwa aln -t 1 ./human_g1k_v37.fasta practice_design.all_sequences.fq [main] Real time: 2.113 sec; CPU: 1.683 sec [bwa_aln_core] convert to sequence coordinate... 1.25 sec [bwa_aln_core] refine gapped alignments... 0.25 sec [bwa_aln_core] print alignments... 0.05 sec [bwa_aln_core] 19893 sequences have been processed. [main] Version: 0.7.17-r1188 [main] CMD: bwa samse ./human_g1k_v37.fasta practice_design.all_sequences.sai practice_design.all_sequences.fq [main] Real time: 1.820 sec; CPU: 1.567 sec [mipgen] sam file opened [bwa_aln] 17bp reads: max_diff = 2 [bwa_aln] 38bp reads: max_diff = 3 [bwa_aln] 64bp reads: max_diff = 4 [bwa_aln] 93bp reads: max_diff = 5 [bwa_aln] 124bp reads: max_diff = 6 [bwa_aln] 157bp reads: max_diff = 7 [bwa_aln] 190bp reads: max_diff = 8 [bwa_aln] 225bp reads: max_diff = 9 [bwa_aln_core] calculate SA coordinate... 7.95 sec [bwa_aln_core] write to the disk... 0.05 sec [bwa_aln_core] 262144 sequences have been processed. [bwa_aln_core] calculate SA coordinate... 4.95 sec [bwa_aln_core] write to the disk... 0.03 sec [bwa_aln_core] 416934 sequences have been processed. [main] Version: 0.7.17-r1188 [main] CMD: bwa aln -t 1 ./human_g1k_v37.fasta practice_design.oligo_copy_count.fq [main] Real time: 14.671 sec; CPU: 13.982 sec [bwa_aln_core] convert to sequence coordinate... 1.69 sec [bwa_aln_core] refine gapped alignments... 0.32 sec [bwa_aln_core] print alignments... 0.58 sec [bwa_aln_core] 262144 sequences have been processed. [bwa_aln_core] convert to sequence coordinate... 1.52 sec [bwa_aln_core] refine gapped alignments... 0.28 sec [bwa_aln_core] print alignments... 0.35 sec [bwa_aln_core] 416934 sequences have been processed. [main] Version: 0.7.17-r1188 [main] CMD: bwa samse ./human_g1k_v37.fasta practice_design.oligo_copy_count.sai practice_design.oligo_copy_count.fq [main] Real time: 6.006 sec; CPU: 4.947 sec [mipgen] checking oligo copy [mipgen] bwa copy number analysis finished [mipgen] file of all mips ready for write: practice_design.all_mips.txt [mipgen] file of collapsed mips ready for write: practice_design.collapsed_mips.txt

Thanks,

Katrin

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/shendurelab/MIPGEN/issues/42, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABOTLJ6SGX6FW5KBOROEOYTRSFFZJANCNFSM4NEFNVDA .

biokatrin commented 4 years ago

Dear Evan,

thanks very much for your reply. I tried downloading the reference genome again from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz, and ran gunzip and bwa index again. I also followed the instructions in issue #10 by Janrehker and now the practice genes work. I get nan scores and snp alternatives.

B.w.

Katrin

augustboyle commented 4 years ago

Are you saying the practice command generates scores, but your regions do not?

If that is the case I would check the regions being queried from the fasta and make sure the BED regions are correct (I have misassigned chromosome before, which can make BED files point to N or nonexistent sequence).

Let me know if you need more help troubleshooting, Evan

On June 8, 2020 at 2:28:49 AM, biokatrin (notifications@github.com) wrote:

Dear Evan,

thanks very much for your reply. I tried downloading the reference genome again from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz, and ran gunzip and bwa index again. I also followed the instructions in issue #10 https://github.com/shendurelab/MIPGEN/issues/10 by Janrehker and now the practice genes work. I get nan scores and snp alternatives.

B.w.

Katrin

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/shendurelab/MIPGEN/issues/42#issuecomment-640486586, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABOTLJZFKEUOMUFQJCX54HLRVSVNBANCNFSM4NEFNVDA .