jason-weirather / AlignQC

Long read alignment analysis. Generates reports on sequence alignments for mappability vs. read sizes, error patterns, annotations and rarefaction curve analysis. The most basic analysis requires only a BAM file and outputs a web-browser-compatible XHTML file to visualize/share/store/extract analysis results.
Apache License 2.0

ValueError #7

Closed rojinsafavi closed 6 years ago

rojinsafavi commented 6 years ago

Hello, I want to use AlignQC to analyze some nanopore RNA data, but I keep getting this allocation error:

alignqc analyze aln.bam -g ../Mus_musculus.GRCm38.cdna.all.fa -t ../UCSC_Main_on_Mouse__all_mrna.gtf.gz -o report.xhtml --portable_output report.portable.xhtml --threads 10

Exception in thread Thread-4:ext coverage
Traceback (most recent call last):
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 326, in _handle_workers
    pool._maintain_pool()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 230, in _maintain_pool
    self._repopulate_pool()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 223, in _repopulate_pool
    w.start()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 326, in _handle_workers
    pool._maintain_pool()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 230, in _maintain_pool
    self._repopulate_pool()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 223, in _repopulate_pool
    w.start()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Killedignments, 3213 min context coverage

I would really appreciate it if you could help me with this.

rojinsafavi commented 6 years ago

Okay, I increased the threads and now I'm not getting the allocation error anymore, but I get this:

-bash-4.2$ alignqc analyze aln.bam -g ../Mus_musculus.GRCm38.cdna.all.fa -t ../UCSC_Main_on_Mouse__all_mrna.gtf.gz -o report.xhtml --portable_output report.portable.xhtml --threads 10 Using Rscript version: R scripting front-end version 3.3.2 (2016-10-31) Creating initial alignment mapping data /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/bam_preprocess.py aln.bam --minimum_intron_size 68 -o /tmp/weirathe.WRVHE9/temp/alndata.txt.gz --threads 10 --specific_tempdir /tmp/weirathe.WRVHE9/temp/ read basics

check for best set 0/215
combining results 215
Traverse bam for alignment analysis /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/traverse_preprocessed.py /tmp/weirathe.WRVHE9/temp/alndata.txt.gz -o /tmp/weirathe.WRVHE9/data/ --specific_tempdir /tmp/weirathe.WRVHE9/temp/ --threads 10 --min_aligned_bases 50 --max_query_overlap 10 --max_target_overlap 10 --max_target_gap 500000 --required_fractional_improvement 0.2 215 alignments 100 reads
Writing chromosome lengths from header /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/bam_to_chr_lengths.py aln.bam -o /tmp/weirathe.WRVHE9/data/chrlens.txt Can we find any known read types /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/get_platform_report.py /tmp/weirathe.WRVHE9/data/lengths.txt.gz /tmp/weirathe.WRVHE9/data/special_report Go through genepred best alignments and make a bed depth file Generate the depth bed for the mapped reads gpd_to_bed_depth.py /tmp/weirathe.WRVHE9/data/best.sorted.gpd.gz -o /tmp/weirathe.WRVHE9/data/depth.sorted.bed.gz --threads 10 Stratify the depth to make it plot quicker and cleaner

Get ready for alignment plot Make alignment plots /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/make_alignment_plot.py /tmp/weirathe.WRVHE9/data/lengths.txt.gz --rscript_path Rscript --output_stats /tmp/weirathe.WRVHE9/data/alignment_stats.txt --output /tmp/weirathe.WRVHE9/plots/alignments.png /tmp/weirathe.WRVHE9/plots/alignments.pdf making plot Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_gapped_alignment_statistics.r /tmp/weirathe.WRVHE9/data/lengths.txt.gz /tmp/weirathe.WRVHE9/plots/alignments.png null device 1 Warning messages: 1: In png(infile, bg = "#FFFFFF") : unable to load shared object '/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/R/library/grDevices/libs//cairo.so': libjpeg.so.8: cannot open shared object file: No such file or directory 2: In png(infile, bg = "#FFFFFF") : failed to load cairo DLL 3: In min(x) : no non-missing arguments to min; returning Inf 4: In max(x) : no non-missing arguments to max; returning -Inf Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_gapped_alignment_statistics.r /tmp/weirathe.WRVHE9/data/lengths.txt.gz /tmp/weirathe.WRVHE9/plots/alignments.pdf null device 1 Warning messages: 1: In min(x) : no non-missing arguments to min; returning Inf 2: In max(x) : no non-missing arguments to max; returning -Inf Finished. 
Making depth reports /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/depth_to_coverage_report.py /tmp/weirathe.WRVHE9/data/depth.sorted.bed.gz /tmp/weirathe.WRVHE9/data/chrlens.txt -o /tmp/weirathe.WRVHE9/data 203852380 87887 Making coverage plots Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_chr_depth.r /tmp/weirathe.WRVHE9/data/line_plot_table.txt.gz /tmp/weirathe.WRVHE9/data/total_distro_table.txt.gz /tmp/weirathe.WRVHE9/data/chr_distro_table.txt.gz /tmp/weirathe.WRVHE9/plots/covgraph.png Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_chr_depth.r /tmp/weirathe.WRVHE9/data/line_plot_table.txt.gz /tmp/weirathe.WRVHE9/data/total_distro_table.txt.gz /tmp/weirathe.WRVHE9/data/chr_distro_table.txt.gz /tmp/weirathe.WRVHE9/plots/covgraph.pdf Making chr depth plots Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_depthmap.r /tmp/weirathe.WRVHE9/temp/depth.coverage-strata.sorted.bed.gz /tmp/weirathe.WRVHE9/data/chrlens.txt /tmp/weirathe.WRVHE9/temp/coverage-strata.key /tmp/weirathe.WRVHE9/plots/perchrdepth.png Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_depthmap.r /tmp/weirathe.WRVHE9/temp/depth.coverage-strata.sorted.bed.gz /tmp/weirathe.WRVHE9/data/chrlens.txt /tmp/weirathe.WRVHE9/temp/coverage-strata.key /tmp/weirathe.WRVHE9/plots/perchrdepth.pdf Get the exon distributions /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/gpd_to_exon_distro.py /tmp/weirathe.WRVHE9/data/best.sorted.gpd.gz -o 
/tmp/weirathe.WRVHE9/data/exon_size_distro.txt.gz --threads 10 Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_exon_distro.r /tmp/weirathe.WRVHE9/data/exon_size_distro.txt.gz /tmp/weirathe.WRVHE9/plots/exon_size_distro.png Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_exon_distro.r /tmp/weirathe.WRVHE9/data/exon_size_distro.txt.gz /tmp/weirathe.WRVHE9/plots/exon_size_distro.pdf Make a UCSC genome browser compatible bed file gpd_to_UCSC_bed12.py --headername aln.bam:best /tmp/weirathe.WRVHE9/data/best.sorted.gpd.gz -o /tmp/weirathe.WRVHE9/data/best.sorted.bed.gz --color red gpd_to_UCSC_bed12.py --headername aln.bam:trans-chimera /tmp/weirathe.WRVHE9/data/chimera.gpd.gz -o /tmp/weirathe.WRVHE9/data/chimera.bed.gz --color blue gpd_to_UCSC_bed12.py --headername aln.bam:gapped /tmp/weirathe.WRVHE9/data/gapped.gpd.gz -o /tmp/weirathe.WRVHE9/data/gapped.bed.gz --color orange gpd_to_UCSC_bed12.py --headername aln.bam:self-chimera /tmp/weirathe.WRVHE9/data/technical_chimeras.gpd.gz -o /tmp/weirathe.WRVHE9/data/technical_chimeras.bed.gz --color green gpd_to_UCSC_bed12.py --headername aln.bam:self-atypical /tmp/weirathe.WRVHE9/data/technical_atypical_chimeras.gpd.gz -o /tmp/weirathe.WRVHE9/data/technical_atypical_chimeras.bed.gz --color purple Making context plot /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/bam_to_context_error_plot.py aln.bam -r ../Mus_musculus.GRCm38.cdna.all.fa --target --output_raw /tmp/weirathe.WRVHE9/data/context_error_data.txt -o /tmp/weirathe.WRVHE9/plots/context_plot.png /tmp/weirathe.WRVHE9/plots/context_plot.pdf --rscript_path Rscript --random --specific_tempdir /tmp/weirathe.WRVHE9/temp --stopping_point 5000 --input_index /tmp/weirathe.WRVHE9/temp/myindex.bgi 
Reading reference fasta Reading index 407 alignments, 2187 min context coverage 476 alignments, 2187 min context coverage

Killed -bash-4.2$

The only output that I see in my directory is Rplots.pdf.

jason-weirather commented 6 years ago

Hi @rojinsafavi Sorry for the delay in my response. My multithreading in AlignQC is, all in all, not fantastic. You get reasonable speedups for a few segments of the pipeline, but the memory requirements skyrocket because I'm not using any shared-memory optimizations. I recommend either running as a single thread or running on a very high-memory computer with more threads, but going up to more and more threads will only make the memory issues worse. When you run a single thread I try to avoid using multiprocessing calls, so the error logs are a little more meaningful too. Sometimes if a bug or input error occurs with multiprocessing it can be hard to get the actual error message. Can you try running on a single thread and see if you still have a problem?
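
The single-thread advice above can be illustrated with a minimal sketch. This is not AlignQC's code: `run_chunks`, `process_chunk` and the chunked read lengths are hypothetical names chosen for illustration. It only shows the general pattern, where one thread means the work runs in-process (so an exception surfaces with an ordinary traceback), while more threads means separate worker processes that each receive a copy of their data.

```python
from multiprocessing import Pool


def process_chunk(chunk):
    """Hypothetical per-chunk work: here, just sum the read lengths."""
    return sum(chunk)


def run_chunks(chunks, threads=1):
    """Process chunks serially when threads <= 1, otherwise in a worker pool."""
    if threads <= 1:
        # No multiprocessing at all: any exception raised in process_chunk
        # propagates directly, with an ordinary traceback.
        return [process_chunk(c) for c in chunks]
    # Each worker is a separate process; data sent to it is copied,
    # so peak memory tends to grow with the number of workers.
    pool = Pool(processes=threads)
    try:
        return pool.map(process_chunk, chunks)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    read_lengths = [[1200, 800], [950], [400, 600, 300]]
    print(run_chunks(read_lengths, threads=1))
```

Keeping the serial path free of any pool also means a failure like `OSError: [Errno 12]` from `os.fork()` cannot come from this part of the code when `threads=1`.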

rojinsafavi commented 6 years ago

Thanks Jason! I will run it on a single thread and will report the result to you if I still get an error

rojinsafavi commented 6 years ago

Hi Jason,

I ran it again with 1 thread, and I got the same error (OSError: [Errno 12] Cannot allocate memory). I should mention that I'm only using 15 fast5 files here (just for testing purposes).

-bash-4.2$ alignqc analyze trial-fast5/aln.bam -g Mus_musculus.GRCm38.cdna.all.fa -t UCSC_Main_on_Mouse__all_mrna.gtf.gz -o report.xhtml --portable_output report.portable.xhtml --output_folder alignqc-output --threads 1

Using Rscript version: R scripting front-end version 3.3.2 (2016-10-31) Creating initial alignment mapping data /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/bam_preprocess.py trial-fast5/aln.bam --minimum_intron_size 68 -o /tmp/weirathe.3ytJfG/temp/alndata.txt.gz --threads 1 --specific_tempdir /tmp/weirathe.3ytJfG/temp/ read basics

check for best set 0/33
combining results 33
Traverse bam for alignment analysis /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/traverse_preprocessed.py /tmp/weirathe.3ytJfG/temp/alndata.txt.gz -o /tmp/weirathe.3ytJfG/data/ --specific_tempdir /tmp/weirathe.3ytJfG/temp/ --threads 1 --min_aligned_bases 50 --max_query_overlap 10 --max_target_overlap 10 --max_target_gap 500000 --required_fractional_improvement 0.2 33 alignments 15 reads
Writing chromosome lengths from header /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/bam_to_chr_lengths.py trial-fast5/aln.bam -o /tmp/weirathe.3ytJfG/data/chrlens.txt Can we find any known read types /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/get_platform_report.py /tmp/weirathe.3ytJfG/data/lengths.txt.gz /tmp/weirathe.3ytJfG/data/special_report Go through genepred best alignments and make a bed depth file Generate the depth bed for the mapped reads gpd_to_bed_depth.py /tmp/weirathe.3ytJfG/data/best.sorted.gpd.gz -o /tmp/weirathe.3ytJfG/data/depth.sorted.bed.gz --threads 1 Stratify the depth to make it plot quicker and cleaner

Get ready for alignment plot Make alignment plots /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/make_alignment_plot.py /tmp/weirathe.3ytJfG/data/lengths.txt.gz --rscript_path Rscript --output_stats /tmp/weirathe.3ytJfG/data/alignment_stats.txt --output /tmp/weirathe.3ytJfG/plots/alignments.png /tmp/weirathe.3ytJfG/plots/alignments.pdf making plot Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_gapped_alignment_statistics.r /tmp/weirathe.3ytJfG/data/lengths.txt.gz /tmp/weirathe.3ytJfG/plots/alignments.png null device 1 Warning messages: 1: In png(infile, bg = "#FFFFFF") : unable to load shared object '/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/R/library/grDevices/libs//cairo.so': libjpeg.so.8: cannot open shared object file: No such file or directory 2: In png(infile, bg = "#FFFFFF") : failed to load cairo DLL 3: In min(x) : no non-missing arguments to min; returning Inf 4: In max(x) : no non-missing arguments to max; returning -Inf 5: In min(x) : no non-missing arguments to min; returning Inf 6: In max(x) : no non-missing arguments to max; returning -Inf 7: In min(x) : no non-missing arguments to min; returning Inf 8: In max(x) : no non-missing arguments to max; returning -Inf Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_gapped_alignment_statistics.r /tmp/weirathe.3ytJfG/data/lengths.txt.gz /tmp/weirathe.3ytJfG/plots/alignments.pdf null device 1 Warning messages: 1: In min(x) : no non-missing arguments to min; returning Inf 2: In max(x) : no non-missing arguments to max; returning -Inf 3: In min(x) : no non-missing arguments to min; returning Inf 4: In max(x) : no non-missing arguments to max; returning -Inf 5: In min(x) : no non-missing arguments 
to min; returning Inf 6: In max(x) : no non-missing arguments to max; returning -Inf Finished. Making depth reports /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/depth_to_coverage_report.py /tmp/weirathe.3ytJfG/data/depth.sorted.bed.gz /tmp/weirathe.3ytJfG/data/chrlens.txt -o /tmp/weirathe.3ytJfG/data 203852380 13150 Making coverage plots Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_chr_depth.r /tmp/weirathe.3ytJfG/data/line_plot_table.txt.gz /tmp/weirathe.3ytJfG/data/total_distro_table.txt.gz /tmp/weirathe.3ytJfG/data/chr_distro_table.txt.gz /tmp/weirathe.3ytJfG/plots/covgraph.png Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_chr_depth.r /tmp/weirathe.3ytJfG/data/line_plot_table.txt.gz /tmp/weirathe.3ytJfG/data/total_distro_table.txt.gz /tmp/weirathe.3ytJfG/data/chr_distro_table.txt.gz /tmp/weirathe.3ytJfG/plots/covgraph.pdf Making chr depth plots Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_depthmap.r /tmp/weirathe.3ytJfG/temp/depth.coverage-strata.sorted.bed.gz /tmp/weirathe.3ytJfG/data/chrlens.txt /tmp/weirathe.3ytJfG/temp/coverage-strata.key /tmp/weirathe.3ytJfG/plots/perchrdepth.png Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_depthmap.r /tmp/weirathe.3ytJfG/temp/depth.coverage-strata.sorted.bed.gz /tmp/weirathe.3ytJfG/data/chrlens.txt /tmp/weirathe.3ytJfG/temp/coverage-strata.key /tmp/weirathe.3ytJfG/plots/perchrdepth.pdf Get the exon distributions 
/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/gpd_to_exon_distro.py /tmp/weirathe.3ytJfG/data/best.sorted.gpd.gz -o /tmp/weirathe.3ytJfG/data/exon_size_distro.txt.gz --threads 1 Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_exon_distro.r /tmp/weirathe.3ytJfG/data/exon_size_distro.txt.gz /tmp/weirathe.3ytJfG/plots/exon_size_distro.png Rscript /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/plot_exon_distro.r /tmp/weirathe.3ytJfG/data/exon_size_distro.txt.gz /tmp/weirathe.3ytJfG/plots/exon_size_distro.pdf Make a UCSC genome browser compatible bed file gpd_to_UCSC_bed12.py --headername trial-fast5/aln.bam:best /tmp/weirathe.3ytJfG/data/best.sorted.gpd.gz -o /tmp/weirathe.3ytJfG/data/best.sorted.bed.gz --color red gpd_to_UCSC_bed12.py --headername trial-fast5/aln.bam:trans-chimera /tmp/weirathe.3ytJfG/data/chimera.gpd.gz -o /tmp/weirathe.3ytJfG/data/chimera.bed.gz --color blue gpd_to_UCSC_bed12.py --headername trial-fast5/aln.bam:gapped /tmp/weirathe.3ytJfG/data/gapped.gpd.gz -o /tmp/weirathe.3ytJfG/data/gapped.bed.gz --color orange gpd_to_UCSC_bed12.py --headername trial-fast5/aln.bam:self-chimera /tmp/weirathe.3ytJfG/data/technical_chimeras.gpd.gz -o /tmp/weirathe.3ytJfG/data/technical_chimeras.bed.gz --color green gpd_to_UCSC_bed12.py --headername trial-fast5/aln.bam:self-atypical /tmp/weirathe.3ytJfG/data/technical_atypical_chimeras.gpd.gz -o /tmp/weirathe.3ytJfG/data/technical_atypical_chimeras.bed.gz --color purple Making context plot /projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/bam_to_context_error_plot.py trial-fast5/aln.bam -r Mus_musculus.GRCm38.cdna.all.fa --target --output_raw 
/tmp/weirathe.3ytJfG/data/context_error_data.txt -o /tmp/weirathe.3ytJfG/plots/context_plot.png /tmp/weirathe.3ytJfG/plots/context_plot.pdf --rscript_path Rscript --random --specific_tempdir /tmp/weirathe.3ytJfG/temp --stopping_point 5000 --input_index /tmp/weirathe.3ytJfG/temp/myindex.bgi Reading reference fasta Reading index 467 alignments, 1972 min context coverage

536 alignments, 2499 min context coverage

546 alignments, 2499 min context coverage

Exception in thread Thread-4:ext coverage
Traceback (most recent call last):
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 326, in _handle_workers
    pool._maintain_pool()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 230, in _maintain_pool
    self._repopulate_pool()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 223, in _repopulate_pool
    w.start()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Exception in thread Thread-1:ext coverage
Traceback (most recent call last):
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 326, in _handle_workers
    pool._maintain_pool()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 230, in _maintain_pool
    self._repopulate_pool()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/pool.py", line 223, in _repopulate_pool
    w.start()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Killedignments, 3072 min context coverage -bash-4.2$ -bash-4.2$ -bash-4.2$

jason-weirather commented 6 years ago

Thanks for the error details. Sampling the context errors can also be a little unfriendly when it comes to memory, and that looks like where you are running into trouble. The easiest way to fix this is to sample less.

--context_error_stopping_point 1000

will reduce the depth of sampling that is done (I think the default is 2500), but the plot it generates should still be pretty representative (unless your error rates are super low).
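
To make the effect of a stopping point concrete, here is a toy sketch of stop-early sampling. It is an illustration under stated assumptions, not AlignQC's implementation: `sample_until_covered`, the context keys and the record layout are all hypothetical. The idea is that sampling ends once the least-covered context has been seen often enough, so a lower stopping point means fewer alignments are read.

```python
import random
from collections import Counter


def sample_until_covered(alignments, contexts, stopping_point, seed=0):
    """Visit alignments in random order until every context in `contexts`
    has been observed at least `stopping_point` times.

    `alignments` is a list of lists of context keys (hypothetical layout).
    Returns (number of alignments used, per-context counts).
    """
    rng = random.Random(seed)
    order = list(range(len(alignments)))
    rng.shuffle(order)
    counts = Counter()
    used = 0
    for idx in order:
        counts.update(alignments[idx])
        used += 1
        # The least-covered context decides when sampling can stop.
        if min(counts[c] for c in contexts) >= stopping_point:
            break
    return used, counts


if __name__ == "__main__":
    contexts = ["A", "C", "G", "T"]
    # Toy data: every alignment observes each context exactly 5 times.
    alns = [contexts * 5 for _ in range(1000)]
    for stop in (1000, 2500):
        used, _ = sample_until_covered(alns, contexts, stop)
        print(stop, used)
```

In this toy data the number of alignments visited scales directly with the stopping point, which is why lowering it reduces the work done before the plot is drawn.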

rojinsafavi commented 6 years ago

Thanks Jason, I think the main reason I was getting that error was that I was using the toplevel reference genome. I was able to overcome that issue by using gencode.vM16.primary_assembly.annotation.gtf.gz and GRCm38.primary_assembly.genome.fa. But now I'm getting the same error as in this issue: https://github.com/jason-weirather/AlignQC/issues/6

Sorting in reference genePred 133000
reading read genepred stream loci
Traceback (most recent call last):
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/bin/alignqc", line 11, in <module>
    load_entry_point('AlignQC==2.0.5', 'console_scripts', 'alignqc')()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/alignqc.py", line 47, in entry_point
    main(args,operable_argv)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/alignqc.py", line 17, in main
    analyze.external_cmd(operable_argv,version=version)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/analyze.py", line 88, in external_cmd
    main(args)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/analyze.py", line 54, in main
    prepare_all_data.external(args)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/prepare_all_data.py", line 844, in external
    main(args)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/prepare_all_data.py", line 69, in main
    make_data_bam_annotation(args)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/prepare_all_data.py", line 725, in make_data_bam_annotation
    annotated_read_bias_analysis.external_cmd(cmd)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/annotated_read_bias_analysis.py", line 341, in external_cmd
    main(args)
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/AlignQC-2.0.5-py2.7.egg/alignqc/annotated_read_bias_analysis.py", line 41, in main
    for l in mls:
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/seq_tools-1.0.10-py2.7.egg/seqtools/stream.py", line 240, in next
    r = self.read_entry()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/seq_tools-1.0.10-py2.7.egg/seqtools/stream.py", line 263, in read_entry
    try: self._buffers[i] = self._streams[i].next()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/seq_tools-1.0.10-py2.7.egg/seqtools/stream.py", line 178, in next
    r = self.read_entry()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/seq_tools-1.0.10-py2.7.egg/seqtools/stream.py", line 195, in read_entry
    e = self._stream.next()
  File "/projects/nanopore-working/rojin/nanoraw-signalAlign-nanopolish/anaconda2/lib/python2.7/site-packages/seq_tools-1.0.10-py2.7.egg/seqtools/stream.py", line 147, in next
    raise ValueError('Expected lines to be ordered but they appear not to be ordered on line '+str(self._ln))
ValueError: Expected lines to be ordered but they appear not to be ordered on line 133849

Since you asked for the GTF file, I'm going to attach it here:

ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M16/gencode.vM16.primary_assembly.annotation.gtf.gz

rojinsafavi commented 6 years ago

Hi Jason, Any updates on this issue? Kind regards, Rojin

jason-weirather commented 6 years ago

Hi @rojinsafavi Sorry for the delay. I'm busy these days, so if I lose track of these I appreciate getting the reminder :) It looks like a problem streaming the data. Is your alignment file sorted by genomic position? If it is ... I have a second, more complicated problem that this may be due to. If it is sorted by position, do you know if the chromosomes in your index are in alphabetical order? I notice that different aligners have different behaviors when it comes to sorting: sometimes they sort chromosomes alphabetically ... sometimes they do other things. And I may be making the alphabetical assumption in the ordering check. Something you can try is my sort tool that's in seq-tools:

seq-tools sort --bam yourbam -o newbam

If this is the cause I may rethink my order check because I don't want to require another sort before running. Thanks for your help in figuring this out.

rojinsafavi commented 6 years ago

Thanks Jason! So I'm using minimap2, which outputs a SAM file, and then I use samtools to convert the SAM to BAM. I attached both the SAM and the BAM here: aln.zip. I looked at the SAM file, and it seems that it is not sorted. I tried sorting the SAM file using samtools and ran it again, but I got the same error. I also used the command that you gave me, but the new BAM file gave the same error again.

jason-weirather commented 6 years ago

Thanks for posting the file. The problem with samtools sort ... well, some would probably consider it a feature, not a problem ... is that the order of chromosomes in the file is defined by the SAM file header; it's not actually sorted by any criterion other than the header order. So my assumption of alphabetical order doesn't work if the SAM header was ordered otherwise. Since samtools is the gold standard for SAM management, I'd be better off changing my sorting behavior to fit its convention. In the meantime I'd suggest you use the sorting function I mentioned above. It uses samtools sort to do the sort, but before that it sorts the header. You can see the difference if you do a samtools sort and a seq-tools sort --bam, and then inspect the outputs with samtools view -H to look at the header. I'll open an issue to change this sorting behavior, but I'm not 100% sure this is causing your problem, so I recommend you re-sort your file with seq-tools sort and give it a try for now. Thanks!
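
The difference between the two ordering conventions can be shown with a small sketch. It uses plain tuples rather than real BAM records, and `sort_like_samtools` and `is_alphabetically_ordered` are hypothetical helpers written for this illustration; whether AlignQC's order check works exactly like this is an assumption.

```python
def sort_like_samtools(records, header_order):
    """Sort (chrom, pos) records by the chromosome order given in the header."""
    rank = {name: i for i, name in enumerate(header_order)}
    return sorted(records, key=lambda r: (rank[r[0]], r[1]))


def is_alphabetically_ordered(records):
    """Order check that assumes chromosomes appear in alphabetical order."""
    return all(a <= b for a, b in zip(records, records[1:]))


if __name__ == "__main__":
    header = ["chr1", "chr2", "chr10"]  # order as listed in the header
    reads = [("chr10", 500), ("chr2", 100), ("chr1", 900), ("chr2", 50)]

    by_header = sort_like_samtools(reads, header)
    print(by_header)
    # False: as strings, "chr10" sorts before "chr2"
    print(is_alphabetically_ordered(by_header))

    # Sorting the header names first makes both conventions agree.
    by_sorted_header = sort_like_samtools(reads, sorted(header))
    print(is_alphabetically_ordered(by_sorted_header))  # True
```

A file that is correctly coordinate-sorted by header order can therefore still fail a check that compares chromosome names as strings.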

jason-weirather commented 6 years ago

Hi @rojinsafavi I did some testing this morning to try to narrow down the problem. What I expected to be an issue turned out not to be. I guess my sorts at the beginning of the run mitigate that problem, so I closed the issue I had opened on that. I used the test files you sent me ... the mouse transcriptome and the aln.bam ... and I was able to run without error both on my macOS computer and from Docker.

With those files in a Test subdirectory the command can be run like so:

docker run -v $(pwd)/Test:/Test -t vacation/alignqc alignqc analyze --no_genome -t /Test/gencode.vM16.primary_assembly.annotation.gtf.gz /Test/aln.bam --specific_tempdir /Test/mytemp3 -o /Test/mytest3.xhtml

Test Data Output

The next step I would suggest is that you also try running from Docker and see if you still get the same error you reported before with the test data you sent me. If not, I will probably need some test data that I can use to replicate the error, and then I can track down what's going on. I'm also open to any suggestions if you have an idea of the cause, but sometimes it can be hard to track down without being able to replicate it. Thanks!

jason-weirather commented 6 years ago

Hi @rojinsafavi Just following up because I was looking through my issues for anything else I can help with, and I saw another conversation I had where a streaming error occurred with genePred (.gpd) format files as transcriptome references. These happened because my parser does not deal with period (.) symbols in the CDS starts and stops. If this could be causing you problems, I recommend you configure your transcriptome reference input to be in GTF format. My treatment of GTF format should not have this same problem. I'm going to update the documentation to recommend GTF format to hopefully help anyone else who may encounter this.
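
As a rough illustration of the genePred issue described above, the sketch below parses a line while tolerating a `.` in the CDS fields. It is not the seq-tools parser: the function names are hypothetical, the column layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, ...) is assumed, and falling back to txEnd for a missing CDS bound is one possible convention rather than a confirmed behavior.

```python
def parse_coord(field, fallback):
    """Return an int coordinate, or `fallback` when the field is a '.' placeholder."""
    return fallback if field == "." else int(field)


def parse_genepred_line(line):
    """Parse the first seven columns of a tab-separated genePred line.

    Assumed layout: name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd.
    """
    f = line.rstrip("\n").split("\t")
    tx_start, tx_end = int(f[3]), int(f[4])
    return {
        "name": f[0],
        "chrom": f[1],
        "strand": f[2],
        "tx_start": tx_start,
        "tx_end": tx_end,
        # Assumption: a '.' CDS bound means "no CDS", so both bounds fall
        # back to txEnd, giving an empty coding region.
        "cds_start": parse_coord(f[5], tx_end),
        "cds_end": parse_coord(f[6], tx_end),
    }


if __name__ == "__main__":
    line = "tx1\tchr1\t+\t100\t900\t.\t.\t2\t100,500,\t300,900,"
    print(parse_genepred_line(line))
```

A strict `int(field)` on the same line would raise a `ValueError`, which is the kind of failure a parser that does not expect `.` would run into.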

rojinsafavi commented 6 years ago

Hi Jason, I installed AlignQC on my Mac by:

  1. cloning the git repository
  2. and then running python setup.py install

and I was able to reproduce your result.

For some reason the same thing won't work on our server, and it gave me the error that I sent you. But I was able to save the plots by providing a temp directory (only for a small subsample).

Our server does not let us use the Docker installation; I have to check with the admin to see if they can install it on our server. But I did a conda installation and again I got the same error (but was able to save the plots in the temp directory for that small subsample).

But I was not able to even save the plots for all my data (I have about 400,000 fast5 files). How many files do you usually use to get a good approximation? I think I might be using too many files!

Kind regards, Rojin

rojinsafavi commented 6 years ago

Hi Jason!

So I was not able to run AlignQC on our Linux servers, but I managed to run it on my Mac (I installed it through pip). It took a while, but it worked with no errors. Thanks a lot for the help, and I hope you have a great holiday! Best, Rojin