marcus1487 / nanoraw

Genome-guided re-segmentation and visualization for raw nanopore sequencing data.
https://pypi.python.org/pypi/nanoraw

plot_max_coverage question #33

Closed MikeHala closed 7 years ago

MikeHala commented 7 years ago

Dear Marcus,

Thank you for providing nanoraw!

I have encountered a problem when plotting with plot_max_coverage - the produced plots for the default 10 regions (same happens for a single region) report and depict coverage only on one of the strands (e.g. gi|556503834|ref|NC_000913.3|:− ::: Coverage: 0 + 29 −).

On the other hand, I extracted the fastq reads from the fast5 files with poretools and aligned them to the same reference with bwa-mem. IGV shows a roughly 50/50 mix of forward and reverse reads at these loci (for the example above, 20 reads on the forward strand and 23 reads on the reverse strand).
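The IGV comparison above can also be checked without a viewer. The following is a minimal sketch, not part of nanoraw: it tallies forward and reverse alignments overlapping one position in SAM text such as bwa-mem produces. The function names and the choice to skip secondary and supplementary records are assumptions made for illustration; the only facts relied on are the SAM conventions that flag bit 0x10 marks a reverse-strand alignment and that the M, D, N, = and X CIGAR operations consume reference bases.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(pos, cigar):
    """Return the 1-based inclusive reference interval covered by an alignment."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1


def strand_counts(sam_lines, chrom, locus):
    """Count primary alignments on each strand that overlap `locus` (1-based)."""
    counts = {"+": 0, "-": 0}
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800)
        # records so that each read is counted at most once.
        if flag & (0x4 | 0x100 | 0x800) or rname != chrom or cigar == "*":
            continue
        start, end = reference_span(pos, cigar)
        if start <= locus <= end:
            counts["-" if flag & 0x10 else "+"] += 1
    return counts
```

Calling `strand_counts(open("aln.sam"), "gi|556503834|ref|NC_000913.3|", 3281400)` on an alignment file (the file name here is hypothetical) would return a dictionary of per-strand counts to compare against the coverage reported in the plot title.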

I am using nanoraw v.0.4.1. Here is the command line I used for re-squiggling:

nanoraw genome_resquiggle pass/ reference/MG1655.fasta --bwa-mem-executable bwa/0.7.12-r1039/bwa --2d --overwrite --timeout 60 --cpts-limit 100

Failed reads summary:
Reached maximum number of changepoints for a single indel : 69
Alignment not produced. : 175
Read sequence from SAM and cooresponding cigar string do not agree. : 2
list index out of range : 1

And the plotting command line:

nanoraw plot_max_coverage --fast5-basedirs pass/ --2d

Your help/suggestions will be greatly appreciated.

Kind regards, Mike

marcus1487 commented 7 years ago

Hi Mike,

Indeed, I just became aware of this issue last week. I have fixed it in the plot_genome_location function, but I need to go through each genome plotting function and figure out how to handle each case. I will push a new release when I get all of the plotting functions updated, hopefully in about a week. For now, if you install from GitHub you can at least use the plot_genome_location function as a temporary workaround. Thanks for logging the issue!

MikeHala commented 7 years ago

Hi Marcus,

Thank you for your prompt reply, please let me know when the plotting functions are updated.

I have tried to use plot_genome_location with:

nanoraw plot_genome_location --fast5-basedirs pass/ --genome-locations "gi|556503834|ref|NC_000913.3|:3281400" --2d

and I got the following error:

Parsing genome locations.
Parsing files.
Preparing plot data.
Traceback (most recent call last):
  File "/exports/igmm/software/pkg/el7/apps/python/2.7.10/bin/nanoraw", line 9, in <module>
    load_entry_point('nanoraw==0.4.1', 'console_scripts', 'nanoraw')()
  File "build/bdist.linux-x86_64/egg/nanoraw/main.py", line 103, in main
  File "build/bdist.linux-x86_64/egg/nanoraw/plot_commands.py", line 1791, in genome_loc_main
  File "build/bdist.linux-x86_64/egg/nanoraw/plot_commands.py", line 1188, in plot_genome_locations
  File "build/bdist.linux-x86_64/egg/nanoraw/plot_commands.py", line 870, in plot_single_sample
TypeError: cannot concatenate 'str' and 'NoneType' objects
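For readers unfamiliar with this class of error, here is a hedged sketch of how it can arise and be guarded against. It is not the code at line 870 of plot_commands.py, which the traceback does not show, and it is not the fix in the commit mentioned later in the thread. The function names and the idea that an optional strand value is the `None` operand are assumptions made purely for illustration. The sketch is written for Python 3, where the wording of the `TypeError` message differs from the Python 2.7 message above.

```python
def unguarded_title(chrom, strand):
    """Hypothetical title builder that fails when `strand` is None."""
    # str + None raises TypeError; this is the general failure mode
    # reported in the traceback above.
    return chrom + ":" + strand


def region_title(chrom, position, strand=None):
    """Hypothetical guarded version: omit the strand suffix when it is None."""
    strand_label = "" if strand is None else ":" + strand
    return chrom + ":" + str(position) + strand_label
```

With the guard in place, a region without a selected strand, such as one where both strands are plotted, yields a title without the suffix instead of raising.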

I am using python/2.7.10, hdf5/1.8.16, and R/3.3.0

marcus1487 commented 7 years ago

Hi Mike,

I had fixed this bug in the two-sample plotting code, but not in the single-sample plotting function. I just made the change and pushed it (03b5a92). I will keep this issue open until I fix this for all genome-based plotting functions, but you should be able to run the genome locations sub-command for now.

marcus1487 commented 7 years ago

Closed this issue. The genome location and max coverage plots now show both the forward and reverse strands along with correct coverage counts. The motif, max difference, and statistic-centered plotting functions plot only the strand with the selected criterion and report coverage only for that strand.

Motif plotting currently searches only the plus strand. I need to fix that along with the motif_with_stats function, but this issue is fixed. I am opening a new issue for the motif strand problem.
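As an illustration of what searching both strands involves, here is a minimal sketch. It is not nanoraw's implementation. It assumes the motif is a literal string over A, C, G and T, with no ambiguity codes or regular-expression syntax, and the function names are hypothetical. A minus-strand occurrence of a motif appears on the plus-strand sequence as the motif's reverse complement, so the sketch searches for both.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_both_strands(genome, motif):
    """Return sorted (start, strand) hits; start is 0-based on the plus strand."""
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # A zero-width lookahead lets overlapping occurrences be reported too.
        lookahead = "(?=%s)" % re.escape(pattern)
        hits.extend((m.start(), strand) for m in re.finditer(lookahead, genome))
    return sorted(hits)
```

Note that a motif equal to its own reverse complement, such as GATC, is reported once per strand at each site, which matches the fact that it occurs on both strands there.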