wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License

got an unexpected keyword argument 'keep_supp' #185

Closed aspitaleri closed 4 years ago

aspitaleri commented 4 years ago

Hi, NanoPlot crashes when I run the following:

NanoPlot --summary sequencing_summary.txt --loglength -o summary_barc --barcoded --N50

If you read this then NanoPlot 1.30.0 has crashed :-(
Please try updating NanoPlot and see if that helps...
If not, please report this issue at https://github.com/wdecoster/NanoPlot/issues
If you could include the log file that would be really helpful.
Thanks!
Traceback (most recent call last):
  File "/usr/local/bin/NanoPlot", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/nanoplot/NanoPlot.py", line 55, in main
    datadf = get_input(
TypeError: get_input() got an unexpected keyword argument 'keep_supp'

No idea why. Up until last week it was working properly. It also crashes with just NanoPlot --summary sequencing_summary.txt

Any suggestions? Thanks

aspitaleri commented 4 years ago

If I comment out the keep_supp argument in the get_input() call, it works.

wdecoster commented 4 years ago

It seems your nanoget module is a bit older than NanoPlot itself, oddly enough. Updating nanoget should fix this issue. Did you install with conda or pip?
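One way to confirm such a mismatch is to print the installed versions of both packages. The following is only a minimal sketch using the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_version` is hypothetical and not part of NanoPlot or nanoget.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent.

    Hypothetical helper for illustration; not part of NanoPlot or nanoget.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report both packages so that a nanoget older than NanoPlot is easy to spot.
for name in ("NanoPlot", "nanoget"):
    print(name, installed_version(name) or "not installed")
```

If nanoget turns out to be the older of the two, upgrading it explicitly with pip is the fix suggested above.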

aspitaleri commented 4 years ago

I installed with pip3, using --upgrade:

sudo pip3 install NanoPlot --upgrade

I have now upgraded nanoget with pip3 as well, and it works. I thought that pip --upgrade would take care of that.

One quick question about the log transformation (it is not clear to me, sorry). I read #108, and it is still not clear to me how to interpret the two graphs, linear and log-transformed. On the x-axis of the log-transformed plot I was expecting 10^0, 10^1, 10^2 and so on. In a few words, is it correct to get the mean read length from the log-transformed plot? I am a bit confused by the NanoPlot graphs. Best tool! Thanks

wdecoster commented 4 years ago

In few words, is it correct to get the mean read length from the log transformed?

If you need the mean read length then you should just get that number from the NanoStats file, which mentions the mean read length (and median). The calculation is done before log transformation.

The linear and the log-transformed graph show the same data, but due to the long-tailed read length distribution you commonly get from nanopore sequencing it's hard to fit the longest read together with the rest of the reads on a plot. Log transforming the data can help.
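To illustrate why the mean should be taken from the untransformed lengths, here is a small sketch with made-up read lengths (not data from this issue). It is not how NanoPlot computes its statistics; it only shows that averaging the log10 values and transforming back gives the geometric mean, which is smaller than the arithmetic mean for a long-tailed distribution, while the median is unaffected by the transformation.

```python
import math
from statistics import mean, median

# Made-up, long-tailed read lengths for illustration only.
lengths = [200, 250, 300, 350, 400, 5000, 20000]

# Arithmetic mean on the raw lengths (the kind of value a stats summary reports).
arithmetic_mean = mean(lengths)

# Mean of the log10-transformed lengths, transformed back to bases.
# This is the geometric mean, not the mean read length.
log_lengths = [math.log10(n) for n in lengths]
back_transformed_mean = 10 ** mean(log_lengths)

# The median survives the transformation because log10 preserves order.
back_transformed_median = 10 ** median(log_lengths)

print("arithmetic mean:", arithmetic_mean)
print("back-transformed mean of logs:", back_transformed_mean)
print("median:", median(lengths), "vs back-transformed:", back_transformed_median)
```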

aspitaleri commented 4 years ago

Okay, I see, but something in my output does not make sense. These are the statistics:

Active channel | 430
Mean read length | 1287.3
Mean read quality | 8.4
Median read length | 322
Median read quality | 8.5
Number of reads | 131598
Read length N50 | 4988
Total bases | 169409744

but the plots do not reflect those numbers very well:

[image: barcode02_LogTransformed_HistogramReadlength]

My doubt is: in your plots, is the read length that of the whole read, or the length of chunks of the read? For example, if read1 is 1200 bases long, would the histogram be:

bin | count
0-100 | 12
101-200 | 6
201-300 | 4
301-400 | 3
401-500 | 2.2
501-600 | 2
...
1200 | 1

Is this the rationale of your plots? I was expecting this for read1:

bin | count
0-100 | 0
101-200 | 0
201-300 | 0
301-400 | 0
401-500 | 0
501-600 | 0
...
1200 | 1

wdecoster commented 4 years ago

Separate questions would fit better in a separate issue.

A read of length 1200 will end up in the bin of 1200, and not in other bins. For a plot with "number of reads" that read will count as "1" read. For a "weighted" plot with "number of bases" that read will count for 1200 bases within the bin for 1200.
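The difference between the two kinds of plot can be sketched as follows. This is an illustration under assumptions, not NanoPlot's plotting code: the bin width of 100 and the helper names `bin_start` and `histograms` are hypothetical. Each read lands in exactly one bin; it adds 1 to the read count of that bin and its full length to the base count of that bin.

```python
from collections import Counter


def bin_start(length, bin_width=100):
    """Return the lower edge of the bin a read length falls into (assumed binning)."""
    return (length // bin_width) * bin_width


def histograms(lengths, bin_width=100):
    """Build a read-count histogram and a base-weighted histogram in one pass."""
    reads = Counter()  # "number of reads" per bin
    bases = Counter()  # "number of bases" per bin (weighted)
    for n in lengths:
        b = bin_start(n, bin_width)
        reads[b] += 1  # the read counts once, only in its own bin
        bases[b] += n  # the read contributes all of its bases to that bin
    return reads, bases


# Three made-up reads: one of 1200 bases and two short ones.
reads, bases = histograms([1200, 150, 180])
print(dict(reads))
print(dict(bases))
```

The 1200-base read appears only in the bin starting at 1200, with a count of 1 in the unweighted histogram and 1200 in the weighted one; no shorter bin receives anything from it.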