wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License

Negative values in Weighted Histograms #351

Closed: LiaOb21 closed this issue 9 months ago

LiaOb21 commented 9 months ago

Dear @wdecoster,

Sorry for creating another issue about weighted histograms, but my situation seems different from the previous ones.

I got strange results for the "Weighted Histogram of read lengths" and "Weighted Histogram of read lengths after log transformation" plots: both show negative values, and I don't understand what they mean:


Do you have any idea why this might be happening? I should mention that my input file is quite large (69 GB); I'm not sure whether that is large enough to need the --huge flag, which I didn't use.
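For context: in a weighted histogram of read lengths, each read typically contributes its own length as the weight, so each bar shows the number of bases in that length bin rather than the number of reads. Under that assumption every bar is a sum of positive numbers and cannot be negative. The pure-Python sketch below illustrates the idea; it assumes length weighting (also when the bins are drawn on log10 length), and the function weighted_length_histogram and its parameters are hypothetical, not NanoPlot's implementation.

```python
import math
from typing import List, Sequence


def weighted_length_histogram(
    lengths: Sequence[int], n_bins: int = 10, log10: bool = False
) -> List[float]:
    """Hypothetical helper: total bases per bin, each read weighted by its length.

    With log10=True, reads are binned on log10(length) but still
    weighted by their raw length (an assumption, not NanoPlot's code).
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if not lengths:
        return [0.0] * n_bins
    if any(n <= 0 for n in lengths):
        raise ValueError("read lengths must be positive")

    # Values used for binning: raw lengths or their log10.
    xs = [math.log10(n) if log10 else float(n) for n in lengths]
    lo, hi = min(xs), max(xs)
    # Equal-width bins; fall back to width 1.0 if all reads have the same value.
    width = (hi - lo) / n_bins or 1.0

    totals = [0.0] * n_bins
    for x, n in zip(xs, lengths):
        # The maximum value would land one past the end, so clamp it into the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        totals[idx] += n  # weight = read length, always positive
    return totals
```

For example, weighted_length_histogram([1000, 2000, 3000, 10000], n_bins=3) returns [6000.0, 0.0, 10000.0]: the bars add up to the 16,000 bases in the four reads, and none can drop below zero.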

Here is the command I used:

NanoPlot -t 20 --fastq {input} --loglength -o results/nanoplot --plots dot --verbose

And the log:

2024-01-08 02:01:37,859 NanoPlot 1.20.0 started with arguments Namespace(N50=False, alength=False, bam=None, barcoded=False, color='#4CB391', cram=None, downsample=None, dpi=100, drop_outliers=False, fasta=None, fastq=['results/reads/hifi/hifi.fastq.gz'], fastq_minimal=None, fastq_rich=None, font_scale=1, format='png', listcolors=False, loglength=True, maxlength=None, minlength=None, minqual=None, no_N50=False, outdir='results/nanoplot', percentqual=False, pickle=None, plots=['dot'], prefix='', raw=False, readtype='1D', runtime_until=None, store=False, summary=None, threads=20, title=None, verbose=True)
2024-01-08 02:01:37,861 Python version is: 3.6.15 | packaged by conda-forge | (default, Dec  3 2021, 18:49:41)  [GCC 9.4.0]
2024-01-08 02:01:37,886 Nanoplotter: valid output format png
2024-01-08 02:01:37,920 Nanoget: Starting to collect statistics from plain fastq file.
2024-01-08 02:01:37,921 Nanoget: Decompressing gzipped fastq results/reads/hifi/hifi.fastq.gz
2024-01-08 05:23:54,103 Reduced DataFrame memory usage from 200.6367416381836Mb to 133.75782775878906Mb
2024-01-08 05:23:55,014 Nanoget: Gathered all metrics of 8765953 reads
2024-01-08 05:24:00,803 Calculated statistics
2024-01-08 05:24:00,807 Using sequenced read lengths for plotting.
2024-01-08 05:24:01,032 Using log10 scaled read lengths.
2024-01-08 05:24:01,467 Nanoplotter: Valid color #4CB391.
2024-01-08 05:24:01,782 Nanoplotter: Creating length plots for Read length.
2024-01-08 05:24:01,826 Nanoplotter: Using 8765953 reads maximum of 54893bp.
2024-01-08 05:24:58,311 Created length plots
2024-01-08 05:24:58,901 Nanoplotter: Creating Read lengths vs Average read quality plots using statistics from 8765953 reads.
2024-01-08 05:25:47,860 Created LengthvsQual plot
2024-01-08 05:25:47,860 Writing html report.
2024-01-08 05:27:03,607 Finished!

The version I'm using is NanoPlot 1.20.0 installed via bioconda.

Thank you so much in advance!

wdecoster commented 9 months ago

That looks very weird indeed! Is v1.20.0 the version you get when you install from conda? There should be a v1.42.0.

LiaOb21 commented 9 months ago

Thank you so much for the quick reply! I'm running the same command with v1.42.0 (for some reason I had asked conda to install v1.20, my bad). I'll let you know as soon as I get the results.

LiaOb21 commented 9 months ago

Hi @wdecoster,

With the right version, I got normal results. Thank you!

wdecoster commented 9 months ago

Good to hear!
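Since the negative values went away after moving from v1.20.0 to v1.42.0, it can help to confirm which release actually ran before interpreting the plots. NanoPlot records its version in the first log line, as in the log above ("NanoPlot 1.20.0 started with arguments ..."). The sketch below is a hypothetical helper, not part of NanoPlot, that extracts that version from a log line and compares it with a minimum; 1.42.0 is used only because it is the release mentioned in this thread, not because every earlier release is known to be affected.

```python
import re
from typing import Optional, Tuple

# Assumed log format, taken from the log in this issue:
# "<timestamp> NanoPlot <version> started with arguments ..."
_VERSION_RE = re.compile(r"NanoPlot (\d+(?:\.\d+)*) started")


def version_from_log(line: str) -> Optional[Tuple[int, ...]]:
    """Return the NanoPlot version in a log line as an integer tuple, or None."""
    match = _VERSION_RE.search(line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_older_than(version: Tuple[int, ...], minimum: Tuple[int, ...] = (1, 42, 0)) -> bool:
    """True if version is an older release than minimum (example threshold only)."""
    return version < minimum
```

Versions are compared as integer tuples because plain string comparison would rank "1.9.0" above "1.42.0". Pre-release suffixes such as release-candidate tags are not handled; such lines simply return None.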