wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License

guppy5 and barcode crashing #263

Closed aspitaleri closed 3 years ago

aspitaleri commented 3 years ago

Hi @wdecoster, NanoPlot crashes with the error below when I use --barcoded:

NanoPlot --summary barcoded_sequencing_summary.txt.gz --loglength -o Nanoplot1 --N50 -t 4 --barcoded

/idle/ric.cirillo/envs/spitaleri.andrea/python3-venv/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3334: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/idle/ric.cirillo/envs/spitaleri.andrea/python3-venv/lib/python3.8/site-packages/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

If you read this then NanoPlot 1.34.0 has crashed :-( Please try updating NanoPlot and see if that helps...

If not, please report this issue at https://github.com/wdecoster/NanoPlot/issues If you could include the log file that would be really helpful. Thanks!

Traceback (most recent call last):
  File "python3-venv/bin/NanoPlot", line 11, in <module>
    load_entry_point('NanoPlot==1.34.0', 'console_scripts', 'NanoPlot')()
  File "python3-venv/lib/python3.8/site-packages/nanoplot/NanoPlot.py", line 78, in main
    settings["statsfile"] = [make_stats(datadf, settings, suffix="", tsv_stats=args.tsv_stats)]
  File "python3-venv/lib/python3.8/site-packages/nanoplot/NanoPlot.py", line 128, in make_stats
    stats_df = nanomath.write_stats(
  File "python3-venv/lib/python3.8/site-packages/nanomath/nanomath.py", line 177, in write_stats
    stats = [Stats(df) for df in datadfs]
  File "python3-venv/lib/python3.8/site-packages/nanomath/nanomath.py", line 177, in <listcomp>
    stats = [Stats(df) for df in datadfs]
  File "python3-venv/lib/python3.8/site-packages/nanomath/nanomath.py", line 39, in __init__
    self.n50 = get_N50(np.sort(df["lengths"]))
  File "python3-venv/lib/python3.8/site-packages/nanomath/nanomath.py", line 118, in get_N50
    return readlengths[np.where(np.cumsum(readlengths) >= 0.5 * np.sum(readlengths))[0][0]]
IndexError: index 0 is out of bounds for axis 0 with size 0
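The last frame of the traceback takes the first position at which the cumulative read length reaches half of the total. On an empty array there is no such position, which is what raises the IndexError. Below is a minimal pure-Python sketch of the same calculation with an explicit guard for empty input. The function name `n50_or_none` is hypothetical and is not part of nanomath; which group of reads ended up empty in this run is not shown by the traceback.

```python
from itertools import accumulate


def n50_or_none(lengths):
    """Return the read length at which the cumulative sum (ascending order)
    first reaches half of the total, or None when there are no reads.

    Hypothetical helper for illustration; it mirrors the expression in the
    traceback but is not the nanomath implementation.
    """
    ordered = sorted(lengths)
    if not ordered:
        # An empty group has no N50; the unguarded expression fails here.
        return None
    half_total = 0.5 * sum(ordered)
    for length, running_total in zip(ordered, accumulate(ordered)):
        if running_total >= half_total:
            return length
    return None  # not reached for non-negative lengths


print(n50_or_none([]))            # empty group, no crash
print(n50_or_none([1, 2, 3, 4]))  # cumulative sums are 1, 3, 6, 10; half of total is 5
```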

Version of Guppy: ONT Guppy basecalling software version 5.0.7+2332e8d
Version of NanoPlot: 1.34.0
The barcoded_sequencing_summary.txt.gz was produced with: python add_barcodes_to_summary.py sequencing_summary.txt barcoding_summary.txt

Without --barcoded, NanoPlot works fine, and so does NanoComp.

Is this perhaps an issue with Guppy 5? A previous analysis of the same dataset with Guppy 4.4 worked fine.

Thanks

wdecoster commented 3 years ago

Hi,

It should not be related to Guppy 5 - I expect that should just work. Would it be possible to share the summary file? I suspect there might be one or more corrupted lines.

Cheers, Wouter

aspitaleri commented 3 years ago

Sure! https://www.dropbox.com/s/1hmnzh3xa5gen5c/barcoded_sequencing_summary.txt.gz?dl=0
My colleague has the identical problem using the same procedure. I will try running basecalling and barcoding in a single Guppy step, without the Python script.

wdecoster commented 3 years ago

I noticed that some reads in your summary do not have a barcode assigned. So it seems that the files passed to add_barcodes_to_summary.py did not fully overlap. Was any filtering done? Is a subset of the data missing?
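A quick way to check a summary for reads without a barcode is to tally the values in the barcode column. This is a minimal sketch using only the standard library. It assumes the summary is tab-separated and that the barcode column is named `barcode_arrangement`; both should be checked against the actual file header. The function name `count_barcodes` and the `<missing>` label are made up for illustration.

```python
import csv
from collections import Counter


def count_barcodes(lines, column="barcode_arrangement"):
    """Tally barcode assignments in a tab-separated summary.

    `lines` is any iterable of text lines (an open file handle works).
    Rows with an empty or absent value in `column` are counted under
    the label "<missing>". The column name is an assumption.
    """
    counts = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        value = (row.get(column) or "").strip()
        counts[value if value else "<missing>"] += 1
    return counts


# Usage on a gzipped summary (file name taken from the report above):
#   import gzip
#   with gzip.open("barcoded_sequencing_summary.txt.gz", "rt") as handle:
#       print(count_barcodes(handle).most_common())
```

A nonzero `<missing>` count would point to reads that were in the sequencing summary but not in the barcoding summary.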

aspitaleri commented 3 years ago

I understand now. Guppy 5 writes two directories, pass and fail, splitting the reads by a quality check; that is the reason. Guppy 4.4 did not do that. The Guppy barcoding was run only on the pass fastq files. So I need to remove from the summary the reads that are not present in the pass directory, right? Or it is probably faster to do basecalling and barcoding in one step; hopefully the sequencing_summary will then contain only the pass reads.

https://community.nanoporetech.com/protocols/Guppy-protocol/v/gpb_2003_v1_revx_14dec2018/input-and-output-files
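The first option mentioned here, dropping summary rows for reads that were never barcoded, could look roughly like the sketch below. It assumes both files are tab-separated and share a `read_id` column; the function name `keep_barcoded_reads` is hypothetical, and this is not what add_barcodes_to_summary.py does.

```python
import csv


def keep_barcoded_reads(summary_lines, barcoding_lines, out, key="read_id"):
    """Copy to `out` only the summary rows whose read also appears in the
    barcoding summary. Returns the number of rows kept.

    All three arguments are text file handles (or any line iterables /
    writable text stream). The `read_id` key column is an assumption.
    """
    barcoded_ids = {
        row[key] for row in csv.DictReader(barcoding_lines, delimiter="\t")
    }
    reader = csv.DictReader(summary_lines, delimiter="\t")
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    kept = 0
    for row in reader:
        if row[key] in barcoded_ids:
            writer.writerow(row)
            kept += 1
    return kept


# Usage (paths are placeholders):
#   with open("sequencing_summary.txt") as s, \
#        open("barcoding_summary.txt") as b, \
#        open("filtered_sequencing_summary.txt", "w") as o:
#       print(keep_barcoded_reads(s, b, o), "reads kept")
```

The filtered file would then be the input to add_barcodes_to_summary.py, so that every remaining read has a barcode assigned.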

wdecoster commented 3 years ago

Sounds reasonable!

aspitaleri commented 3 years ago

Ok thanks - I will update you if it works

aspitaleri commented 3 years ago

Hi, just to confirm: when basecalling and barcoding are done in one step with Guppy 5, the sequencing_summary.txt is properly formatted and usable in NanoPlot. Best