wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License
419 stars, 47 forks

Report not generated: error tokenizing data #373

Closed: pclavell closed this issue 3 months ago

pclavell commented 3 months ago

Hello, I am running NanoPlot on ONT data aligned with minimap2. I get HTML files for the plots, but the report fails to be generated. The command and the log are below.

NanoPlot \
    -t 112 \
    -o genomic_nanoplot \
    -p ${SAMPLENAME}_nanoplot \
    --bam $INPUT

2024-07-02 12:18:15,967 NanoPlot 1.43.0 started with arguments Namespace(threads=112, verbose=False, store=False, raw=False, huge=False, outdir='genomic_nanoplot', no_static=False, prefix='sample', tsv_stats=False, only_report=False, info_in_report=False, maxlength=None, minlength=None, drop_outliers=False, downsample=None, loglength=False, percentqual=False, alength=False, minqual=None, runtime_until=None, readtype='1D', barcoded=False, no_supplementary=False, color='#4CB391', colormap='Greens', format=['png'], plots=['kde', 'dot'], legacy=None, listcolors=False, listcolormaps=False, no_N50=False, N50=False, title=None, font_scale=1, dpi=100, hide_stats=False, fastq=None, fasta=None, fastq_rich=None, fastq_minimal=None, summary=None, bam=['/sample.bam'], ubam=None, cram=None, pickle=None, feather=None, path='sample')
2024-07-02 12:18:15,967 Python version is: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:23:07) [GCC 12.3.0]
2024-07-02 12:18:15,985 Nanoget: Starting to collect statistics from bam file sample.bam.
2024-07-02 12:18:16,092 Nanoget: Bam file sample.bam contains 20481891 mapped and 0 unmapped reads.
2024-07-02 12:18:16,092 Nanoget: lots of contigs (>200) or --huge, not running in separate processes
2024-07-02 12:31:37,224 Nanoget: bam sample.bam contains 20481891 primary alignments.
2024-07-02 12:31:44,292 Reduced DataFrame memory usage from 2657.968638420105Mb to 2657.968638420105Mb
2024-07-02 12:31:52,691 Nanoget: Gathered all metrics of 20481891 reads
2024-07-02 12:32:19,098 Calculated statistics
2024-07-02 12:32:19,101 Using sequenced read lengths for plotting.
2024-07-02 12:32:20,667 NanoPlot: Valid color #4CB391.
2024-07-02 12:32:20,667 NanoPlot: Valid colormap Greens.
2024-07-02 12:32:21,748 NanoPlot: Creating length plots for Read length.
2024-07-02 12:32:21,757 NanoPlot: Using 20481891 reads maximum of 8346bp.
2024-07-02 12:32:56,238 No static plots are saved due to some kaleido problem:
2024-07-02 12:32:56,238 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:33:25,709 No static plots are saved due to some kaleido problem:
2024-07-02 12:33:25,709 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:33:51,488 No static plots are saved due to some kaleido problem:
2024-07-02 12:33:51,488 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:34:19,240 No static plots are saved due to some kaleido problem:
2024-07-02 12:34:19,240 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:34:45,907 No static plots are saved due to some kaleido problem:
2024-07-02 12:34:45,907 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:34:45,908 Created length plots
2024-07-02 12:34:47,220 NanoPlot: Creating Read lengths vs Average read quality plots using 20481891 reads.
2024-07-02 12:35:13,101 No static plots are saved due to some kaleido problem:
2024-07-02 12:35:13,104 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:35:37,430 No static plots are saved due to some kaleido problem:
2024-07-02 12:35:37,430 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:35:37,434 Created LengthvsQual plot
2024-07-02 12:35:38,737 NanoPlot: Creating Aligned read lengths vs Sequenced read length plots using 20481891 reads.
2024-07-02 12:36:03,672 No static plots are saved due to some kaleido problem:
2024-07-02 12:36:03,673 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:36:27,415 No static plots are saved due to some kaleido problem:
2024-07-02 12:36:27,415 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:36:27,418 Created AlignedLength vs Length plot.
2024-07-02 12:36:27,418 NanoPlot: Creating Read mapping quality vs Average basecall quality plots using 20481891 reads.
2024-07-02 12:36:52,200 No static plots are saved due to some kaleido problem:
2024-07-02 12:36:52,200 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:37:15,871 No static plots are saved due to some kaleido problem:
2024-07-02 12:37:15,871 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:37:15,871 Created MapQvsBaseQ plot.
2024-07-02 12:37:17,180 NanoPlot: Creating Read length vs Read mapping quality plots using 20481891 reads.
2024-07-02 12:37:41,901 No static plots are saved due to some kaleido problem:
2024-07-02 12:37:41,901 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:38:04,791 No static plots are saved due to some kaleido problem:
2024-07-02 12:38:04,791 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:38:04,793 Created Mapping quality vs read length plot.
2024-07-02 12:38:05,083 NanoPlot: Creating Percent identity vs Average Base Quality plots using 20481891 reads.
2024-07-02 12:38:29,173 No static plots are saved due to some kaleido problem:
2024-07-02 12:38:29,174 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:38:51,008 No static plots are saved due to some kaleido problem:
2024-07-02 12:38:51,008 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:38:51,008 Created Percent ID vs Base quality plot.
2024-07-02 12:38:52,316 NanoPlot: Creating Aligned read length vs Percent identity plots using 20481891 reads.
2024-07-02 12:39:11,933 No static plots are saved due to some kaleido problem:
2024-07-02 12:39:11,934 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:39:28,099 No static plots are saved due to some kaleido problem:
2024-07-02 12:39:28,099 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:39:45,990 No static plots are saved due to some kaleido problem:
2024-07-02 12:39:45,991 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:39:45,991 Created Percent ID vs Length plot
2024-07-02 12:39:45,991 Writing html report.
2024-07-02 12:39:45,993 Error tokenizing data. C error: Expected 2 fields in line 21, saw 3
Traceback (most recent call last):
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/nanoplot/NanoPlot.py", line 111, in main
    make_report(plots, settings)
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/nanoplot/NanoPlot.py", line 388, in make_report
    report.html_stats(settings),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/nanoplot/report.py", line 45, in html_stats
    stats_html.append(stats2html(statsfile[0]))
                      ^^^^^^^^^^^^^^^^^^^^^^^^
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/nanoplot/report.py", line 50, in stats2html
    df = pd.read_csv(statsf, sep=':', header=None, names=['feature', 'value'])
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 626, in _read
    return parser.read(nrows)
           ^^^^^^^^^^^^^^^^^^
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1923, in read
    ) = self._engine.read( # type: ignore[attr-defined]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 905, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 874, in pandas._libs.parsers.TextReader._tokenize_rows
  File "parsers.pyx", line 891, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "parsers.pyx", line 2061, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 21, saw 3

wdecoster commented 3 months ago

This seems to be the same issue as in https://github.com/wdecoster/nanocomp/issues/76. Was your data run through pychopper, or are those duplex reads?

Could you check if --tsv_stats solves this?
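
For context: the traceback shows the stats file being read with `pd.read_csv(statsf, sep=':', header=None, names=['feature', 'value'])`, which expects two colon-separated fields per line. A line with a second colon would produce a third field, matching the "Expected 2 fields in line 21, saw 3" message. Whether a colon inside a read identifier is what happened here is an assumption; the thread does not confirm it. Below is a minimal standard-library sketch of the failure mode and of a more tolerant split on the first colon only. The function name `parse_stats_lines` and the sample lines are hypothetical and are not taken from NanoPlot or from the reporter's data.

```python
def parse_stats_lines(lines):
    """Parse 'feature: value' lines, splitting on the first colon only."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # partition() splits at the first ':' and leaves later colons in the value
        feature, _, value = line.partition(":")
        rows.append((feature.strip(), value.strip()))
    return rows


# Made-up sample lines for illustration; the last one has a colon inside the value.
sample = [
    "metric_a: 10",
    "metric_b: 20",
    "1: 8346 (12.3; parent_id:child_id)",
]

# A strict split on every colon gives the last line one field more than the others ...
strict_field_counts = [len(line.split(":")) for line in sample]
# ... while splitting on the first colon only always gives a (feature, value) pair.
rows = parse_stats_lines(sample)

print(strict_field_counts)
for feature, value in rows:
    print(f"{feature!r} -> {value!r}")
```

This only illustrates why a colon-delimited stats file can be fragile; it is not how NanoPlot handles the problem. The workaround suggested in this thread is the `--tsv_stats` option.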

pclavell commented 3 months ago

These reads have been basecalled in duplex and have been processed with porechop (not pychopper in this case). I'll try with --tsv_stats and let you know. Thanks for the quick answer!

pclavell commented 3 months ago

It worked after enabling --tsv_stats.

wdecoster commented 3 months ago

Thanks for the feedback!