wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License
413 stars 47 forks

Error: expected columns in summary file sequencing_summary.txt not found #246

Closed: Kamouyiaraki closed this issue 3 years ago

Kamouyiaraki commented 3 years ago

Hi,

I know this has been asked before, but I can't seem to spot where I've gone wrong. I ran:

NanoPlot --summary sequencing_summary.txt -o ./Data_pre-processing/Nanoplot_output --readtype 1D --verbose

and I get the error: "ERROR: expected columns in summary file sequencing_summary.txt not found: channel, start_time, duration, sequence_length_template, mean_qscore_template"

Log messages:

2021-03-26 12:58:21,529 NanoPlot 1.20.0 started with arguments Namespace(N50=False, alength=False, bam=None, barcoded=False, color='#4CB391', cram=None, downsample=None, dpi=100, drop_outliers=False, fasta=None, fastq=None, fastq_minimal=None, fastq_rich=None, font_scale=1, format='png', listcolors=False, loglength=False, maxlength=None, minlength=None, minqual=None, no_N50=False, outdir='./Data_pre-processing/Nanoplot_output', percentqual=False, pickle=None, plots=['kde', 'dot'], prefix='', raw=False, readtype='1D', runtime_until=None, store=False, summary=['sequencing_summary.txt'], threads=4, title=None, verbose=True)
2021-03-26 12:58:21,529 Python version is: 3.6.13 |Anaconda, Inc.| (default, Feb 23 2021, 21:15:04)  [GCC 7.3.0]
2021-03-26 12:58:21,530 Nanoplotter: valid output format png
2021-03-26 12:58:21,537 Nanoget: Collecting metrics from summary file sequencing_summary.txt for 1D sequencing
2021-03-26 12:58:21,565 Nanoget: did not find expected columns in summary file sequencing_summary.txt:
 channel, start_time, duration, sequence_length_template, mean_qscore_template
ERROR: expected columns in summary file sequencing_summary.txt not found:
 channel, start_time, duration, sequence_length_template, mean_qscore_template

My sequencing_summary.txt:

filename    read_id run_id  batch_id    channel mux start_time  duration    num_events  passes_filtering    template_start  num_events_template template_duration   sequence_length_template    mean_qscore_template    strand_score_template   median_template mad_template    scaling_median_template scaling_mad_template
AGE697_fail_4c0ab159_0.fast5    9ed5a285-a746-44dd-a4d1-4cafb696c7bc    4c0ab1594bd54e5b911727c380df43ab00c0d629    0   45  1   503.870000  1.687250    1349    FALSE   503.870000  1349    1.687250    685 5.610331    0.000000    57.465057   9.027743    57.338490   7.942882
AGE697_fail_4c0ab159_0.fast5    698437fd-341d-41d4-8fab-dcac76c7bc6b    4c0ab1594bd54e5b911727c380df43ab00c0d629    0   36  1   1514.505500 1.325500    1060    FALSE   1514.554250 1021    1.276750    670 6.028704    0.000000    75.867767   6.423587    74.452835   9.338197
AGE697_fail_4c0ab159_0.fast5    3e7b2095-53a0-4840-8023-f7b41fe204e3    4c0ab1594bd54e5b911727c380df43ab00c0d629    0   1   1   784.680500  1.729750    1383    FALSE   785.005500  1123    1.404750    720 6.565081    0.000000    77.430260   8.159691    75.673965   9.417539
AGE697_fail_4c0ab159_0.fast5    33459b3a-21d4-40b3-8ecc-1c179ac2f9c7    4c0ab1594bd54e5b911727c380df43ab00c0d629    0   45  1   284.761500  1.897000    1517    FALSE   284.794000  1491    1.864500    702 5.606120    0.000000    52.777576   8.680523    60.021866   8.170839
AGE697_fail_4c0ab159_0.fast5    4d7493f8-7539-48ed-a527-3f6dd6f79049    4c0ab1594bd54e5b911727c380df43ab00c0d629    0   38  1   1056.834750 1.502750    1202    FALSE   1057.123500 971 1.214000    700 5.009534    0.000000    78.992752   7.118029    79.510101   9.870745
AGE697_fail_4c0ab159_0.fast5    fe08a001-15c3-4b29-84fa-2340855037e7    4c0ab1594bd54e5b911727c380df43ab00c0d629    0   103 1   309.757750  0.866250    693 FALSE   309.882750  593 0.741250    442 3.605329    0.000000    89.582993   8.680523    86.988876   11.047446
AGE697_fail_4c0ab159_0.fast5    9d4b0010-7028-4845-8026-e62ed6a282c0    4c0ab1594bd54e5b911727c380df43ab00c0d629    0   84  1   1851.140750 1.488000    1190    FALSE   1851.178250 1160    1.450500    795 6.273836    0.000000    76.214989   7.291639    74.520782   9.073196
AGE697_fail_4c0ab159_0.fast5    f163e69f-6a9a-445f-93da-36b00bf46d39    4c0ab1594bd54e5b911727c380df43ab00c0d629    0   26  1   1182.037500 2.084750    1667    FALSE   1182.080000 1633    2.042250    992 6.656621    0.000000    82.117744   11.111069   78.280769   10.462833
AGE697_fail_4c0ab159_0.fast5    94b6af94-0131-4987-a939-dbba78893846    4c0ab1594bd54e5b911727c380df43ab00c0d629    0   82  1   1808.792750 1.199500    959 FALSE   1808.884000 886 1.108250    695 6.384182    0.000000    76.562210   7.465249    74.260811   9.685536

Edit: I have a feeling this has to do with the fact that I have basecalled again using guppy and not Albacore. Not sure what the difference in summary files would be and would rather not have to use Albacore in order to use NanoPlot.

Any insights will be greatly appreciated!
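
A quick sanity check for this kind of error is to confirm that the header line really names the columns listed in the message. The sketch below is only an illustration: `missing_columns` is a hypothetical helper, not part of NanoPlot or nanoget, and it assumes a tab-separated summary file. For the header pasted above it would report nothing missing, which suggests the problem lies elsewhere in the file.

```python
import io

# Column names copied from the error message above.
EXPECTED = ["channel", "start_time", "duration",
            "sequence_length_template", "mean_qscore_template"]

def missing_columns(handle, expected=EXPECTED):
    """Return the expected column names absent from the tab-separated header line."""
    header = handle.readline().rstrip("\r\n").split("\t")
    return [name for name in expected if name not in header]

# Usage with an in-memory stand-in for a real summary file
# (for a file on disk, pass open("sequencing_summary.txt") instead):
demo = io.StringIO("filename\tread_id\tchannel\tstart_time\tduration\n")
print(missing_columns(demo))
# → ['sequence_length_template', 'mean_qscore_template']
```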

wdecoster commented 3 years ago

I'll look at it more in depth later, but guppy should be totally fine. I haven't touched albacore since forever :-)

Kamouyiaraki commented 3 years ago

Thanks, I appreciate it! For extra info, this was a Flongle run (although I don't know why that would matter), and I updated to NanoPlot 1.35.1 in case it was a version issue, but I get the same error message.

wdecoster commented 3 years ago

Do you think it would be possible to share that summary file, or e.g. the first 1000 lines (assuming that subset does replicate the error)?
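
One way to produce such a subset is sketched below. `write_head` is a hypothetical helper name, and the file is copied in binary mode on the assumption that it may contain bytes that do not decode cleanly.

```python
from itertools import islice

def write_head(src_path, dst_path, n_lines=1000):
    """Copy the first n_lines lines (header included) of src_path to dst_path."""
    # Binary mode: lines are copied byte-for-byte, so undecodable
    # content cannot raise an error here.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(islice(src, n_lines))

# Example call (file names are placeholders):
# write_head("sequencing_summary.txt", "summary_first1000.txt")
```

As this thread goes on to show, a subset taken from the top of the file may not reproduce an error caused by damage further down.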

Kamouyiaraki commented 3 years ago

Annoyingly, it did work with the first 1000! But I have to thank you for that, because I then tried to subsample a random 1000 lines and found that the last 2k lines were all unrecognisable symbols (I'm guessing the file must have been corrupted at some point during the transfer?). I transferred over a new copy from my backup and it's working perfectly. Sorry for the hassle - TIL the usefulness of using tail.
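
For anyone hitting the same error, the check done here by eye with `tail` can be roughly automated. The following is a minimal sketch, not anything NanoPlot does itself: `find_bad_lines` is a hypothetical helper that assumes a tab-separated file and flags lines that are not valid UTF-8, contain NUL bytes, or have a different number of fields than the header.

```python
def find_bad_lines(path, sep=b"\t"):
    """Return 1-based numbers of data lines that look corrupted."""
    bad = []
    with open(path, "rb") as handle:
        header = handle.readline()
        n_fields = header.count(sep) + 1
        for lineno, raw in enumerate(handle, start=2):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(lineno)  # bytes that are not valid UTF-8
                continue
            if b"\x00" in raw or raw.count(sep) + 1 != n_fields:
                bad.append(lineno)  # NUL bytes or wrong field count
    return bad

# Example call (file name is a placeholder):
# print(find_bad_lines("sequencing_summary.txt")[:10])
```

An empty result does not prove the file is intact, since corruption that happens to keep the field count and encoding valid would pass, so comparing a checksum against the backup copy is a stronger test.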

wdecoster commented 3 years ago

Hmmm, good to hear the issue is solved! Annoyingly, corrupted files do happen from time to time...