wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License

NanoPlot Crash #110

Closed tmmurtha closed 5 years ago

tmmurtha commented 5 years ago

Hi! I keep getting this error message. Please advise what the issue might be. Thanks!

If you read this then NanoPlot has crashed :-( Please report this issue at https://github.com/wdecoster/NanoPlot/issues If you include the log file that would be really helpful. Thanks!

Traceback (most recent call last):
  File "/home/grid/miniconda3/bin/NanoPlot", line 11, in <module>
    sys.exit(main())
  File "/home/grid/miniconda3/lib/python3.5/site-packages/nanoplot/NanoPlot.py", line 63, in main
    barcoded=args.barcoded)
  File "/home/grid/miniconda3/lib/python3.5/site-packages/nanoget/nanoget.py", line 78, in get_input
    datadf.drop("readIDs", inplace=True)
  File "/home/grid/miniconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 3697, in drop
    errors=errors)
  File "/home/grid/miniconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 3111, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home/grid/miniconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 3143, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home/grid/miniconda3/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 4404, in drop
    '{} not found in axis'.format(labels[mask]))
KeyError: "['readIDs'] not found in axis"

wdecoster commented 5 years ago

Thanks for reporting this. The error message asks you to include the log file, but since you didn't do that let me ask some questions:

Cheers, Wouter

tmmurtha commented 5 years ago

Wouter, so sorry about not including the log file. Here it is:

2018-11-19 07:04:26,139 NanoPlot 1.18.2 started with arguments Namespace(N50=True, alength=False, bam=None, barcoded=False, color='#4CB391', cram=None, downsample=None, dpi=100, drop_outliers=False, fasta=None, fastq=None, fastq_minimal=None, fastq_rich=None, font_scale=1, format='png', listcolors=False, loglength=False, maxlength=None, minlength=None, minqual=None, no_N50=False, outdir='.', percentqual=False, pickle=None, plots=['kde', 'dot'], prefix='', raw=False, readtype='1D', runtime_until=None, store=False, summary=['concat_summary.txt'], threads=5, title=None, verbose=False)
2018-11-19 07:04:26,139 Python version is: 3.5.6 |Anaconda, Inc.| (default, Aug 26 2018, 21:41:56) [GCC 7.3.0]
2018-11-19 07:04:26,141 Nanoplotter: valid output format png
2018-11-19 07:04:26,145 Nanoget: Collecting metrics from summary file concat_summary.txt for 1D sequencing
2018-11-19 07:04:28,819 Nanoget: Finished collecting statistics from summary file concat_summary.txt
2018-11-19 07:04:29,368 Reduced DataFrame memory usage from 355.05402755737305Mb to 182.0026569366455Mb
2018-11-19 07:04:30,184 "['readIDs'] not found in axis"
Traceback (most recent call last):
  File "/home/grid/miniconda3/lib/python3.5/site-packages/nanoplot/NanoPlot.py", line 63, in main
    barcoded=args.barcoded)
  File "/home/grid/miniconda3/lib/python3.5/site-packages/nanoget/nanoget.py", line 78, in get_input
    datadf.drop("readIDs", inplace=True)
  File "/home/grid/miniconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 3697, in drop
    errors=errors)
  File "/home/grid/miniconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 3111, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home/grid/miniconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 3143, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home/grid/miniconda3/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 4404, in drop
    '{} not found in axis'.format(labels[mask]))
KeyError: "['readIDs'] not found in axis"

My Nanopore rep installed NanoPlot remotely for me. I honestly don't know the version. Here is the command line I was using:

NanoPlot -t 4 --summary concat_summary.txt --N50 -p experiment_name

NanoPlot worked just fine for my previous GridION runs. It's just not working for my most recent ones. I appreciate your help and time so much

Tanya

wdecoster commented 5 years ago

The log file contains the version: v1.18.2. That's not terribly old but could explain the issue.

Let's figure out how it was installed. What do you get if you run:

pip list | grep -i nanoplot
conda list | grep -i nanoplot

=> If it was installed with conda we should upgrade with conda, and if it's installed using pip we'll update it using pip.

Let's get the version of nanoget:

python -c "import nanoget ; print(nanoget.__version__)"

Wouter
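
For reference, the same version information can be gathered from Python in one call. This is only a sketch: it assumes Python 3.8 or newer for `importlib.metadata` (so it would not run on the Python 3.5 environment shown in the log above), and the helper name `installed_versions` is made up for this example. It reports what is installed but not whether a package came from pip or conda.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


# Packages discussed in this thread; any that are missing are reported as None.
print(installed_versions(["NanoPlot", "nanoget", "pandas"]))
```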

tmmurtha commented 5 years ago

Wouter, Here are those versions for you.

grid@GXB01209:~$ pip list | grep -i nanoplot
NanoPlot         1.18.2   
nanoplotter      1.0.0    
grid@GXB01209:~$ conda list | grep -i nanoplot
nanoplot                  1.18.2                   py35_1    bioconda
nanoplotter               1.0.0                    py35_2    bioconda

grid@GXB01209:~$ python -c "import nanoget ; print(nanoget.__version__)"
1.7.4

wdecoster commented 5 years ago

Oh, this confuses me. Sorry, another question. Can you show me the output of the following command?

grep -C3 "datadf.drop" /home/grid/miniconda3/lib/python3.5/site-packages/nanoget/nanoget.py

Thanks, Wouter

tmmurtha commented 5 years ago

grid@GXB01209:~$ grep -C3 "datadf.drop" /home/grid/miniconda3/lib/python3.5/site-packages/nanoget/nanoget.py
            names=names or files,
            method=combine)
    if "readIDs" in datadf and pd.isna(datadf["readIDs"]).any():
        datadf.drop("readIDs", inplace=True)
    datadf = calculate_start_time(datadf)
    logging.info("Nanoget: Gathered all metrics of {} reads".format(len(datadf)))
    if len(datadf) == 0:
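
The grep output and the traceback point at the same line. `datadf.drop("readIDs", inplace=True)` passes no axis, and pandas' `DataFrame.drop` defaults to `axis=0`, so it looks for a row label named "readIDs" instead of the column. That is consistent with the `KeyError` above. The guard on that line also only fires when some read IDs are missing, which may be why only some summary files trigger the crash. A minimal sketch on a toy frame follows; the `lengths` column and all values are invented for illustration, and the thread does not show the fix that was actually released in nanoget.

```python
import pandas as pd

# Toy stand-in for the nanoget metrics table (column names and values are illustrative).
datadf = pd.DataFrame({"readIDs": ["a", None, "c"], "lengths": [100, 200, 300]})

# Without an axis, drop() searches the row index for the label "readIDs" and fails.
try:
    datadf.drop("readIDs", inplace=True)
    error = None
except KeyError as exc:
    error = str(exc)

# Naming the column explicitly removes the column instead.
cleaned = datadf.drop(columns="readIDs")

print(error)
print(list(cleaned.columns))
```

Using `columns="readIDs"` (or `axis=1`) is one way to express the intent; which form the maintainer chose is not stated here.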
tmmurtha commented 5 years ago

I'm able to use NanoPlot for the individual files, but it just doesn't seem to work when I concatenate them. Here is the script I'm using to concatenate, which has worked for me for the last ~10 runs.

To concatenate sequencing_summary.txt files:

source activate py2
python ~/scripts/concat_summary.py

wdecoster commented 5 years ago

I have no idea what the concat_summary.py script looks like, who wrote it, or what it does. I don't know if you can show it to me, but can you tell me whether things are different if you use this method for concatenating?

awk 'FNR>1 || NR==1' sequencing_summary*.txt > concatenated_summary.txt
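
The awk condition `FNR>1 || NR==1` prints a line if it is not the first line of its own file, or if it is the very first line read overall. The header is therefore kept once and dropped from every later file. A rough Python equivalent is sketched below; the function name `concatenate_summaries` is made up for this example and is not part of NanoPlot or of the concat_summary.py script, which is not shown in this thread.

```python
def concatenate_summaries(paths, output):
    """Concatenate text files, keeping the header line of the first file only."""
    total = 0  # lines read across all files, like awk's NR
    with open(output, "w") as out:
        for path in paths:
            with open(path) as handle:
                # file_line counts lines within the current file, like awk's FNR
                for file_line, line in enumerate(handle, start=1):
                    total += 1
                    if file_line > 1 or total == 1:
                        out.write(line)


# Example call (assumes the summary files are in the current directory):
# import glob
# concatenate_summaries(sorted(glob.glob("sequencing_summary*.txt")),
#                       "concatenated_summary.txt")
```

Like the awk one-liner, this assumes every input file starts with the same header line.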

wdecoster commented 5 years ago

There is also absolutely no reason to concatenate the summaries for NanoPlot, as you can just do

NanoPlot --summary sequencing_summary*.txt

tmmurtha commented 5 years ago

Our Nanopore rep told us to use the concat_summary.py script. Here's what I got for the NanoPlot --summary sequencing_summary*.txt command:

grid@GXB01209:/data/basecalled/PT-16-16-7/GA10000$ NanoPlot --summary sequencing_summary*.txt

If you read this then NanoPlot has crashed :-( Please try updating NanoPlot and see if that helps...

If not, please report this issue at https://github.com/wdecoster/NanoPlot/issues If you could include the log file that would be really helpful. Thanks!

Traceback (most recent call last):
  File "/home/grid/miniconda3/bin/NanoPlot", line 11, in <module>
    sys.exit(main())
  File "/home/grid/miniconda3/lib/python3.5/site-packages/nanoplot/NanoPlot.py", line 63, in main
    barcoded=args.barcoded)
  File "/home/grid/miniconda3/lib/python3.5/site-packages/nanoget/nanoget.py", line 78, in get_input
    datadf.drop("readIDs", inplace=True)
  File "/home/grid/miniconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 3697, in drop
    errors=errors)
  File "/home/grid/miniconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 3111, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home/grid/miniconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 3143, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home/grid/miniconda3/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 4404, in drop
    '{} not found in axis'.format(labels[mask]))
KeyError: "['readIDs'] not found in axis"

wdecoster commented 5 years ago

Would it be possible to share these summary files with me, for example via email or dropbox?

tmmurtha commented 5 years ago

awk 'FNR>1 || NR==1' sequencing_summary*.txt > concatenated_summary.txt Worked!!!

tmmurtha commented 5 years ago

Would you still like to see those summary files?

wdecoster commented 5 years ago

That's good to hear, but it still makes me wonder why the other one didn't. So if you could share these, yes please.

tmmurtha commented 5 years ago

I can compress and send them via email right now. I appreciate all of your patience and help!

wdecoster commented 5 years ago

Issue identified, thanks for the example! I'll upload new versions to bioconda and PyPI.