wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License
Failed to execute newer version of NanoPlot (NanoPlot 1.32.1) #252

Closed unique379r closed 3 years ago

unique379r commented 3 years ago

Hi, I installed the latest version expecting 1.35.5, but conda installed 1.32.1, and when I tried to run it I got an error. Could you please take a look?

NanoPlot -t 1 --fastq ../*fail*.fastq --plots hex dot --title FailedReads --N50 --tsv_stats -o NanoPlotQC2 -p failFastqNanoPlot

Apps/envs/NanoPlots/lib/python3.7/site-packages/seaborn/distributions.py:2551: FutureWarning: distplot is a deprecated function and will be removed in a future version. Please adapt your code to use either displot (a figure-level function with similar flexibility) or histplot (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)

If you read this then NanoPlot 1.32.1 has crashed :-( Please try updating NanoPlot and see if that helps...

If not, please report this issue at https://github.com/wdecoster/NanoPlot/issues If you could include the log file that would be really helpful. Thanks!

Traceback (most recent call last):
  File "Apps/envs/NanoPlots/bin/NanoPlot", line 10, in <module>
    sys.exit(main())
  File "Apps/envs/NanoPlots/lib/python3.7/site-packages/nanoplot/NanoPlot.py", line 101, in main
    plots = make_plots(datadf, settings)
  File "Apps/envs/NanoPlots/lib/python3.7/site-packages/nanoplot/NanoPlot.py", line 169, in make_plots
    plot_settings=plot_settings)
  File "Apps/envs/NanoPlots/lib/python3.7/site-packages/nanoplotter/nanoplotter_main.py", line 135, in scatter
    height=10)
  File "Apps/envs/NanoPlots/lib/python3.7/site-packages/seaborn/_decorators.py", line 46, in inner_f
    return f(**kwargs)
  File "Apps/envs/NanoPlots/lib/python3.7/site-packages/seaborn/axisgrid.py", line 2124, in jointplot
    grid.plot_joint(plt.hexbin, **joint_kws)
  File "Apps/envs/NanoPlots/lib/python3.7/site-packages/seaborn/axisgrid.py", line 1694, in plot_joint
    func(self.x, self.y, **kwargs)
  File "Apps/envs/NanoPlots/lib/python3.7/site-packages/matplotlib/pyplot.py", line 2593, in hexbin
    is not None else {}), **kwargs)
  File "Apps/envs/NanoPlots/lib/python3.7/site-packages/matplotlib/__init__.py", line 1565, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "Apps/envs/NanoPlots/lib/python3.7/site-packages/matplotlib/axes/_axes.py", line 4802, in hexbin
    collection.update(kwargs)
  File "Apps/envs/NanoPlots/lib/python3.7/site-packages/matplotlib/artist.py", line 1006, in update
    ret = [_update_property(self, k, v) for k, v in props.items()]
  File "Apps/envs/NanoPlots/lib/python3.7/site-packages/matplotlib/artist.py", line 1006, in <listcomp>
    ret = [_update_property(self, k, v) for k, v in props.items()]
  File "Apps/envs/NanoPlots/lib/python3.7/site-packages/matplotlib/artist.py", line 1002, in _update_property
    .format(type(self).__name__, k))
AttributeError: 'PolyCollection' object has no property 'stat_func'

tjhinet commented 3 years ago

Hi, We also installed the newest version but the version installed is 1.32.1. This is the first time we are running NanoPlot and any help is really appreciated! Thanks in advance. This is the log that we got:

This is the NanoPlot version I loaded:

NanoPlot 1.32.1

NanoPlot --summary sequencing_summary_FAH29897_2fd1c19e.txt --verbose -o summary-plots

2021-04-12 22:02:56,426 NanoPlot 1.32.1 started with arguments Namespace(threads=4, verbose=True, store=False, raw=False, huge=False, outdir='summary-plots', prefix='', tsv_stats=False, maxlength=None, minlength=None, drop_outliers=False, downsample=None, loglength=False, percentqual=False, alength=False, minqual=None, runtime_until=None, readtype='1D', barcoded=False, no_supplementary=False, color='#4CB391', colormap='Greens', format='png', plots=['kde', 'dot'], listcolors=False, listcolormaps=False, no_N50=False, N50=False, title=None, font_scale=1, dpi=100, hide_stats=False, fastq=None, fasta=None, fastq_rich=None, fastq_minimal=None, summary=['sequencing_summary_FAH29897_2fd1c19e.txt'], bam=None, ubam=None, cram=None, pickle=None, feather=None, path='summary-plots/')
2021-04-12 22:02:56,426 Python version is: 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:46) [GCC 9.3.0]
2021-04-12 22:02:56,428 NanoPlot: valid output format png
2021-04-12 22:02:56,493 Nanoget: Collecting metrics from summary file sequencing_summary_FAH29897_2fd1c19e.txt for 1D sequencing
2021-04-12 22:02:57,147 Nanoget: Finished collecting statistics from summary file sequencing_summary_FAH29897_2fd1c19e.txt
2021-04-12 22:02:57,349 Reduced DataFrame memory usage from 9.696212768554688Mb to 5.252115249633789Mb
2021-04-12 22:02:57,452 Nanoget: Gathered all metrics of 211817 reads
2021-04-12 22:02:57,525 index 0 is out of bounds for axis 0 with size 0
Traceback (most recent call last):
  File "/sysapps/cluster/software/Anaconda3/2019.03/envs/nanoplot/lib/python3.9/site-packages/nanoplot/NanoPlot.py", line 77, in main
    settings["statsfile"] = [make_stats(datadf, settings, suffix="", tsv_stats=args.tsv_stats)]
  File "/sysapps/cluster/software/Anaconda3/2019.03/envs/nanoplot/lib/python3.9/site-packages/nanoplot/NanoPlot.py", line 116, in make_stats
    stats_df = nanomath.write_stats(
  File "/sysapps/cluster/software/Anaconda3/2019.03/envs/nanoplot/lib/python3.9/site-packages/nanomath/nanomath.py", line 177, in write_stats
    stats = [Stats(df) for df in datadfs]
  File "/sysapps/cluster/software/Anaconda3/2019.03/envs/nanoplot/lib/python3.9/site-packages/nanomath/nanomath.py", line 177, in <listcomp>
    stats = [Stats(df) for df in datadfs]
  File "/sysapps/cluster/software/Anaconda3/2019.03/envs/nanoplot/lib/python3.9/site-packages/nanomath/nanomath.py", line 39, in __init__
    self.n50 = get_N50(np.sort(df["lengths"]))
  File "/sysapps/cluster/software/Anaconda3/2019.03/envs/nanoplot/lib/python3.9/site-packages/nanomath/nanomath.py", line 118, in get_N50
    return readlengths[np.where(np.cumsum(readlengths) >= 0.5 * np.sum(readlengths))[0][0]]
IndexError: index 0 is out of bounds for axis 0 with size 0

If you read this then NanoPlot 1.32.1 has crashed :-( Please try updating NanoPlot and see if that helps...

If not, please report this issue at https://github.com/wdecoster/NanoPlot/issues If you could include the log file that would be really helpful. Thanks!

Traceback (most recent call last):
  File "/sysapps/cluster/software/Anaconda3/2019.03/envs/nanoplot/bin/NanoPlot", line 10, in <module>
    sys.exit(main())
  File "/sysapps/cluster/software/Anaconda3/2019.03/envs/nanoplot/lib/python3.9/site-packages/nanoplot/NanoPlot.py", line 77, in main
    settings["statsfile"] = [make_stats(datadf, settings, suffix="", tsv_stats=args.tsv_stats)]
  File "/sysapps/cluster/software/Anaconda3/2019.03/envs/nanoplot/lib/python3.9/site-packages/nanoplot/NanoPlot.py", line 116, in make_stats
    stats_df = nanomath.write_stats(
  File "/sysapps/cluster/software/Anaconda3/2019.03/envs/nanoplot/lib/python3.9/site-packages/nanomath/nanomath.py", line 177, in write_stats
    stats = [Stats(df) for df in datadfs]
  File "/sysapps/cluster/software/Anaconda3/2019.03/envs/nanoplot/lib/python3.9/site-packages/nanomath/nanomath.py", line 177, in <listcomp>
    stats = [Stats(df) for df in datadfs]
  File "/sysapps/cluster/software/Anaconda3/2019.03/envs/nanoplot/lib/python3.9/site-packages/nanomath/nanomath.py", line 39, in __init__
    self.n50 = get_N50(np.sort(df["lengths"]))
  File "/sysapps/cluster/software/Anaconda3/2019.03/envs/nanoplot/lib/python3.9/site-packages/nanomath/nanomath.py", line 118, in get_N50
    return readlengths[np.where(np.cumsum(readlengths) >= 0.5 * np.sum(readlengths))[0][0]]
IndexError: index 0 is out of bounds for axis 0 with size 0

iliasbukraa commented 3 years ago

Hi @unique379r, could you perhaps provide me with the running version of NanoPlot (NanoPlot --version)? Perhaps a clean install of NanoPlot could fix it (conda remove -n <env> NanoPlot to remove NanoPlot, conda install -c bioconda nanoplot to reinstall).

@tjhinet, what output do you get when you execute the following command: tail -10 sequencing_summary_FAH29897_2fd1c19e.txt? We might have a corrupt summary file on our hands.

PS: I suggest opening an issue of your own next time; that way the developers can help you out better :+1:

tjhinet commented 3 years ago

Thanks for your reply @iliasbukraa! Sorry about submitting my issue here. I thought that the version difference is the source of my problem too. I will definitely open a separate issue next time. Attached below is what I get when I executed tail -10 on the sequencing_summary file:

FAH29897_pass_2fd1c19e_25.fastq FAH29897_pass_2fd1c19e_25.fast5 1aba1e9a-7413-4dca-ab07-edbafa53eded 2fd1c19eff106536e96cf528617bd7242685b1e4 410 93032.996000 7.081250 7081 TRUE 93032.996000 7081 7.081253462 8.796349 0.000000 116.622147 10.325038 not_setDHD6Mar0921 DHD6 signal_positive
FAH29897_pass_2fd1c19e_25.fastq FAH29897_pass_2fd1c19e_25.fast5 93c2dcf3-16e9-439e-b38a-cf1d9bad8bf0 2fd1c19eff106536e96cf528617bd7242685b1e4 410 93021.286250 11.408000 11408 TRUE 93021.499250 11195 11.195000 5271 8.314628 0.000000 117.976250 10.155775 not_set DHD6Mar0921 DHD6 signal_positive
FAH29897_pass_2fd1c19e_25.fastq FAH29897_pass_2fd1c19e_25.fast5 354394f6-36c5-4d7f-b1aa-24102f3799e0 2fd1c19eff106536e96cf528617bd7242685b1e4 406 92998.911000 42.215250 42215 TRUE 92999.235000 41891 41.891250 16375 8.693711 0.000000 81.753990 8.293883 not_set DHD6Mar0921 DHD6 signal_positive
FAH29897_pass_2fd1c19e_25.fastq FAH29897_pass_2fd1c19e_25.fast5 5f4e2c4d-3792-47bf-84c7-9d2f3cb82ba4 2fd1c19eff106536e96cf528617bd7242685b1e4 405 93013.774500 21.221500 21221 TRUE 93013.826500 21169 21.169500 9072 8.633043 0.000000 110.528687 10.663564 not_set DHD6Mar0921 DHD6 signal_positive
FAH29897_pass_2fd1c19e_25.fastq FAH29897_pass_2fd1c19e_25.fast5 9d3a5bef-dd34-4565-9f14-bd3dd5a2c821 2fd1c19eff106536e96cf528617bd7242685b1e4 404 92980.426750 52.948750 52948 TRUE 92980.587750 52787 52.787750 21022 9.761472 0.000000 85.477776 9.478724 not_set DHD6Mar0921 DHD6 signal_positive
FAH29897_pass_2fd1c19e_25.fastq FAH29897_pass_2fd1c19e_25.fast5 df75bfb9-7e95-46d2-8c65-bd6ad5be1297 2fd1c19eff106536e96cf528617bd7242685b1e4 396 93021.970750 10.220500 10220 TRUE 93022.019750 10171 10.171500 3548 8.788022 0.000000 100.880699 10.494301 not_set DHD6Mar0921 DHD6 signal_positive
FAH29897_pass_2fd1c19e_25.fastq FAH29897_pass_2fd1c19e_25.fast5 d1d89c72-858e-4226-a92f-4bf36fd91743 2fd1c19eff106536e96cf528617bd7242685b1e4 393 93021.235000 10.486500 10486 TRUE 93021.301000 10420 10.420500 3743 8.072210 0.000000 96.649124 8.970935 not_set DHD6Mar0921 DHD6 signal_positive
FAH29897_pass_2fd1c19e_25.fastq FAH29897_pass_2fd1c19e_25.fast5 2a4937ab-6e1a-4ffe-bc63-ec87d3c4dab1 2fd1c19eff106536e96cf528617bd7242685b1e4 388 93006.420000 26.235000 26235 TRUE 93006.587000 26068 26.068000 11545 8.509368 0.000000 99.357330 10.325038 not_set DHD6Mar0921 DHD6 signal_positive
FAH29897_pass_2fd1c19e_25.fastq FAH29897_pass_2fd1c19e_25.fast5 0966f5dd-7d60-4fb3-bee2-9cac4059ac93 2fd1c19eff106536e96cf528617bd7242685b1e4 383 93017.726000 20.731250 20731 TRUE 93017.939000 20518 20.518250 5419 9.061234 0.000000 78.876518 8.632409 not_set DHD6Mar0921 DHD6 signal_positive

wdecoster commented 3 years ago

Do you think you can share that summary file with us @tjhinet?

tjhinet commented 3 years ago

Just to give a bit of context: after I stopped my MinION sequencing run, the catch-up basecalling was interrupted. When I contacted tech support at ONT, they directed me to the sequencing_summary.txt.tmp file of the run and suggested that I convert it to a .txt file to use with NanoPlot to generate a QC report. Is there a specific way to do the conversion? I basically just changed the extension manually.

wdecoster commented 3 years ago

Renaming the file/changing the extension should be fine, in fact, the filename doesn't matter at all.

It is not unlikely that the interrupted basecalling corrupted some of the lines in the summary file. Before sharing the full file, could you post the header (head -n1) to make sure we're looking at the right file?

tjhinet commented 3 years ago

@wdecoster Sure.

filename_fastq filename_fast5 read_id run_id channel mux start_time duration num_events passes_filtering template_start num_events_template template_duration sequence_length_template mean_qscore_template strand_score_template median_template mad_template pore_type experiment_id sample_id end_reason

tjhinet commented 3 years ago

Also, dumb question. I tried to zip the sequencing_summary.txt file but it is still larger than 10Mb. Is there another way to share it than drag and drop?

iliasbukraa commented 3 years ago

You can click on the bar that says Attach files by dragging... to browse. I did notice you are missing data for one column, don't know if that is an issue @wdecoster?
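A quick way to check for this kind of gap before sharing a large file is to count, per column, how many rows have an empty value, and to flag rows whose field count differs from the header. The following is a minimal sketch using only the standard library; it assumes the summary is tab-delimited, and the function name `summarize_gaps` and the demo table are made up for illustration (they are not part of NanoPlot or of the real summary file).

```python
import csv
import io
from collections import Counter


def summarize_gaps(handle, delimiter="\t"):
    """Report malformed rows and empty values per column in a delimited table."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    wrong_width = []  # 1-based data-row numbers whose field count differs from the header
    empty_per_column = Counter()
    for row_number, row in enumerate(reader, start=1):
        if len(row) != len(header):
            wrong_width.append(row_number)
            continue
        for name, value in zip(header, row):
            if value.strip() == "":
                empty_per_column[name] += 1
    return wrong_width, dict(empty_per_column)


# Tiny made-up table (not the real summary): data row 2 has an empty "mux",
# data row 3 is cut short.
demo = io.StringIO(
    "read_id\tchannel\tmux\tduration\n"
    "a\t410\t1\t7.08\n"
    "b\t410\t\t11.4\n"
    "c\t406\n"
)
wrong_width, empty_per_column = summarize_gaps(demo)
print(wrong_width)
print(empty_per_column)
```

For a real file, the same function can be called on a handle opened with `open(path, newline="")`.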

tjhinet commented 3 years ago

sequencing_summary.zip

iliasbukraa commented 3 years ago

@tjhinet so the summary file is corrupt: it contains NaN values, and those lead to the NanoPlot error. Here is the summary file with the NaN values removed, which runs just fine: sequencing_summary_FAH29897_2fd1c19e_no_nan.zip
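To illustrate how missing values can end in this particular IndexError, here is a small sketch built around the expression shown in the last traceback frame (`get_N50`). When the sorted lengths contain a NaN, the total is NaN, no cumulative value compares as greater than or equal to it, and indexing the first match fails; an empty array fails the same way. Whether the lengths column in this file held NaN or ended up empty is not confirmed in this thread, and `n50_ignoring_nan` is a hypothetical guard, not NanoPlot code.

```python
import numpy as np


def n50_like_traceback(sorted_lengths):
    """The expression from the last traceback frame, applied to sorted lengths."""
    cumulative = np.cumsum(sorted_lengths)
    hits = np.where(cumulative >= 0.5 * np.sum(sorted_lengths))[0]
    return sorted_lengths[hits[0]]  # raises IndexError when `hits` is empty


def n50_ignoring_nan(lengths):
    """Hypothetical guarded variant: drop NaN first, return None if nothing is left."""
    arr = np.asarray(lengths, dtype=float)
    arr = np.sort(arr[~np.isnan(arr)])
    if arr.size == 0:
        return None
    return float(n50_like_traceback(arr))


clean = np.sort(np.array([1.0, 2.0, 3.0, 4.0]))
with_nan = np.sort(np.array([1.0, 2.0, np.nan]))  # np.sort places NaN last

print(n50_like_traceback(clean))
try:
    n50_like_traceback(with_nan)
except IndexError as err:
    print("IndexError:", err)
print(n50_ignoring_nan([1.0, 2.0, np.nan]))
```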

tjhinet commented 3 years ago

I see. Thanks a lot for the assist @iliasbukraa! Just for my own learning, are the NaNs caused by the interruption to my catch-up basecalling?

iliasbukraa commented 3 years ago

@tjhinet I can't give you a conclusive answer on that (maybe @wdecoster can) but it doesn't seem a good idea to interrupt while basecalling.

unique379r commented 3 years ago

@iliasbukraa @wdecoster Hi. Indeed, the version I installed is a fresh install in a completely new Python environment. Before that, I was using NanoPlot from NanoPack, version 1.30.1. I intended to use the newer version because I wanted the TSV output, which is only implemented in the newer versions. By the way, I was testing FASTQ files that work with the previous version.

iliasbukraa commented 3 years ago

That sounds like pretty standard use of NanoPlot. What version are you running in your Python env? (NanoPlot --version should give you that info.)

unique379r commented 3 years ago

@iliasbukraa As I wrote previously, the NanoPack version is 1.30.1, which works fine. The new version of NanoPlot is 1.32.1, which is not working, as I reported above. This is how I created the env and installed it:

conda create -n NanoPlot python=3.7.7
## from anaconda cloud
conda activate NanoPlot
conda install -c bioconda nanoplot
conda deactivate

iliasbukraa commented 3 years ago

Yes, of course, I understood that 1.30.1 works for you. However, you created a new environment and installed NanoPlot there, and the latest version is 1.35.5, not 1.32.1, which is why I am curious which version you have installed in that environment.

unique379r commented 3 years ago

@iliasbukraa I tried to install that specific version, but it just gets stuck:

conda install -c bioconda nanoplot=1.35.5
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: -
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.

tjhinet commented 3 years ago

@iliasbukraa Of course. The interruption happened when the machine on which the sequencing was performed decided to hibernate. This is also the reason why I did not get the chance to export the pdf of the qc reports. If you don't mind, can I ask how you removed all of the NANs? Sorry if this is a basic question. Thanks.

iliasbukraa commented 3 years ago

@unique379r Regarding the stuck conda install: I suggest you completely remove the NanoPlot environment you created before, create a new environment with a NanoPlot installation, and let me know what the NanoPlot version is (it should be the latest, 1.35.5, but it seems that in your case it's not).

iliasbukraa commented 3 years ago

@tjhinet To remove the NaNs, I loaded your sequencing summary into a pandas DataFrame, removed all NAs (with the dropna() function) and exported the NA-free DataFrame to CSV.
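The clean-up described above can be sketched roughly as follows. This is not the exact script used in this thread: the function name `drop_na_rows`, the tab delimiter, and the demo table are assumptions for illustration. The optional `subset` argument is passed straight to `DataFrame.dropna`, which helps when one column is empty in every row, since a plain `dropna()` would then discard all reads.

```python
import io

import pandas as pd


def drop_na_rows(source, destination, sep="\t", subset=None):
    """Read a delimited summary, drop rows containing NA, write the rest."""
    df = pd.read_csv(source, sep=sep)
    # Default (subset=None): drop a row if any column is NA.
    cleaned = df.dropna(subset=subset)
    cleaned.to_csv(destination, sep=sep, index=False)
    return len(df), len(cleaned)


# Made-up three-read table; the second read has no sequence length.
demo = io.StringIO(
    "read_id\tsequence_length_template\tmean_qscore_template\n"
    "a\t5271\t8.31\n"
    "b\t\t8.69\n"
    "c\t9072\t8.63\n"
)
out = io.StringIO()
before, after = drop_na_rows(demo, out)
print(before, after)
```

With real files, `source` and `destination` can be file paths instead of in-memory buffers. Note that pandas reads an integer column containing NA as floats, so such a column is written back with a decimal point.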

unique379r commented 3 years ago

@iliasbukraa @wdecoster Hi again, I deleted my NanoPlot env and created it again to install NanoPlot as per your instructions, but the same version, NanoPlot 1.32.1, is still installed by default. I used: conda install -c bioconda nanoplot. I then tried to force the version with conda install -c bioconda nanoplot=1.35.1, but it got stuck as before at "This can take several minutes. Press CTRL-C to abort." Could you please check whether the version is mis-tagged on Anaconda Cloud? Thanks for the help.

wdecoster commented 3 years ago

Could you please try conda install mamba, and then use mamba instead of conda to install NanoPlot?

unique379r commented 3 years ago

Hi @wdecoster, I did, but the issue is the same: 1.32.1 was installed with mamba as well.

conda create -n NanoPlot python=3.7.7
conda activate NanoPlot
conda install mamba
mamba install -c bioconda nanoplot
/Apps/envs/NanoPlot/bin/NanoPlot --version
NanoPlot 1.32.1

iliasbukraa commented 3 years ago

If you're still in your original env, it might help to uninstall NanoPlot first (since it was installed with conda) and then try the installation with mamba.

wdecoster commented 3 years ago

To make the 1.32.1 version work (as a temporary workaround until we figure out what could be wrong with your conda installation) you can downgrade seaborn with conda install seaborn==0.10.1 (see also https://github.com/wdecoster/NanoPlot/issues/222)
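To check whether an environment is affected before downgrading, the installed seaborn version can be inspected from Python. The sketch below is only illustrative: the helper names are made up, and the threshold of 0.11 is an assumption (this thread only establishes that 0.10.1 works; the distplot deprecation warning in the first traceback suggests the failing environment had a newer release). `importlib.metadata` requires Python 3.8 or later.

```python
from importlib.metadata import PackageNotFoundError, version


def major_minor(version_string):
    """Turn '0.11.1' into (0, 11); ignores anything after the second field."""
    parts = version_string.split(".")
    return int(parts[0]), int(parts[1])


def needs_seaborn_downgrade(seaborn_version, first_breaking=(0, 11)):
    """True if this seaborn is at or past the release assumed to break NanoPlot 1.32.1."""
    return major_minor(seaborn_version) >= first_breaking


try:
    installed = version("seaborn")
    print(installed, "-> downgrade suggested:", needs_seaborn_downgrade(installed))
except PackageNotFoundError:
    print("seaborn is not installed in this environment")
```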

unique379r commented 3 years ago

pip worked for me, thanks

aichachikh1202 commented 10 months ago

Hello, how do I run the NanoPlot command?