wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License

Limit on plotting long sequences? #378

Closed desmodus1984 closed 1 week ago

desmodus1984 commented 1 month ago

Hi,

I tried installing NanoPlot with conda, but the installation didn't finish, so I installed it as a regular app. I wanted to make the quality score vs. sequence length plot, and it wasn't clear which options to use, so I followed one of the examples and ran the following command:

NanoPlot -t 8 --fastq Ju760.Lig.P-1.fastq --plots hex dot --maxlength 250000

WARNING: hex as part of --plots has been deprecated and will be ignored. To get the hex output, rerun with --legacy hex.

I know that I have a sequence above 200 kb:

seqkit stat Ju760.Lig.P-1.fastq
file                 format  type  num_seqs  sum_len        min_len  avg_len  max_len
Ju760.Lig.P-1.fastq  FASTQ   DNA   438,478   2,242,313,164  4        5,113.9  240,959

I was trying to find that read because I wanted to check its quality score, but I couldn't find it in the plot, even after zooming in on the relevant length range.

image

Could you tell me why this sequence is not shown in the plot?

Thanks;

wdecoster commented 1 month ago

The plotting function randomly samples your data so that at most 10,000 dots are shown in the plot. More dots make plotting slower and the HTML images larger. It seems you have more than 10,000 reads and your longest molecule was dropped by the sampling. The idea is to show the overall distribution of the dataset, which 10k reads reflect well, but outliers can be lost.
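As a rough illustration of why a single outlier tends to disappear, here is a minimal Python sketch of uniform downsampling. It is not NanoPlot's actual code: the helper names `downsample` and `keep_probability` are hypothetical, and only the 10,000 cap and the read count of 438,478 come from this thread.

```python
import random


def downsample(reads, max_points=10_000, seed=None):
    """Return at most max_points reads, drawn uniformly without replacement."""
    if len(reads) <= max_points:
        return list(reads)
    rng = random.Random(seed)  # seed only makes the sketch reproducible
    return rng.sample(reads, max_points)


def keep_probability(num_reads, max_points=10_000):
    """Chance that one particular read survives the downsampling."""
    return min(1.0, max_points / num_reads)


# Read count taken from the seqkit output in this thread.
print(f"{keep_probability(438_478):.1%}")  # prints 2.3%
```

Under these assumptions, any one specific read (such as the 240,959 bp one) has only about a 2.3% chance of ending up in the plot, while the overall length and quality distribution is still well represented.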

desmodus1984 commented 3 weeks ago

Hi, I got a fastq.gz file from a friend, and when I tried doing QC on it, NanoPlot crashed. I had installed it fresh in a new environment. I used this command:

NanoPlot -t 12 --fastq JustinDMV002.fastq.gz --plots hex dot --maxlength 100000 -p JustinDMV002

Error message:

/home/juaguila/miniconda3/envs/pomoxis/lib/python3.7/site-packages/nanoplotter/nanoplotter_main.py:283: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  alpha=0.8))
/home/juaguila/miniconda3/envs/pomoxis/lib/python3.7/site-packages/nanoplotter/nanoplotter_main.py:308: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  alpha=0.8))

If you read this then NanoPlot 1.30.1 has crashed :-(
Please try updating NanoPlot and see if that helps...

If not, please report this issue at https://github.com/wdecoster/NanoPlot/issues
If you could include the log file that would be really helpful.
Thanks!

Traceback (most recent call last):
  File "/home/juaguila/miniconda3/envs/pomoxis/bin/NanoPlot", line 10, in <module>
    sys.exit(main())
  File "/home/juaguila/miniconda3/envs/pomoxis/lib/python3.7/site-packages/nanoplot/NanoPlot.py", line 97, in main
    plots = make_plots(datadf, settings)
  File "/home/juaguila/miniconda3/envs/pomoxis/lib/python3.7/site-packages/nanoplot/NanoPlot.py", line 163, in make_plots
    plot_settings=plot_settings)
  File "/home/juaguila/miniconda3/envs/pomoxis/lib/python3.7/site-packages/nanoplotter/nanoplotter_main.py", line 135, in scatter
    height=10)
  File "/home/juaguila/miniconda3/envs/pomoxis/lib/python3.7/site-packages/seaborn/axisgrid.py", line 2311, in jointplot
    grid.plot_joint(plt.hexbin, **joint_kws)
  File "/home/juaguila/miniconda3/envs/pomoxis/lib/python3.7/site-packages/seaborn/axisgrid.py", line 1828, in plot_joint
    func(self.x, self.y, **kwargs)
  File "/home/juaguila/miniconda3/envs/pomoxis/lib/python3.7/site-packages/matplotlib/pyplot.py", line 2593, in hexbin
    is not None else {}), **kwargs)
  File "/home/juaguila/miniconda3/envs/pomoxis/lib/python3.7/site-packages/matplotlib/__init__.py", line 1565, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/home/juaguila/miniconda3/envs/pomoxis/lib/python3.7/site-packages/matplotlib/axes/_axes.py", line 4802, in hexbin
    collection.update(kwargs)
  File "/home/juaguila/miniconda3/envs/pomoxis/lib/python3.7/site-packages/matplotlib/artist.py", line 1006, in update
    ret = [_update_property(self, k, v) for k, v in props.items()]
  File "/home/juaguila/miniconda3/envs/pomoxis/lib/python3.7/site-packages/matplotlib/artist.py", line 1006, in <listcomp>
    ret = [_update_property(self, k, v) for k, v in props.items()]
  File "/home/juaguila/miniconda3/envs/pomoxis/lib/python3.7/site-packages/matplotlib/artist.py", line 1002, in _update_property
    .format(type(self).__name__, k))
AttributeError: 'PolyCollection' object has no property 'stat_func'

I don't understand the point of doing the update, when I just installed it today. I know the file is big, but I thought that it only uses 10.000, it is not using the entire dataset (14 GB of 16kb reads).

Any reason why it did fail, and potentially how to fix this?

Thank you;

wdecoster commented 2 weeks ago

Hi,

> I don't understand the point of doing the update, when I just installed it today.

According to the log, you are using NanoPlot 1.30.1, which is not the latest version.
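To confirm which versions are actually installed in the active environment, a small check like the following may help. It is only a sketch: it assumes Python 3.8 or newer for `importlib.metadata` (the environment in the traceback is Python 3.7, where that module is not in the standard library), the helper `installed_version` is hypothetical, and the distribution names are assumed to match what pip or conda installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them to your environment.
for name in ("NanoPlot", "seaborn", "matplotlib"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If this still reports NanoPlot 1.30.1, the environment resolved to an older release despite the fresh install. The traceback suggests that this older NanoPlot passes a `stat_func` argument that the installed seaborn no longer handles, so updating NanoPlot is the likely fix.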

> I know the file is big, but I thought that it only uses 10.000, it is not using the entire dataset (14 GB of 16kb reads).

That size should be totally fine.

> Any reason why it did fail, and potentially how to fix this?

Duplicate of https://github.com/wdecoster/NanoPlot/issues/347

Best, Wouter