nf-core / mag

Assembly and binning of metagenomes
https://nf-co.re/mag
MIT License

Pipeline fails with mag_depth script error when bins are empty #630

Closed felipemachado85 closed 3 days ago

felipemachado85 commented 1 week ago

Hi all!

First of all, thank you for this amazing tool. It's really refreshing to have such a powerful pipeline at hand.

I just want to share this "bug" (in quotes because it is entirely specific to my samples) that I encountered yesterday. I was running the pipeline with three samples and got this error message:

Caused by:
  Process `NFCORE_MAG:MAG:DEPTHS:MAG_DEPTHS_PLOT (MEGAHIT-MaxBin2-MAS)` terminated with an error exit status (1)

Command executed:

  plot_mag_depths.py --bin_depths MEGAHIT-MaxBin2-MAS-binDepths.tsv                     --groups sample_groups.tsv                     --out "MEGAHIT-MaxBin2-MAS-binDepths.heatmap.png"

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_MAG:MAG:DEPTHS:MAG_DEPTHS_PLOT":
      python: $(python --version 2>&1 | sed 's/Python //g')
      pandas: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
      seaborn: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('seaborn').version)")
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Matplotlib created a temporary config/cache directory at /tmp/matplotlib-lsetp6nu because the default path (/users/f/s/fsantann/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
  Traceback (most recent call last):
    File "/users/f/s/fsantann/.nextflow/assets/nf-core/mag/bin/plot_mag_depths.py", line 83, in <module>
      sys.exit(main())
    File "/users/f/s/fsantann/.nextflow/assets/nf-core/mag/bin/plot_mag_depths.py", line 70, in main
      sns.clustermap(
    File "/usr/local/lib/python3.9/site-packages/seaborn/_decorators.py", line 46, in inner_f
      return f(**kwargs)
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 1402, in clustermap
      return plotter.plot(metric=metric, method=method,
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 1220, in plot
      self.plot_dendrograms(row_cluster, col_cluster, metric, method,
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 1065, in plot_dendrograms
      self.dendrogram_row = dendrogram(
    File "/usr/local/lib/python3.9/site-packages/seaborn/_decorators.py", line 46, in inner_f
      return f(**kwargs)
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 784, in dendrogram
      plotter = _DendrogramPlotter(data, linkage=linkage, axis=axis,
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 594, in __init__
      self.linkage = self.calculated_linkage
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 661, in calculated_linkage
      return self._calculate_linkage_scipy()
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 629, in _calculate_linkage_scipy
      linkage = hierarchy.linkage(self.array, method=self.method,
    File "/usr/local/lib/python3.9/site-packages/scipy/cluster/hierarchy.py", line 1068, in linkage
      n = int(distance.num_obs_y(y))
    File "/usr/local/lib/python3.9/site-packages/scipy/spatial/distance.py", line 2572, in num_obs_y
      raise ValueError("The number of observations cannot be determined on "
  ValueError: The number of observations cannot be determined on an empty distance matrix.

Work dir:
  /gpfs1/home/f/s/fsantann/work/64/858b740fe77cd5e95503ae127a3a77

Looking more closely at the QC for these samples, I found that only one of them had actual bins (bin_summary.tsv). I re-ran the pipeline with only that sample and it worked fine. I am unsure whether the missing bins are what caused the failure. I have attached the nextflow.log for closer inspection.
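For anyone hitting the same traceback: the failing call is the row dendrogram inside sns.clustermap, and hierarchical clustering needs at least two observations (rows) to build a distance matrix. A minimal sketch of a pre-check on the depth table is below. It assumes the TSV has one header line followed by one row per bin, which I have not verified against the pipeline's actual output; count_bins and can_cluster are hypothetical helper names and are not part of plot_mag_depths.py.

```python
import csv
import io


def count_bins(tsv_text: str) -> int:
    """Count data rows in a tab-separated depth table.

    Assumption: the first non-empty line is a header and every
    following non-empty line describes exactly one bin.
    """
    rows = [row for row in csv.reader(io.StringIO(tsv_text), delimiter="\t") if row]
    # Subtract the header line; an empty file yields zero bins.
    return max(len(rows) - 1, 0)


def can_cluster(tsv_text: str, min_bins: int = 2) -> bool:
    """Return True only if there are enough bins for row clustering."""
    return count_bins(tsv_text) >= min_bins


# Hypothetical tables, for illustration only.
header_only = "bin\tsample1\tsample2\n"
one_bin = header_only + "bin.001\t3.2\t0.0\n"
two_bins = one_bin + "bin.002\t1.1\t4.5\n"

for label, table in [("header only", header_only), ("one bin", one_bin), ("two bins", two_bins)]:
    print(label, count_bins(table), can_cluster(table))
```

A plotting script could skip the heatmap (or draw it without clustering) whenever such a check returns False, instead of exiting with status 1.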

Thanks!

Best,

Felipe

Command used and terminal output

No response

Relevant files

06_24_nextflow.log

System information

Nextflow version: 24.04.2, HPC, slurm, Singularity, Linux, nf-core/mag version 3.0.1

jfy133 commented 1 week ago

I think this has more to do with how we handle samples that don't produce any bins at all.

My feeling is we need to add a filter to remove such samples, plus a warning that it has happened.
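To make the idea concrete, here is a rough sketch of "filter plus warning", written in Python purely for illustration. The pipeline itself is Nextflow, so this is not how the fix in the pipeline is implemented; drop_samples_without_bins, the logger name, and the sample and bin names are all hypothetical.

```python
import logging

logger = logging.getLogger("empty_bin_filter")  # hypothetical logger name


def drop_samples_without_bins(bins_per_sample):
    """Split samples into those with bins and those without.

    bins_per_sample maps a sample name to the list of bin files found
    for it. Returns (kept, dropped): kept is a dict of the samples that
    have at least one bin, dropped is a list of the sample names removed.
    """
    kept = {}
    dropped = []
    for sample, bins in bins_per_sample.items():
        if bins:
            kept[sample] = list(bins)
        else:
            dropped.append(sample)
    if dropped:
        # Warn rather than fail, so the remaining samples still get processed.
        logger.warning(
            "No bins for sample(s) %s; excluding them from downstream steps",
            ", ".join(dropped),
        )
    return kept, dropped


# Hypothetical input: three samples, only one of which produced bins.
example = {
    "sample_A": ["bin.001.fa", "bin.002.fa"],
    "sample_B": [],
    "sample_C": [],
}
kept, dropped = drop_samples_without_bins(example)
print(sorted(kept), dropped)
```

The point of warning instead of raising is that a run with one binned sample and two empty ones would still finish, with the skipped samples named in the log.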

maxibor commented 3 days ago

Fixed with #635