schmeing / gapless

Gapless provides combined scaffolding, gap-closing and assembly correction with long reads
MIT License

pipeline crashed : scaffold (Scaffold graph is inconsistent) #5

Open Duda5 opened 1 year ago

Duda5 commented 1 year ago

Hi, I was only able to run gapless on my ONT assembly (~400 contigs, haploid genome size of 3.1 Gbp, no gaps in the contigs) up to the gapless.py scaffold stage of the pipeline.

My command was gapless.sh -i asm.fa -o gapless_out -t nanopore -j 18 ONT_treads.fastq.gz

Here are the contents of gapless_scaffold.log, where multiple errors are being logged:

/home/duda5/anaconda3/lib/python3.9/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)
/home/duda5/anaconda3/lib/python3.9/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)
/home/duda5/soft/gapless/gapless.py:253: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set(xticklabels=np.where(locs.astype(int) == locs, (10 ** locs).astype(str), ""))
/home/duda5/anaconda3/lib/python3.9/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)
/home/duda5/soft/gapless/gapless.py:253: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set(xticklabels=np.where(locs.astype(int) == locs, (10 ** locs).astype(str), ""))
/home/duda5/anaconda3/lib/python3.9/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)
/home/duda5/soft/gapless/gapless.py:253: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set(xticklabels=np.where(locs.astype(int) == locs, (10 ** locs).astype(str), ""))
/home/duda5/anaconda3/lib/python3.9/site-packages/scipy/stats/_discrete_distns.py:315: RuntimeWarning: divide by zero encountered in _nbinom_cdf
  return _boost._nbinom_cdf(k, n, p)
/home/duda5/anaconda3/lib/python3.9/site-packages/scipy/stats/_discrete_distns.py:315: RuntimeWarning: divide by zero encountered in _nbinom_cdf
  return _boost._nbinom_cdf(k, n, p)
/home/duda5/anaconda3/lib/python3.9/site-packages/scipy/stats/_discrete_distns.py:315: RuntimeWarning: divide by zero encountered in _nbinom_cdf
  return _boost._nbinom_cdf(k, n, p)
/home/duda5/anaconda3/lib/python3.9/site-packages/scipy/stats/_discrete_distns.py:315: RuntimeWarning: divide by zero encountered in _nbinom_cdf
  return _boost._nbinom_cdf(k, n, p)
/home/duda5/anaconda3/lib/python3.9/site-packages/scipy/stats/_discrete_distns.py:315: RuntimeWarning: divide by zero encountered in _nbinom_cdf
  return _boost._nbinom_cdf(k, n, p)
/home/duda5/anaconda3/lib/python3.9/site-packages/scipy/stats/_discrete_distns.py:315: RuntimeWarning: divide by zero encountered in _nbinom_cdf
  return _boost._nbinom_cdf(k, n, p)
/home/duda5/anaconda3/lib/python3.9/site-packages/scipy/stats/_discrete_distns.py:315: RuntimeWarning: divide by zero encountered in _nbinom_cdf
  return _boost._nbinom_cdf(k, n, p)
/home/duda5/anaconda3/lib/python3.9/site-packages/scipy/stats/_discrete_distns.py:315: RuntimeWarning: divide by zero encountered in _nbinom_cdf
  return _boost._nbinom_cdf(k, n, p)
/home/duda5/anaconda3/lib/python3.9/site-packages/scipy/stats/_discrete_distns.py:315: RuntimeWarning: divide by zero encountered in _nbinom_cdf
  return _boost._nbinom_cdf(k, n, p)
/home/duda5/anaconda3/lib/python3.9/site-packages/scipy/stats/_discrete_distns.py:315: RuntimeWarning: divide by zero encountered in _nbinom_cdf
  return _boost._nbinom_cdf(k, n, p)
0:00:02.213879 Reading in original assembly
0:00:04.935302 Loading repeats
0:00:05.341227 Filtering mappings
0:01:44.997825 Search for possible break points
0:08:56.882098 Search for possible bridges
0:09:24.659574 Scaffold the contigs
      sindex    from from_side  scaf1  ... dist17  scaf18 strand18  dist18
0      24199    30.0         l   27.0  ...    NaN     NaN      NaN     NaN
1      24200    27.0         r   29.0  ...    NaN     NaN      NaN     NaN
2      24202    27.0         l   30.0  ...    NaN     NaN      NaN     NaN
3      24203    29.0         r   30.0  ...    NaN     NaN      NaN     NaN
4      24204    43.0         r   41.0  ...    NaN     NaN      NaN     NaN
...      ...     ...       ...    ...  ...    ...     ...      ...     ...
3137   23371   899.0         l  900.0  ...    0.0     NaN      NaN     NaN
3138   21981   901.0         r  902.0  ...    0.0     NaN      NaN     NaN
3139   23785   899.0         l  900.0  ...    0.0     NaN      NaN     NaN
3140   15821  1113.0         l  906.0  ...    0.0   899.0        -     0.0
3141   19043   906.0         r  905.0  ...    0.0   904.0        -     0.0

[3142 rows x 58 columns]
Traceback (most recent call last):
  File "/home/duda5/soft/gapless/gapless.py", line 13327, in <module>
    main(sys.argv[1:])
  File "/home/duda5/soft/gapless/gapless.py", line 13156, in main
    GaplessScaffold(args[0], args[1], args[2], min_mapq, min_mapping_length, min_length_contig_break, prefix, stats)
  File "/home/duda5/soft/gapless/gapless.py", line 9101, in GaplessScaffold
    scaffold_paths, trim_repeats = ScaffoldContigs(contig_parts, bridges, mappings, cov_probs, repeats, prob_factor, min_mapping_length, max_dist_contig_end, prematurity_threshold, ploidy, max_loop_units)
  File "/home/duda5/soft/gapless/gapless.py", line 7840, in ScaffoldContigs
    scaffold_graph = BuildScaffoldGraph(long_range_connections, scaf_bridges)
  File "/home/duda5/soft/gapless/gapless.py", line 2422, in BuildScaffoldGraph
    CheckScaffoldGraphConsistency(scaffold_graph)
  File "/home/duda5/soft/gapless/gapless.py", line 2358, in CheckScaffoldGraphConsistency
    raise RuntimeError("Scaffold graph is inconsistent: Not all reverse entries are present.")
RuntimeError: Scaffold graph is inconsistent: Not all reverse entries are present.

schmeing commented 1 year ago

Thank you for reporting this.

The divisions by zero worry me, and an inconsistent graph should never happen, so this is clearly a bug. We have two options to fix it. The fast one is for you to send me the `gapless_split.fa`, `gapless_reads.paf` and `gapless_split_repeats.paf` files (or download links) at stephan.schmeing@uzh.ch, so that I can trace and fix the issue myself. The slow one is that I guide you through the code so you can find where and why the issue occurs, which would then allow me to fix it.
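
For locating where the `divide by zero` warnings come from, one possible approach is to promote `RuntimeWarning` to an exception, so that the traceback shows the whole call chain instead of only the scipy line. The following is a minimal sketch using only the standard `warnings` module; `noisy_cdf` and `first_warning_as_error` are hypothetical names made up for this example and are not part of gapless or scipy.

```python
import warnings


def noisy_cdf(k, n, p):
    # Hypothetical stand-in for a library call that emits a RuntimeWarning
    # instead of failing; this is NOT the scipy or gapless implementation.
    if p == 0:
        warnings.warn("divide by zero encountered in noisy_cdf", RuntimeWarning)
        return float("nan")
    return min(1.0, k * p / n)


def first_warning_as_error(func, *args):
    """Call func and return the RuntimeWarning it raises, or None if it is clean."""
    with warnings.catch_warnings():
        # Inside this block every RuntimeWarning is raised as an exception,
        # so an uncaught one would produce a traceback through the caller.
        warnings.simplefilter("error", category=RuntimeWarning)
        try:
            func(*args)
        except RuntimeWarning as err:
            return err
    return None


caught = first_warning_as_error(noisy_cdf, 3, 10, 0.0)
clean = first_warning_as_error(noisy_cdf, 3, 10, 0.5)
print(caught)  # the warning raised for p == 0
print(clean)   # None, because no warning was emitted
```

The same filter can be applied to a whole run without editing any code by starting the interpreter with `python -W error::RuntimeWarning`, which makes the first such warning abort the run with a full traceback.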

Best, Stephan
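
As background on the error message: when every connection in a graph is stored from both ends, each entry needs a matching reverse entry, and a check of that property fails if one direction is missing. The sketch below illustrates the idea on a deliberately simplified edge list. The tuple layout and the function `missing_reverse_entries` are assumptions for illustration only and do not reproduce the `CheckScaffoldGraphConsistency` code in gapless.py.

```python
def missing_reverse_entries(edges):
    """Return the edges whose mirrored counterpart is absent.

    Each edge is (from_scaffold, from_side, to_scaffold, to_side) with sides
    'l' or 'r'. This layout is a simplification chosen for the example.
    """
    present = set(edges)
    missing = []
    for frm, frm_side, to, to_side in edges:
        reverse = (to, to_side, frm, frm_side)
        if reverse not in present:
            missing.append((frm, frm_side, to, to_side))
    return missing


# Toy data: the connection between scaffolds 1 and 2 is stored in both
# directions, while the connection from 3 to 1 lacks its reverse entry.
edges = [
    (1, "r", 2, "l"),
    (2, "l", 1, "r"),
    (3, "l", 1, "l"),
]
print(missing_reverse_entries(edges))  # [(3, 'l', 1, 'l')]
```

Listing the offending entries, rather than only raising an error, narrows down which connections were added or removed on one side only.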