schmeing / gapless

Gapless provides combined scaffolding, gap-closing and assembly correction with long reads
MIT License

pipeline crashed : scaffold #1

lucabianco78 closed this issue 2 years ago

lucabianco78 commented 2 years ago

Hi,

I am trying to use gapless on a genome with ONT data. Unfortunately, I get the error below. When it crashes, I get this message on stdout: "pipeline crashed : scaffold"

Can you please give any advice? Thanks

Traceback (most recent call last):
  File "/usr/local/bin/gapless//gapless.py", line 13263, in <module>
    main(sys.argv[1:])
  File "/usr/local/bin/gapless//gapless.py", line 13092, in main
    GaplessScaffold(args[0], args[1], args[2], min_mapq, min_mapping_length, min_length_contig_break, prefix, stats)
  File "/usr/local/bin/gapless//gapless.py", line 9039, in GaplessScaffold
    scaffold_paths, trim_repeats = ScaffoldContigs(contig_parts, bridges, mappings, cov_probs, repeats, prob_factor, min_mapping_length, max_dist_contig_end, prematurity_threshold, ploidy, max_loop_units)
  File "/usr/local/bin/gapless//gapless.py", line 7783, in ScaffoldContigs
    scaffold_paths = TraverseScaffoldGraph(scaffolds, scaffold_graph, graph_ext, scaf_bridges, org_scaf_conns, ploidy, max_loop_units)
  File "/usr/local/bin/gapless//gapless.py", line 7314, in TraverseScaffoldGraph
    CheckIfScaffoldPathsFollowsValidBridges(scaffold_paths, scaf_bridges, ploidy)
  File "/usr/local/bin/gapless//gapless.py", line 4746, in CheckIfScaffoldPathsFollowsValidBridges
    raise RuntimeError("Scaffold path contains invalid bridges.")
RuntimeError: Scaffold path contains invalid bridges.

schmeing commented 2 years ago

Hi, thanks for letting me know. More information on what happened will be in the file passX/logs/gapless_scaffold.log within your selected output folder, where X is the number of the pass in which it crashed; the folder with the highest number is therefore the relevant one. I will need to debug this and fix the code. The simplest way would be for you to send me the exact command and all input files at stephan.schmeing@uzh.ch. Otherwise, we will likely need several iterations of me telling you where to change the code and you reporting the results. Best, Stephan
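Because X is a number, a plain alphabetical sort of the pass folders would put pass10 before pass9. A minimal Python sketch, assuming the passX/logs/gapless_scaffold.log layout described above inside the chosen output folder (the helper name `latest_scaffold_log` is hypothetical and not part of gapless), picks the log of the highest-numbered pass:

```python
import re
from pathlib import Path


def latest_scaffold_log(out_dir):
    """Return the gapless_scaffold.log of the highest-numbered passX folder, or None.

    Assumes the layout <out_dir>/passX/logs/gapless_scaffold.log; this helper
    is a hypothetical illustration, not part of gapless.
    """
    best_log, best_pass = None, -1
    for log in Path(out_dir).glob("pass*/logs/gapless_scaffold.log"):
        # Compare pass numbers numerically so that pass10 beats pass9;
        # folders like "passX" that are not followed by digits are skipped.
        match = re.fullmatch(r"pass(\d+)", log.parent.parent.name)
        if match and int(match.group(1)) > best_pass:
            best_log, best_pass = log, int(match.group(1))
    return best_log
```

For example, `latest_scaffold_log("gl_out")` would return the path to the log from the last pass that ran, which is the one worth attaching to a report.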

aabaricalla commented 2 years ago

Hi there! I was testing Gapless with my data and ran into the same problem with PacBio CLR data. Any suggestion would be great. I'll be waiting for any solution or update.

gapless_scaffold.log

schmeing commented 2 years ago

The issue was caused by a change in behaviour between pandas versions 1.3.1 and 1.4.2. I assume it was unintentional, but I still have to create a minimal working example and check with the pandas team. Independently of that, I added a workaround and pushed it to GitHub. Please verify whether your data works with the new gapless commit.

Thanks for bringing this to my attention.

Mjaraespejo commented 2 years ago

Hello,

I am also having issues during scaffolding. I am using PacBio HiFi data. I am attaching the gapless_scaffold.log file. Hope you can help me.

gapless_scaffold_mje.log

Thanks, Manuel

schmeing commented 2 years ago

Hello Manuel,

Thank you for reporting this bug. In my own tests I never managed to produce this inconsistent state, and it could be caused almost anywhere in the pipeline, so unfortunately there is little I can do without your data. I assume you ran gapless.sh. In that case, if you can share gapless_split.fa, gapless_reads.paf and gapless_split_repeats.paf with me at stephan.schmeing@uzh.ch, I will go through the scaffolding and see what is causing it.

Thanks, Stephan
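To send the three requested files as a single upload, a minimal sketch with Python's standard tarfile module could look like this. It assumes all three files sit in the same pass folder (for example pass1/); that location and the helper name `bundle_for_report` are assumptions, not part of gapless.

```python
import tarfile
from pathlib import Path

# File names as requested above; their location inside a pass folder is an assumption.
REPORT_FILES = ["gapless_split.fa", "gapless_reads.paf", "gapless_split_repeats.paf"]


def bundle_for_report(pass_dir, archive="gapless_debug.tar.gz"):
    """Pack the requested files from one pass folder into a gzipped tarball."""
    pass_dir = Path(pass_dir)
    missing = [name for name in REPORT_FILES if not (pass_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {pass_dir}: {', '.join(missing)}")
    with tarfile.open(archive, "w:gz") as tar:
        for name in REPORT_FILES:
            # Store each file without its directory prefix.
            tar.add(pass_dir / name, arcname=name)
    return archive
```

A call such as `bundle_for_report("gl_out/pass1")` fails early with a list of missing files instead of producing an incomplete archive.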

splaisan commented 2 years ago

Hi,

I ran gapless a few days ago, and today I ran into the same error (I ran git pull without improvement, so I am up to date). A bioconda dependency list would be nice, so that the right tools can be reinstalled to get a functional environment. Thanks in advance for your help.

My gapless environment has pandas v1.3.1.

cat gapless_run/pass1/logs/gapless_scaffold.log 
0:00:07.512620 Reading in original assembly
0:00:07.540919 Loading repeats
0:00:07.545660 Filtering mappings
Traceback (most recent call last):
  File "/opt/biotools/bin/gapless.py", line 13325, in <module>
    main(sys.argv[1:])
  File "/opt/biotools/bin/gapless.py", line 13154, in main
    GaplessScaffold(args[0], args[1], args[2], min_mapq, min_mapping_length, min_length_contig_break, prefix, stats)
  File "/opt/biotools/bin/gapless.py", line 9068, in GaplessScaffold
    mappings, cov_counts, cov_probs, read_names, read_len = ReadMappings(mapping_file, contig_ids, min_mapq, min_mapping_length, keep_all_subreads, alignment_precision, num_read_len_groups, pdf)
  File "/opt/biotools/bin/gapless.py", line 415, in ReadMappings
    mappings = ReadPaf(mapping_file)
  File "/opt/biotools/bin/gapless.py", line 217, in ReadPaf
    return pd.read_csv(file_name, sep='\t', header=None, usecols=range(12), names=['q_name','q_len','q_start','q_end','strand','t_name','t_len','t_start','t_end','matches','alignment_length','mapq'], dtype={'q_name':object, 'q_len':np.int32, 'q_start':np.int32, 'q_end':np.int32, 'strand':str, 't_name':object, 't_len':np.int32, 't_start':np.int32, 't_end':np.int32, 'matches':np.int32, 'alignment_length':np.int32, 'mapq':np.int16})
  File "/opt/miniconda3/envs/gapless/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/opt/miniconda3/envs/gapless/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/opt/miniconda3/envs/gapless/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/opt/miniconda3/envs/gapless/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/opt/miniconda3/envs/gapless/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/opt/miniconda3/envs/gapless/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 51, in __init__
    self._open_handles(src, kwds)
  File "/opt/miniconda3/envs/gapless/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py", line 222, in _open_handles
    self.handles = get_handle(
  File "/opt/miniconda3/envs/gapless/lib/python3.9/site-packages/pandas/io/common.py", line 701, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'pass1/gapless_reads.paf'
# packages in environment at /opt/miniconda3/envs/gapless:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
biopython                 1.77             py39h27cfd23_0  
blas                      1.0                    openblas  
bottleneck                1.3.4            py39hce1f21e_0  
brotli                    1.0.9                he6710b0_2  
ca-certificates           2022.4.26            h06a4308_0  
certifi                   2022.5.18.1      py39h06a4308_0  
cycler                    0.11.0             pyhd3eb1b0_0  
dbus                      1.13.18              hb2f20db_0  
expat                     2.4.4                h295c915_0  
fontconfig                2.13.1               h6c09931_0  
fonttools                 4.25.0             pyhd3eb1b0_0  
freetype                  2.11.0               h70c0345_0  
giflib                    5.2.1                h7b6447c_0  
glib                      2.69.1               h4ff587b_1  
gst-plugins-base          1.14.0               h8213a91_2  
gstreamer                 1.14.0               h28cd5cc_2  
icu                       58.2                 he6710b0_3  
jbig                      2.1                  hdba287a_0  
jpeg                      9e                   h7f8727e_0  
kiwisolver                1.4.2            py39h295c915_0  
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.38                 h1181459_1  
lerc                      3.0                  h295c915_0  
libblas                   3.9.0           15_linux64_openblas    conda-forge
libcblas                  3.9.0           15_linux64_openblas    conda-forge
libdeflate                1.8                  h7f8727e_5  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 11.2.0               h1234567_1  
libgfortran-ng            11.2.0               h00389a5_1  
libgfortran5              11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
liblapack                 3.9.0           15_linux64_openblas    conda-forge
libopenblas               0.3.20               h043d6bf_1  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              11.2.0               h1234567_1  
libtiff                   4.3.0                h6f004c6_2    conda-forge
libuuid                   1.0.3                h7f8727e_2  
libwebp                   1.2.2                h55f646e_0  
libwebp-base              1.2.2                h7f8727e_0  
libxcb                    1.15                 h7f8727e_0  
libxml2                   2.9.12               h03d6c58_0  
libzlib                   1.2.11            h166bdaf_1014    conda-forge
lz4-c                     1.9.3                h295c915_1  
matplotlib                3.4.2            py39h06a4308_0  
matplotlib-base           3.4.2            py39hab158f2_0  
munkres                   1.1.4                      py_0  
ncurses                   6.3                  h7f8727e_2  
numexpr                   2.8.1            py39hecfb737_0  
numpy                     1.22.3           py39h7a5d4dd_0  
numpy-base                1.22.3           py39hb8be1f0_0  
olefile                   0.46               pyhd3eb1b0_0  
openssl                   1.1.1o               h7f8727e_0  
packaging                 21.3               pyhd3eb1b0_0  
pandas                    1.3.1            py39h8c16a72_0  
pcre                      8.45                 h295c915_0  
pillow                    8.3.1            py39h5aabda8_0  
pip                       21.2.4           py39h06a4308_0  
pyparsing                 3.0.4              pyhd3eb1b0_0  
pyqt                      5.9.2            py39h2531618_6  
python                    3.9.6                h12debd9_1  
python-dateutil           2.8.2              pyhd3eb1b0_0  
python_abi                3.9                      2_cp39    conda-forge
pytz                      2021.3             pyhd3eb1b0_0  
qt                        5.9.7                h5867ecd_1  
readline                  8.1.2                h7f8727e_1  
scipy                     1.6.3            py39hee8e79c_0    conda-forge
seaborn                   0.11.1             pyhd3eb1b0_0  
seqtk                     1.3                  h7132678_4    bioconda
setuptools                61.2.0           py39h06a4308_0  
sip                       4.19.13          py39h295c915_0  
six                       1.16.0             pyhd3eb1b0_1  
sqlite                    3.38.2               hc218d9a_0  
tk                        8.6.11               h1ccaba5_0  
tornado                   6.1              py39h27cfd23_0  
tzdata                    2022a                hda174b7_0  
wheel                     0.37.1             pyhd3eb1b0_0  
xz                        5.2.5                h7f8727e_1  
zlib                      1.2.11            h166bdaf_1014    conda-forge
zstd                      1.5.0                ha4553b6_1 
splaisan commented 2 years ago

Hi again Stephan,

I rediscovered the initial cause of my crash: my FASTQ reads were compressed!

I had fixed that before thanks to another ticket but have since forgotten. If this is confirmed, could you please state it clearly in the README.md and perhaps include the conda environment I attached below (and maybe the link to your bioRxiv manuscript)?

Now, with my plain FASTQ reads, the command crashed later in the process: "pipeline crashed: finish"

INSTALL

Note: I use conda 4.12.0 as the next version breaks some of my existing envs

RUN and die

gapless.sh \
  -i flye_pilon.fa \
  -j 24 \
  -n 3 \
  -o gl_out \
  -t pb_hifi \
  SRR18210286.fq

I attach the environment file for testing as well as the archive of my failed run

Thanks a lot for your tool!

Stephane

environment.yml.txt runfolder.zip

Note: I just sent you a FileSender link to the reads (8.6 GB) at your UZH email.

schmeing commented 2 years ago

Hi, regarding your first issue: gzipped reads should work. However, the crash does not seem to come from gapless but from minimap2. To see what the problem with the compressed reads is, look at logs/minimap2_reads.log.
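Since gapless only reads the PAF file that minimap2 writes, a failed minimap2 run surfaces in gapless_scaffold.log as a FileNotFoundError for pass1/gapless_reads.paf, as in the traceback above. A minimal pre-check, sketched in Python under the assumption that you call it yourself before gapless.py runs (the helper `check_paf` is hypothetical), fails with a pointer to the minimap2 log instead. PAF lines have 12 mandatory tab-separated columns, which matches the `usecols=range(12)` in gapless's ReadPaf.

```python
from pathlib import Path

PAF_MANDATORY_COLUMNS = 12  # the PAF format defines 12 mandatory tab-separated fields


def check_paf(paf_path, minimap2_log="logs/minimap2_reads.log"):
    """Raise an informative error if the minimap2 PAF output is missing or malformed."""
    paf = Path(paf_path)
    if not paf.is_file():
        raise FileNotFoundError(f"{paf} was not written; see {minimap2_log} for minimap2 errors")
    with paf.open() as handle:
        first_line = handle.readline()
    if not first_line:
        raise ValueError(f"{paf} is empty; see {minimap2_log}")
    n_fields = len(first_line.rstrip("\n").split("\t"))
    if n_fields < PAF_MANDATORY_COLUMNS:
        raise ValueError(f"{paf}: first line has {n_fields} columns, expected at least 12")
    return paf
```

Only the first line is inspected, which keeps the check cheap on multi-gigabyte PAF files; it catches a missing or empty output but not corruption further down.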

Regarding the conda environment: I will create a bioconda package once the errors that are trickling in are fixed. Recently, I created a new conda environment for gapless with the following command:

  conda create -c conda-forge --name gapless python=3.10.2 pandas=1.4.2 numpy=1.22.3 scipy=1.8.0 seaborn matplotlib pillow biopython

However, this does not include the external requirements minimap2, racon and seqtk used in the bash script. Conda packages for those exist and can be added if needed:

  conda install -c bioconda minimap2 seqtk racon

The exact versions should not matter much, so if you get errors with other (recent) versions, please let me know (the lowest I tried were Python 3.6 and pandas 1.1.0).
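When reporting such an error, listing the exact versions in use saves a round trip. A small Python sketch using the standard library's importlib.metadata (available from Python 3.8, so it will not run under Python 3.7); the package list is an assumption taken from the conda command in this thread:

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Python packages from the conda command in this thread (distribution names).
GAPLESS_PACKAGES = ["pandas", "numpy", "scipy", "seaborn", "matplotlib", "Pillow", "biopython"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


def version_report(names=GAPLESS_PACKAGES):
    """Return one 'name version' line per package, starting with Python itself."""
    lines = [f"python {platform.python_version()}"]
    for name, ver in installed_versions(names).items():
        lines.append(f"{name} {ver or 'not installed'}")
    return "\n".join(lines)


print(version_report())
```

Running it inside the activated gapless environment prints a short list that can be pasted into the issue alongside the log.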

schmeing commented 2 years ago

An important note: if you send something to my UZH email, please do as Stephane did: describe the problem here and announce that you sent something. This account receives far more spam than relevant email these days, and without his announcement here I would have missed his message. So if you have not received an answer by now, I have simply missed your email.

schmeing commented 2 years ago

Hi Stephane,

Your second issue is now resolved. I pushed the fix to GitHub a minute ago. Thank you for providing all the data for a quick reproduction and fix of the bug.

splaisan commented 2 years ago

Hi Stephan,

Thanks for your feedback and the new version. I pulled the current git version and created a completely new environment as described above. The environment was created without issues; I only noticed that when adding minimap2 from bioconda, three of the initially installed packages were superseded. No idea whether this will matter later; I just wanted to mention it.

The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates    conda-forge::ca-certificates-2022.6.1~ --> pkgs/main::ca-certificates-2022.4.26-h06a4308_0
  certifi            conda-forge::certifi-2022.6.15-py310h~ --> pkgs/main::certifi-2022.5.18.1-py310h06a4308_0
  openssl            conda-forge::openssl-1.1.1o-h166bdaf_0 --> pkgs/main::openssl-1.1.1o-h7f8727e_0

When running my code again, I immediately got the scaffolding error. Looking into the logs, it seems that "${org_path}/${reads}" at line 138 of your bash wrapper duplicates the read path (the reads are in a separate folder and passed to gapless.sh as a full path), leading to an early crash.

I copied reads.gz into the working folder, and it now runs to completion. (By the way, this is why I thought gzip was the problem: earlier I had copied the decompressed reads locally, and it worked better.)

Thanks for your help! Stephane :switzerland:

minimap2_reads.log

[M::mm_idx_gen::0.359*1.01] collected minimizers
[M::mm_idx_gen::0.411*4.79] sorted minimizers
[M::main::0.411*4.79] loaded/built the index for 32 target sequence(s)
[M::mm_mapopt_update::0.447*4.49] mid_occ = 50
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 32
[M::mm_idx_stat::0.470*4.32] distinct minimizers: 2084403 (95.23% are singletons); average occurrences: 1.075; average spacing: 5.495; total length: 12317436
ERROR: failed to open file '/data/analyses/SRR18210286_Scerevisiae_HiFi/gapless_assemblies//data/analyses/SRR18210286_Scerevisiae_HiFi/reads/SRR18210286.fq.gz': No such file or directory
ERROR: failed to map the query file

My command was:

#!/bin/bash

source /etc/profile.d/conda.sh
thr=80
reads=/data/analyses/SRR18210286_Scerevisiae_HiFi/reads/SRR18210286.fq.gz
# reads=SRR18210286.fq.gz
cp ../pilon_assemblies/*_pilon.fa .

##############
# conda create -c conda-forge --name gapless python=3.10.2 pandas=1.4.2 \
#  numpy=1.22.3 scipy=1.8.0 seaborn matplotlib pillow biopython
# then within the new env:
# conda install -c bioconda minimap2 seqtk racon
myenv=gapless

conda activate ${myenv} || {
  # use { } rather than ( ) so that exit stops the script, not just a subshell
  echo "# the conda environment ${myenv} was not found on this machine"
  echo "# please read the top part of the script!"
  exit 1
}

for asm in flye_pilon.fa; do
#  put aside for debugging: hicanu_pilon.fa hifiasm_pilon.fa ipa_pilon.fa nd_pilon.fa; do

pfx=${asm%.fa}

echo "# gapless scaffolding for ${pfx}"

gapless.sh \
  -i ${asm} \
  -j ${thr} \
  -n 3 \
  -o gapless_${pfx} \
  -t pb_hifi \
  ${reads}

# copy final asm to local folder
cp gapless_${pfx}/gapless.fa ${pfx%_pilon}_gapless.fa

done

conda deactivate

exit 0
schmeing commented 2 years ago

Sorry about that. Absolute paths were not supported; that is fixed now.
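The duplicated path in minimap2_reads.log above is exactly what plain string concatenation produces when the reads are already given as an absolute path. The actual fix in gapless.sh is not shown in this thread; as a hedged Python illustration of the general idea (the helpers `naive_join` and `resolve_reads` are hypothetical, and `org` is assumed to be the working directory the script ran in), only relative paths should be prefixed:

```python
import os


def naive_join(org_path, reads):
    # Mimics "${org_path}/${reads}": always prefixes, even for absolute paths.
    return f"{org_path}/{reads}"


def resolve_reads(org_path, reads):
    """Prefix org_path only when reads is a relative path."""
    if os.path.isabs(reads):
        return reads
    return os.path.join(org_path, reads)


org = "/data/analyses/SRR18210286_Scerevisiae_HiFi/gapless_assemblies"
reads = "/data/analyses/SRR18210286_Scerevisiae_HiFi/reads/SRR18210286.fq.gz"
print(naive_join(org, reads))     # the duplicated path from minimap2_reads.log
print(resolve_reads(org, reads))  # the absolute path, left untouched
```

Note that `os.path.join` on its own already discards earlier components when a later one is absolute; the explicit `isabs` check just makes the intent visible.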

splaisan commented 2 years ago

Hi Stephan,

Today I have a weird issue: the pipeline, which worked for two assemblies (made with IPA and Flye from the same reads), has now failed twice with the HiCanu assembly.

When I rerun the IPA assembly after the failed HiCanu run, it works.

Another assembly (hifiasm) fails with the same error.

Could it be that HiCanu and hifiasm output haplotigs (pairs of contigs) rather than consensus contigs, and that this somehow trips up your pipeline?

Below are the scaffold logs, and I attach the assembly input files; the reads are the same as before.

hicanu_pilon.fa.zip

# hicanu log
$ cat gapless_scaffold.log
/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)
/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)
/opt/biotools/bin/gapless.py:253: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set(xticklabels=np.where(locs.astype(int) == locs, (10 ** locs).astype(str), ""))
/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)
/opt/biotools/bin/gapless.py:253: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set(xticklabels=np.where(locs.astype(int) == locs, (10 ** locs).astype(str), ""))
/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)
/opt/biotools/bin/gapless.py:253: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set(xticklabels=np.where(locs.astype(int) == locs, (10 ** locs).astype(str), ""))
0:00:07.969616 Reading in original assembly
0:00:08.011044 Loading repeats
0:00:08.043497 Filtering mappings
0:00:18.317045 Search for possible break points
0:00:38.241233 Search for possible bridges
0:00:38.436451 Scaffold the contigs
Traceback (most recent call last):
  File "/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/biotools/bin/gapless.py", line 13326, in <module>
    main(sys.argv[1:])
  File "/opt/biotools/bin/gapless.py", line 13155, in main
    GaplessScaffold(args[0], args[1], args[2], min_mapq, min_mapping_length, min_length_contig_break, prefix, stats)
  File "/opt/biotools/bin/gapless.py", line 9100, in GaplessScaffold
    scaffold_paths, trim_repeats = ScaffoldContigs(contig_parts, bridges, mappings, cov_probs, repeats, prob_factor, min_mapping_length, max_dist_contig_end, prematurity_threshold, ploidy, max_loop_units)
  File "/opt/biotools/bin/gapless.py", line 7847, in ScaffoldContigs
    scaffold_paths = ExpandScaffoldsWithContigs(scaffold_paths, scaffolds, scaffold_parts, ploidy)
  File "/opt/biotools/bin/gapless.py", line 7697, in ExpandScaffoldsWithContigs
    scaffold_paths = scaffold_paths.loc[np.repeat(scaffold_paths.index.values, scaffold_paths[[f'size{h}' for h in range(ploidy)]].max(axis=1).values)]
  File "<__array_function__ internals>", line 180, in repeat
  File "/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 479, in repeat
    return _wrapfunc(a, 'repeat', repeats, axis=axis)
  File "/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 66, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'

hifiasm_pilon.fa.zip

# hifiasm log
$ cat gapless_scaffold.log
/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)
/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)
/opt/biotools/bin/gapless.py:253: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set(xticklabels=np.where(locs.astype(int) == locs, (10 ** locs).astype(str), ""))
/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)
/opt/biotools/bin/gapless.py:253: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set(xticklabels=np.where(locs.astype(int) == locs, (10 ** locs).astype(str), ""))
/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)
/opt/biotools/bin/gapless.py:253: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set(xticklabels=np.where(locs.astype(int) == locs, (10 ** locs).astype(str), ""))
0:00:07.740485 Reading in original assembly
0:00:07.776979 Loading repeats
0:00:07.814693 Filtering mappings
0:00:18.338776 Search for possible break points
0:00:38.278662 Search for possible bridges
0:00:40.703660 Scaffold the contigs
Traceback (most recent call last):
  File "/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/biotools/bin/gapless.py", line 13326, in <module>
    main(sys.argv[1:])
  File "/opt/biotools/bin/gapless.py", line 13155, in main
    GaplessScaffold(args[0], args[1], args[2], min_mapq, min_mapping_length, min_length_contig_break, prefix, stats)
  File "/opt/biotools/bin/gapless.py", line 9100, in GaplessScaffold
    scaffold_paths, trim_repeats = ScaffoldContigs(contig_parts, bridges, mappings, cov_probs, repeats, prob_factor, min_mapping_length, max_dist_contig_end, prematurity_threshold, ploidy, max_loop_units)
  File "/opt/biotools/bin/gapless.py", line 7847, in ScaffoldContigs
    scaffold_paths = ExpandScaffoldsWithContigs(scaffold_paths, scaffolds, scaffold_parts, ploidy)
  File "/opt/biotools/bin/gapless.py", line 7697, in ExpandScaffoldsWithContigs
    scaffold_paths = scaffold_paths.loc[np.repeat(scaffold_paths.index.values, scaffold_paths[[f'size{h}' for h in range(ploidy)]].max(axis=1).values)]
  File "<__array_function__ internals>", line 180, in repeat
  File "/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 479, in repeat
    return _wrapfunc(a, 'repeat', repeats, axis=axis)
  File "/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 66, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/opt/miniconda3/envs/gapless/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'
schmeing commented 2 years ago

I pushed the fix. Thanks for finding the bug.
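The TypeError in both logs is raised because np.repeat received float64 repeat counts. A row-wise maximum over several size columns is float64 whenever one of those columns is float, for example because it contains NaN (an assumption about why the counts were floats here), and NumPy refuses to cast float counts to integers under its 'safe' rule. The author's actual fix is not shown in this thread; a minimal NumPy sketch of the failure and one way around it, with the hypothetical helper `repeat_rows`, is:

```python
import numpy as np


def repeat_rows(index_values, sizes):
    """Repeat each index value by the largest size in its row.

    `sizes` is a 2-D array; a NaN anywhere (e.g. a missing haplotype) makes it float64.
    """
    counts = np.nanmax(sizes, axis=1)  # still float64, e.g. [2., 3.]
    # np.repeat needs integer counts; rows that are entirely NaN get 0 repeats.
    counts = np.nan_to_num(counts, nan=0.0).astype(np.int64)
    return np.repeat(index_values, counts)


index_values = np.array([10, 20])
sizes = np.array([[2, np.nan], [1, 3]])

try:
    np.repeat(index_values, np.nanmax(sizes, axis=1))
except TypeError as err:
    print("float counts fail:", err)

print(repeat_rows(index_values, sizes))  # [10 10 20 20 20]
```

Casting right before the call keeps the rest of the data untouched; an alternative would be to keep the size columns integer-typed (for example with pandas' nullable Int64 dtype) so the maximum never becomes float.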

splaisan commented 2 years ago

Stephan, your last edits did the trick: I could now run the two missing gapless processes. Thank you very much for your kind support and great tool!

xhu556 commented 2 years ago

Hi,

I am testing gapless. When it crashes, I get this message on stdout: "pipeline crashed : scaffold"

Here is the log file:

$ cat logs/gapless_scaffold.log
0:00:01.674690 Reading in original assembly
0:00:02.363788 Loading repeats
0:00:02.444654 Filtering mappings
Traceback (most recent call last):
  File "/ppq/data1/software/gapless//gapless.py", line 13326, in <module>
    main(sys.argv[1:])
  File "/ppq/data1/software/gapless//gapless.py", line 13155, in main
    GaplessScaffold(args[0], args[1], args[2], min_mapq, min_mapping_length, min_length_contig_break, prefix, stats)
  File "/ppq/data1/software/gapless//gapless.py", line 9068, in GaplessScaffold
    mappings, cov_counts, cov_probs, read_names, read_len = ReadMappings(mapping_file, contig_ids, min_mapq, min_mapping_length, keep_all_subreads, alignment_precision, num_read_len_groups, pdf)
  File "/ppq/data1/software/gapless//gapless.py", line 432, in ReadMappings
    PlotHist(pdf, "Mapping quality", "# Mappings", mappings['mapq'], threshold=min_mapq, logy=True)
  File "/ppq/data1/software/gapless//gapless.py", line 262, in PlotHist
    ax.set_yscale('log', nonpositive='clip')
  File "/ppq/data1/software/anaconda3/lib/python3.7/site-packages/matplotlib/axes/_base.py", line 3531, in set_yscale
    ax.yaxis._set_scale(value, **kwargs)
  File "/ppq/data1/software/anaconda3/lib/python3.7/site-packages/matplotlib/axis.py", line 771, in _set_scale
    self._scale = mscale.scale_factory(value, self, **kwargs)
  File "/ppq/data1/software/anaconda3/lib/python3.7/site-packages/matplotlib/scale.py", line 573, in scale_factory
    return _scale_mapping[scale](axis, **kwargs)
  File "/ppq/data1/software/anaconda3/lib/python3.7/site-packages/matplotlib/scale.py", line 253, in __init__
    "{!r}".format(kwargs))
ValueError: provided too many kwargs, can only pass {'basex', 'subsx', nonposx'} or {'basey', 'subsy', nonposy'}. You passed {'nonpositive': 'clip'}

schmeing commented 2 years ago

Hi, thanks for reporting it. Which version of matplotlib are you using? It appears to accept different options for setting an axis to log scale. If it is newer than 3.4.2, I will update the code to make it work; otherwise, please update your matplotlib package.
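For reference: as far as I know, matplotlib 3.3 renamed the `nonposx`/`nonposy` arguments of `set_xscale`/`set_yscale` to `nonpositive`, which fits the error above coming from an older installation that only knows the axis-specific names. A version-tolerant call could pick the keyword from `matplotlib.__version__`; the helper `log_scale_kwargs` below is a hypothetical sketch, not gapless code, and the 3.3 cut-off is the stated assumption.

```python
import re


def log_scale_kwargs(mpl_version, axis="y"):
    """Return the keyword that clips non-positive values on a log axis.

    Assumption: matplotlib >= 3.3 expects `nonpositive`, while older
    releases only accept the axis-specific `nonposx` / `nonposy`.
    """
    match = re.match(r"(\d+)\.(\d+)", mpl_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {mpl_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= (3, 3):
        return {"nonpositive": "clip"}
    return {f"nonpos{axis}": "clip"}


# Usage with matplotlib installed:
#   import matplotlib
#   ax.set_yscale("log", **log_scale_kwargs(matplotlib.__version__))
print(log_scale_kwargs("3.5.2"))
print(log_scale_kwargs("3.1.3"))
```

Parsing only the leading major.minor digits keeps release-candidate suffixes such as "3.3.0rc1" from breaking the comparison.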

splaisan commented 2 years ago

Hi, my matplotlib packages are v3.5.2. Thanks for your help.

(gapless) u0002316@gbw-s-pacbio01:~ $ conda list
# packages in environment at /opt/miniconda3/envs/gapless:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
alsa-lib                  1.2.6.1              h7f98852_0    conda-forge
attr                      2.5.1                h166bdaf_0    conda-forge
biopython                 1.79            py310h6acc77f_1    conda-forge
brotli                    1.0.9                h166bdaf_7    conda-forge
brotli-bin                1.0.9                h166bdaf_7    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
ca-certificates           2022.4.26            h06a4308_0  
certifi                   2022.5.18.1     py310h06a4308_0  
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
dbus                      1.13.6               h5008d03_3    conda-forge
expat                     2.4.8                h27087fc_0    conda-forge
fftw                      3.3.10          nompi_h77c792f_102    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.0               h8e229c2_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.33.3          py310h5764c6d_0    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
gettext                   0.19.8.1          h73d1719_1008    conda-forge
giflib                    5.2.1                h36c2ea0_2    conda-forge
gst-plugins-base          1.20.2               hf6a322e_1    conda-forge
gstreamer                 1.20.2               hd4edc92_1    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
jack                      1.9.18            h8c3723f_1002    conda-forge
jpeg                      9e                   h166bdaf_1    conda-forge
k8                        0.2.5                hd03093a_2    bioconda
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.3           py310hbf28c38_0    conda-forge
krb5                      1.19.3               h3790be6_0    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
lerc                      3.0                  h9c3ff4c_0    conda-forge
libblas                   3.9.0           15_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_7    conda-forge
libbrotlidec              1.0.9                h166bdaf_7    conda-forge
libbrotlienc              1.0.9                h166bdaf_7    conda-forge
libcap                    2.64                 ha37c62d_0    conda-forge
libcblas                  3.9.0           15_linux64_openblas    conda-forge
libclang                  14.0.5          default_h2e3cab8_0    conda-forge
libclang13                14.0.5          default_h3a83d3e_0    conda-forge
libcups                   2.3.3                hf5a7f15_1    conda-forge
libdb                     6.2.32               h9c3ff4c_0    conda-forge
libdeflate                1.12                 h166bdaf_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libevent                  2.1.10               h9b69904_4    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libflac                   1.3.4                h27087fc_0    conda-forge
libgcc-ng                 12.1.0              h8d9b700_16    conda-forge
libgfortran-ng            12.1.0              h69a702a_16    conda-forge
libgfortran5              12.1.0              hdcd56e2_16    conda-forge
libglib                   2.70.2               h174f98d_4    conda-forge
libgomp                   12.1.0              h8d9b700_16    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
liblapack                 3.9.0           15_linux64_openblas    conda-forge
libllvm14                 14.0.5               he0ac6c6_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libogg                    1.3.4                h7f98852_1    conda-forge
libopenblas               0.3.20          pthreads_h78a6416_0    conda-forge
libopus                   1.3.1                h7f98852_1    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libpq                     14.3                 hd77ab85_0    conda-forge
libsndfile                1.0.31               h9c3ff4c_1    conda-forge
libstdcxx-ng              12.1.0              ha89aaad_16    conda-forge
libtiff                   4.4.0                hc85c160_1    conda-forge
libtool                   2.4.6             h9c3ff4c_1008    conda-forge
libudev1                  249                  h166bdaf_4    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libvorbis                 1.3.7                h9c3ff4c_0    conda-forge
libwebp                   1.2.2                h3452ae3_0    conda-forge
libwebp-base              1.2.2                h7f98852_1    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxkbcommon              1.0.3                he3ba5ed_0    conda-forge
libxml2                   2.9.14               h22db469_0    conda-forge
libzlib                   1.2.12               h166bdaf_1    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
matplotlib                3.5.2           py310hff52083_0    conda-forge
matplotlib-base           3.5.2           py310h5701ce4_0    conda-forge
minimap2                  2.24                 h7132678_1    bioconda
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
mysql-common              8.0.29               haf5c9bc_1    conda-forge
mysql-libs                8.0.29               h28c427c_1    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
nspr                      4.32                 h9c3ff4c_1    conda-forge
nss                       3.78                 h2350873_0    conda-forge
numpy                     1.22.3          py310h4ef5377_2    conda-forge
openjpeg                  2.4.0                hb52868f_1    conda-forge
openssl                   1.1.1o               h7f8727e_0  
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.4.2           py310h769672d_2    conda-forge
patsy                     0.5.2              pyhd8ed1ab_0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pillow                    9.1.1           py310he619898_1    conda-forge
pip                       22.1.2             pyhd8ed1ab_0    conda-forge
portaudio                 19.6.0               h57a0ea0_5    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pulseaudio                14.0                 h7f54b18_8    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pyqt                      5.15.4          py310h29803b5_1    conda-forge
pyqt5-sip                 12.9.0                   pypi_0    pypi
python                    3.10.2          h85951f9_4_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.10                    2_cp310    conda-forge
pytz                      2022.1             pyhd8ed1ab_0    conda-forge
qt-main                   5.15.4               ha5833f6_2    conda-forge
racon                     1.5.0                h7ff8a90_0    bioconda
readline                  8.1.2                h0f457ee_0    conda-forge
scipy                     1.8.0           py310hea5193d_1    conda-forge
seaborn                   0.11.2               hd8ed1ab_0    conda-forge
seaborn-base              0.11.2             pyhd8ed1ab_0    conda-forge
seqtk                     1.3                  h7132678_4    bioconda
setuptools                62.3.4          py310hff52083_0    conda-forge
sip                       6.5.1           py310h122e73d_2    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sqlite                    3.38.5               h4ff8645_0    conda-forge
statsmodels               0.13.2          py310hde88566_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tornado                   6.1             py310h5764c6d_3    conda-forge
tzdata                    2022a                h191b570_0    conda-forge
unicodedata2              14.0.0          py310h5764c6d_1    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xcb-util                  0.4.0                h166bdaf_0    conda-forge
xcb-util-image            0.4.0                h166bdaf_0    conda-forge
xcb-util-keysyms          0.4.0                h166bdaf_0    conda-forge
xcb-util-renderutil       0.3.9                h166bdaf_0    conda-forge
xcb-util-wm               0.4.1                h166bdaf_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.12               h166bdaf_1    conda-forge
zstd                      1.5.2                h8a70e8d_1    conda-forge
xhu556 commented 2 years ago

My matplotlib packages are also v3.5.2. Thanks for your help.

schmeing commented 2 years ago

I tried to reproduce this issue, but unfortunately I could not. I created a new conda environment using:

conda create -c conda-forge --name gapless python pandas numpy scipy seaborn matplotlib=3.5.2 pillow biopython

However, it runs without errors on my data, and the plotting works fine. Furthermore, I checked the current matplotlib documentation, and this is still the recommended way of setting the log scale: https://matplotlib.org/stable/api/scale_api.html and https://matplotlib.org/stable/gallery/scales/log_demo.html#sphx-glr-gallery-scales-log-demo-py
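For reference, a minimal sketch of the documented way to put an axis on a log scale (`Axes.set_yscale("log")`). It is not gapless's actual plotting code, which is not shown in this thread; the helper name `plot_log_scale` is hypothetical. It uses the non-interactive Agg backend so it can run on a headless server.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no display required
import matplotlib.pyplot as plt
import numpy as np


def plot_log_scale(values, path=None):
    """Hypothetical helper: line plot of positive values with a log-scaled y-axis."""
    fig, ax = plt.subplots()
    ax.plot(np.arange(1, len(values) + 1), values)
    ax.set_yscale("log")  # documented way to switch an axis to log scale
    if path is not None:
        fig.savefig(path)
    return fig, ax


if __name__ == "__main__":
    fig, ax = plot_log_scale([1, 10, 100, 1000])
    print(ax.get_yscale())
    plt.close(fig)
```

If this snippet runs cleanly in the affected environment, the crash is more likely tied to the data being plotted than to the matplotlib version.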

Can you provide me with a command that creates a conda environment in which this crash occurs? I tried changing the versions of a few packages, but I could not reproduce the crash. Thank you.
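As a compact complement to the full `conda list` output, here is a minimal sketch that reports the versions of the packages from the `conda create` command above, as seen by the Python interpreter that runs gapless. The package list and the helper name `collect_versions` are assumptions for illustration; packages that are not installed are reported as `None`.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Assumed set of relevant packages, taken from the conda create command above
PACKAGES = ["pandas", "numpy", "scipy", "seaborn", "matplotlib", "pillow", "biopython"]


def collect_versions(packages):
    """Hypothetical helper: map each package name to its installed version or None."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # not installed in this interpreter's environment
    return report


if __name__ == "__main__":
    for name, ver in collect_versions(PACKAGES).items():
        print(f"{name:12} {ver or 'not installed'}")
```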

schmeing commented 2 years ago

Or is this specific to a single dataset, where something goes wrong in a way that differs from what the error message suggests?

schmeing commented 2 years ago

Since I have not received a reply in a month, I hope this issue is fixed. If it is not, please reopen it.