Hello @schmeing,

I switched the Hi-C scaffolder from YaHS to EndHiC, and the earlier problem with pure deletions went away. Now, however, gapless fails in `RemoveDuplicates` because the boolean index used to filter the `duplications` dataframe does not have the same length as the dataframe itself.
```
0:00:18.075300 Reading in original assembly
0:00:19.136065 Loading repeats
0:00:37.415395 Filtering mappings
0:01:28.951904 Search for possible break points
1:26:15.808132 Search for possible bridges
1:28:21.136944 Scaffold the contigs
Start
81914
Iteration 1
0:00:05.637656 52211
0:01:57.747884 37318
Iteration 2
0:00:01.071638 36312
0:00:50.240074 34525
Iteration 3
0:00:00.818671 34450
0:00:38.636074 34185
Iteration 4
0:00:00.681951 34171
0:00:33.416415 34125
Iteration 5
0:00:00.592918 34123
0:00:33.210053 34110
Iteration 6
0:00:00.534959 34110
0:00:32.124315 34109
Iteration 7
0:00:00.527677 34109
0:00:32.156766 34108
Iteration 8
0:00:00.526948 34108
0:00:32.149360 34108
RemoveDuplicates
33697
Iteration 1
0:00:00.504654 33697
0:00:35.549221 33442
Iteration 2
0:00:00.667288 33424
0:00:36.403955 33369
Iteration 3
0:00:00.619044 33363
0:00:32.608814 33354
Iteration 4
0:00:00.514491 33353
0:00:32.797004 33349
Iteration 5
0:00:00.504719 33349
0:00:32.317836 33349
PlaceUnambigouslyPlaceables
32781
Iteration 1
0:00:00.500207 32781
0:00:39.784164 32438
Iteration 2
0:00:00.729437 32413
0:00:37.434524 32353
Iteration 3
0:00:00.574755 32351
0:00:36.378086 32343
Iteration 4
0:00:00.494534 32343
0:00:35.192668 32342
Iteration 5
0:00:00.492627 32342
0:00:35.138941 32342
CombineOnMatchingExtensions
29295
TrimAmbiguousOverlap
Traceback (most recent call last):
File "/nesi/nobackup/vuw03529/bin/test/gapless/gapless.py", line 13362, in <module>
main(sys.argv[1:])
File "/nesi/nobackup/vuw03529/bin/test/gapless/gapless.py", line 13189, in main
GaplessScaffold(args[0], args[1], args[2], min_mapq, min_mapping_length, min_length_contig_brea
k, large_reads, large_contigs, prefix, stats)
File "/nesi/nobackup/vuw03529/bin/test/gapless/gapless.py", line 9121, in GaplessScaffold
scaffold_paths, trim_repeats = ScaffoldContigs(contig_parts, bridges, mappings, cov_probs, repe
ats, prob_factor, min_mapping_length, max_dist_contig_end, prematurity_threshold, ploidy, maxloop
units)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^
File "/nesi/nobackup/vuw03529/bin/test/gapless/gapless.py", line 7864, in ScaffoldContigs
scaffold_paths = TraverseScaffoldGraph(scaffolds, scaffold_graph, graph_ext, scafbridges, org
scaf_conns, ploidy, max_loop_units)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nesi/nobackup/vuw03529/bin/test/gapless/gapless.py", line 7437, in TraverseScaffoldGraph
scaffold_paths = TrimAmbiguousOverlap(scaffold_paths, scaffold_graph, ploidy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nesi/nobackup/vuw03529/bin/test/gapless/gapless.py", line 7254, in TrimAmbiguousOverlap
scaffold_paths = RemoveDuplicates(scaffold_paths, True, ploidy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nesi/nobackup/vuw03529/bin/test/gapless/gapless.py", line 6774, in RemoveDuplicates
duplications = duplications.loc[duplications.merge(rem_paths, on=['apid','ahap','bpid','bhap'],
how='left', indicator=True)['_merge'].values == "left_only", ['apid','ahap']].copy() # Paths that
are part of a larger part
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nesi/nobackup/vuw03529/bin/conda/gapless_mamba/lib/python3.11/site-packages/pandas/core/in
dexing.py", line 1067, in __getitem__
return self._getitem_tuple(key)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/nesi/nobackup/vuw03529/bin/conda/gapless_mamba/lib/python3.11/site-packages/pandas/core/in
dexing.py", line 1256, in _getitem_tuple
return self._getitem_tuple_same_dim(tup)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nesi/nobackup/vuw03529/bin/conda/gapless_mamba/lib/python3.11/site-packages/pandas/core/in
dexing.py", line 924, in _getitem_tuple_same_dim
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nesi/nobackup/vuw03529/bin/conda/gapless_mamba/lib/python3.11/site-packages/pandas/core/in
dexing.py", line 1292, in _getitem_axis
return self._getbool_axis(key, axis=axis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nesi/nobackup/vuw03529/bin/conda/gapless_mamba/lib/python3.11/site-packages/pandas/core/indexing.py", line 1091, in _getbool_axis
key = check_bool_indexer(labels, key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nesi/nobackup/vuw03529/bin/conda/gapless_mamba/lib/python3.11/site-packages/pandas/core/indexing.py", line 2571, in check_bool_indexer
return check_array_indexer(index, result)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nesi/nobackup/vuw03529/bin/conda/gapless_mamba/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 552, in check_array_indexer
raise IndexError(
IndexError: Boolean index has wrong length: 115672 instead of 115669
```
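A minimal sketch on made-up toy data that reproduces the same kind of `IndexError` (only the column names come from line 6774; the frames and values are hypothetical). A left merge emits one output row per matching row of the right frame, so as soon as `rem_paths` contains the same `apid`/`ahap`/`bpid`/`bhap` combination more than once, the merged frame is longer than `duplications` (by 3 rows in the traceback above: 115672 vs 115669), and the positional mask built from it no longer fits.

```python
import pandas as pd

# Hypothetical toy data: only the column names are taken from gapless.py.
keys = ['apid', 'ahap', 'bpid', 'bhap']
duplications = pd.DataFrame(
    {'apid': [1, 2, 3], 'ahap': [0, 0, 1], 'bpid': [4, 5, 6], 'bhap': [0, 1, 0]}
)
# The key combination (2, 0, 5, 1) occurs twice in rem_paths.
rem_paths = pd.DataFrame(
    {'apid': [2, 2], 'ahap': [0, 0], 'bpid': [5, 5], 'bhap': [1, 1]}
)

# Number of rem_paths rows whose key combination is not unique
n_repeated = rem_paths.duplicated(subset=keys, keep=False).sum()

# The left merge emits one row per match, so row (2, 0, 5, 1) appears twice.
merged = duplications.merge(rem_paths, on=keys, how='left', indicator=True)
print(len(duplications), len(merged))  # 3 4

# Same pattern as line 6774: a positional mask built from the merged frame
mask = merged['_merge'].values == 'left_only'
try:
    duplications.loc[mask, ['apid', 'ahap']]
    error = None
except IndexError as err:  # the mask has 4 entries, duplications only 3 rows
    error = err
print(error)
```

On the real data, running the same `duplicated(subset=..., keep=False)` check on `rem_paths` just before line 6774 would show which key combinations are repeated and therefore produce the extra rows.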
I can work around the error by applying the filter to the merged dataframe itself, so that the boolean mask always has the same length as the frame it indexes:
```python
# Merge the dataframes and keep track of where each row came from
merged_df = duplications.merge(rem_paths, on=['apid','ahap','bpid','bhap'], how='left', indicator=True)
# Keep only the rows that have no match in rem_paths
duplications = merged_df.loc[merged_df['_merge'] == 'left_only', ['apid','ahap']].copy()
```
Could that produce any other issues down the pipeline?
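One way to gauge the risk is to compare the workaround with an alternative that drops repeated key combinations from `rem_paths` before merging (the alternative is an assumption made here, not a fix taken from gapless). Because every extra row created by the merge is a match (`_merge == 'both'`), the workaround cannot introduce duplicated rows, and on this toy data both versions select the same `left_only` rows. The visible difference is the index: the workaround returns the fresh 0..n-1 index of `merged_df` instead of the original index labels of `duplications`. Whether later code in `RemoveDuplicates` relies on those labels is not something this sketch can show.

```python
import pandas as pd

keys = ['apid', 'ahap', 'bpid', 'bhap']
# Hypothetical toy data with a non-default index, to show what happens to the labels.
duplications = pd.DataFrame(
    {'apid': [1, 2, 3], 'ahap': [0, 0, 1], 'bpid': [4, 5, 6], 'bhap': [0, 1, 0]},
    index=[10, 20, 30],
)
rem_paths = pd.DataFrame(
    {'apid': [2, 2], 'ahap': [0, 0], 'bpid': [5, 5], 'bhap': [1, 1]}
)

# Workaround: filter the merged frame itself (its index is a new RangeIndex).
merged_df = duplications.merge(rem_paths, on=keys, how='left', indicator=True)
workaround = merged_df.loc[merged_df['_merge'] == 'left_only', ['apid', 'ahap']].copy()

# Assumed alternative: with unique keys, each row of duplications matches at most
# once, so the mask lines up with duplications row by row and keeps its index.
unique_keys = rem_paths[keys].drop_duplicates()
check = duplications.merge(unique_keys, on=keys, how='left', indicator=True)
mask = check['_merge'].values == 'left_only'
alternative = duplications.loc[mask, ['apid', 'ahap']].copy()

print(workaround.index.tolist())   # [0, 3]
print(alternative.index.tolist())  # [10, 30]
print(workaround.values.tolist() == alternative.values.tolist())  # True
```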