rnajena / viralclust

Small pipeline to cluster viral genomes based on their k-mer content. WiP
GNU General Public License v3.0

HDBSCAN error for large dataset #20

Open sandraTriebel opened 1 year ago

sandraTriebel commented 1 year ago

I got this error message while running ViralClust on SARS-CoV-2 Alpha genomes (152,307 non-redundant sequences):

Traceback (most recent call last):
  File "/home/nu76fet/programs/viralclust/bin/hdbscan_virus.py", line 663, in <module>
    perform_clustering()
  File "/home/nu76fet/programs/viralclust/bin/hdbscan_virus.py", line 604, in perform_clustering
    virusClusterer.determine_profile(multiPool)
  File "/home/nu76fet/programs/viralclust/bin/hdbscan_virus.py", line 267, in determine_profile
    allProfiles = p.map(self.profile, self.d_sequences.items())
  File "/home/nu76fet/programs/viralclust/conda/hdbscan-9fec0a1dfe235db7d7c78f1a0bba3ac9/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/nu76fet/programs/viralclust/conda/hdbscan-9fec0a1dfe235db7d7c78f1a0bba3ac9/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/home/nu76fet/programs/viralclust/conda/hdbscan-9fec0a1dfe235db7d7c78f1a0bba3ac9/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/home/nu76fet/programs/viralclust/conda/hdbscan-9fec0a1dfe235db7d7c78f1a0bba3ac9/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/nu76fet/programs/viralclust/conda/hdbscan-9fec0a1dfe235db7d7c78f1a0bba3ac9/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
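For context, the last frame packs the length of the pickled task into a signed 32-bit header (`struct.pack("!i", n)`), so a single task message larger than 2147483647 bytes (about 2 GiB) cannot be sent. The snippet below is a minimal illustration of that limit only; it does not touch ViralClust, and the helper name `fits_in_header` is made up for this example.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value the "!i" (signed 32-bit, network byte order) format accepts


def fits_in_header(n_bytes):
    """Return True if a payload length of n_bytes can be packed with '!i'."""
    try:
        struct.pack("!i", n_bytes)
        return True
    except struct.error:
        return False


print(fits_in_header(INT32_MAX))      # True
print(fits_in_header(INT32_MAX + 1))  # False: same struct.error as in the traceback
```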

klamkiew commented 1 year ago

I am not entirely sure whether this is related to #21 at all, but I have also run into issues with large data sets (>100k non-redundant genomes). To be honest, yours looks like a weird multiprocessing issue, so I think this is a separate topic :/
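One possible reading of the traceback, not confirmed in this thread: `p.map(self.profile, ...)` pickles a bound method (and with it the whole object, including `self.d_sequences`) together with a large chunk of sequences, and that single message exceeds the 2 GiB header limit of Python 3.7's multiprocessing. Under that assumption, a workaround could be a module-level worker function plus a small, explicit `chunksize`. The sketch below is not ViralClust's code; `kmer_profile`, `compute_profiles`, and the value of `K` are hypothetical names chosen for illustration.

```python
from collections import Counter
from multiprocessing import Pool

K = 3  # hypothetical k-mer size, for illustration only


def kmer_profile(item):
    """Count k-mers for one (header, sequence) pair.

    Defined at module level so that only the item itself is pickled,
    not an object that holds every sequence.
    """
    header, seq = item
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    return header, dict(counts)


def compute_profiles(sequences, processes=2, chunksize=16):
    """Profile all sequences, sending them to the workers in small chunks."""
    with Pool(processes) as pool:
        # A small chunksize keeps each pickled task message far below 2 GiB.
        return dict(pool.imap(kmer_profile, sequences.items(), chunksize=chunksize))


if __name__ == "__main__":
    demo = {"seq1": "ACGTACGT", "seq2": "AAAAAA"}
    print(compute_profiles(demo))
```

As far as I know, Python 3.8 and later can send larger messages between processes, so updating the conda environment might also avoid the error, although keeping task messages small is probably the more memory-friendly option.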