Open sandraTriebel opened 1 year ago
I am not entirely sure whether this is related to #21 at all, but I have run into issues with large data sets (>100k non-redundant genomes) as well. Yours looks like a weird multiprocessing issue, to be honest, so I think this is a whole other topic :/
Got this error message while running ViralClust with SARS-CoV-2 alpha genomes (152,307 non-redundant seqs).
Traceback (most recent call last):
  File "/home/nu76fet/programs/viralclust/bin/hdbscan_virus.py", line 663, in <module>
    perform_clustering()
  File "/home/nu76fet/programs/viralclust/bin/hdbscan_virus.py", line 604, in perform_clustering
    virusClusterer.determine_profile(multiPool)
  File "/home/nu76fet/programs/viralclust/bin/hdbscan_virus.py", line 267, in determine_profile
    allProfiles = p.map(self.profile, self.d_sequences.items())
  File "/home/nu76fet/programs/viralclust/conda/hdbscan-9fec0a1dfe235db7d7c78f1a0bba3ac9/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/nu76fet/programs/viralclust/conda/hdbscan-9fec0a1dfe235db7d7c78f1a0bba3ac9/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/home/nu76fet/programs/viralclust/conda/hdbscan-9fec0a1dfe235db7d7c78f1a0bba3ac9/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/home/nu76fet/programs/viralclust/conda/hdbscan-9fec0a1dfe235db7d7c78f1a0bba3ac9/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/nu76fet/programs/viralclust/conda/hdbscan-9fec0a1dfe235db7d7c78f1a0bba3ac9/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
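For anyone hitting the same thing: the last frame shows that the pickled task was longer than a signed 32-bit length header can encode (about 2 GiB). One possible explanation, which is only a guess from the traceback, is that `p.map(self.profile, ...)` passes a bound method, so pickling it also pickles the whole instance, including `self.d_sequences`. The sketch below illustrates both points with toy data. `ToyClusterer`, `profile_item`, and `fits_in_header` are hypothetical names made up for this illustration and are not ViralClust code.

```python
import pickle
import struct

INT32_MAX = 2**31 - 1  # largest length a signed 32-bit "!i" header can encode


def fits_in_header(n_bytes):
    """Return True if a payload length can be packed with the "!i" format."""
    try:
        struct.pack("!i", n_bytes)
        return True
    except struct.error:
        return False


class ToyClusterer:
    """Hypothetical stand-in for the clusterer; not ViralClust's actual class."""

    def __init__(self, sequences):
        self.d_sequences = sequences

    def profile(self, item):
        header, seq = item
        return header, len(seq)


def profile_item(item):
    """Module-level equivalent that carries no instance state."""
    header, seq = item
    return header, len(seq)


# Toy data: 1,000 short "genomes" instead of 152,307 real ones.
toy = ToyClusterer({f"seq{i}": "ACGT" * (200 + i % 50) for i in range(1000)})

# Pickling a bound method serializes the instance it is bound to,
# so every sequence in toy.d_sequences ends up in the payload.
bound_size = len(pickle.dumps(toy.profile))

# A module-level function is pickled by reference (module and name only).
plain_size = len(pickle.dumps(profile_item))

print("2**31 - 1 fits:", fits_in_header(INT32_MAX))
print("2**31 fits:", fits_in_header(INT32_MAX + 1))
print("bound method payload:", bound_size, "bytes")
print("plain function payload:", plain_size, "bytes")
```

If this guess is right, mapping a module-level function over the items (or otherwise keeping the full sequence dictionary out of each task) should keep the per-task payload small. As far as I know, Python 3.8 and later can also send multiprocessing messages larger than 2 GiB, so a newer interpreter in the conda environment might avoid the error, though I have not tested that with ViralClust.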