ksahlin / isONclust

De novo clustering of long transcript reads into genes
GNU General Public License v3.0

ValueError: min() arg is an empty sequence #16

Closed. JoshLoecker closed this issue 3 years ago.

JoshLoecker commented 3 years ago

I am getting the error below, wrapped in a multiprocessing.pool.RemoteTraceback, when running the following command:

isONclust \
--ont \
--fastq results/.temp/merged.barcode.clusters.fastq \
--q 7 \
--aligned_threshold 0.85 \
--min_fraction 0.95 \
--mapped_threshold 0.7 \
--min_shared 55 \
--outfolder results/isONclust/merged_barcodes/origins

This is the error I receive:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/joshl/miniconda3/envs/mapt_pipeline/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/Users/joshl/miniconda3/envs/mapt_pipeline/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/Users/joshl/miniconda3/envs/mapt_pipeline/lib/python3.7/site-packages/modules/parallelize.py", line 17, in reads_to_clusters_helper
    return cluster.reads_to_clusters(*args, **kwargs)
  File "/Users/joshl/miniconda3/envs/mapt_pipeline/lib/python3.7/site-packages/modules/cluster.py", line 224, in reads_to_clusters
    lowest_batch_index = max(1, min(prev_b_indices))
ValueError: min() arg is an empty sequence
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/joshl/miniconda3/envs/mapt_pipeline/bin/isONclust", line 263, in <module>
    main(args)
  File "/Users/joshl/miniconda3/envs/mapt_pipeline/bin/isONclust", line 72, in main
    clusters, representatives = parallelize.parallel_clustering(read_array, p_emp_probs, args)
  File "/Users/joshl/miniconda3/envs/mapt_pipeline/lib/python3.7/site-packages/modules/parallelize.py", line 156, in parallel_clustering
    cluster_results =res.get(999999999) # Without the timeout this blocking call ignores all signals.
  File "/Users/joshl/miniconda3/envs/mapt_pipeline/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: min() arg is an empty sequence

JoshLoecker commented 3 years ago

I have been able to work around the issue by setting the --t option to 1. This is not optimal, since it does not use all available threads, but it does narrow down the issue slightly.

ksahlin commented 3 years ago

Hi Josh,

Any chance you could share the file where this happens? If not, could you provide more of the output printed to the terminal before the error occurs? This would help me work out how best to deal with the empty sequence error.

JoshLoecker commented 3 years ago

@ksahlin, oops. Sorry about that. Here is a link to the fastq I am using: https://pastebin.com/15kNuA0V

Additionally, here is the full terminal output of the command: https://pastebin.com/n0K2T5BF

ksahlin commented 3 years ago

Shoot, it says that the fastq link expired. I got the output file though. Any chance you could send the reads again?

JoshLoecker commented 3 years ago

@ksahlin Hmm, not sure why that happened. Here is a new link: https://pastebin.com/GHdq8xfN

JoshLoecker commented 3 years ago

@ksahlin I can't seem to get this link to work either. Here is a different website: https://controlc.com/16e2b9d0

ksahlin commented 3 years ago

OK, great, that worked. Thanks!

ksahlin commented 3 years ago

Alright, I fixed it and the new version (0.0.6.1) is available on PyPI and here on GitHub. Thanks for the report!
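
For readers who hit the same traceback: the thread does not show what changed in 0.0.6.1, so the following is only a minimal sketch of how the failing expression, max(1, min(prev_b_indices)), can be guarded against an empty input. The helper name lowest_batch_index and the fallback value of 1 are assumptions made for illustration and are not taken from the isONclust source.

```python
def lowest_batch_index(prev_b_indices):
    """Return the smallest previous batch index, but never less than 1.

    Hypothetical helper: it mirrors the expression from the traceback and
    adds a fallback for an empty input. This is NOT the actual isONclust fix.
    """
    # min() raises ValueError on an empty iterable unless `default` is given.
    # Falling back to 1 is an assumption chosen to match the max(1, ...) floor.
    return max(1, min(prev_b_indices, default=1))


# The unguarded expression from the traceback fails on an empty list.
try:
    max(1, min([]))
except ValueError as err:
    print("unguarded:", err)

# The guarded version returns the floor value instead of raising.
print("guarded, empty:", lowest_batch_index([]))
print("guarded, non-empty:", lowest_batch_index([4, 2, 9]))
```

Whether an empty prev_b_indices should silently fall back to a floor value, or should instead be treated as a sign that a worker received no usable reads, depends on the clustering logic; the sketch only shows the language-level guard.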