vaquerizaslab / fanc

FAN-C: Framework for the ANalysis of C-like data
GNU General Public License v3.0

IndexError on 4DN files when calling loops #38

Closed kaukrise closed 3 years ago

kaukrise commented 3 years ago

Hi, I am having an issue with the fanc loops command.

I am using a multi-resolution file that I downloaded from the 4DN website. I was wondering what it means to define different resolutions and how that might lead to the issue with this command, because as far as I understand there is only one way to define the resolution, no?

The command looks more or less like this.

fanc loops 4DNFI2TK7L2F.hic@5000 microcH1_5000_loopsAnnotate.loops -t 5

and the error looks like this

Traceback (most recent call last):
  File "/home/dmas/nxf_conda_env/HiCannotation-e79c6b7076dd13ce2817d3f6422eaed6/lib/python3.8/site-packages/gridmap/job.py", line 242, in execute
    self.ret = self.function(*self.args, **self.kwlist)
  File "/home/dmas/nxf_conda_env/HiCannotation-e79c6b7076dd13ce2817d3f6422eaed6/lib/python3.8/site-packages/fanc/peaks.py", line 1365, in process_matrix_segment_intra
    m_expected = expected_f(m_distance)
  File "/home/dmas/nxf_conda_env/HiCannotation-e79c6b7076dd13ce2817d3f6422eaed6/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2108, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/dmas/nxf_conda_env/HiCannotation-e79c6b7076dd13ce2817d3f6422eaed6/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2192, in _vectorize_call
    outputs = ufunc(*inputs)
  File "/home/dmas/nxf_conda_env/HiCannotation-e79c6b7076dd13ce2817d3f6422eaed6/lib/python3.8/site-packages/fanc/peaks.py", line 1364, in <lambda>
    expected_f = np.vectorize(lambda x: e[x])
IndexError: index 49791 is out of bounds for axis 0 with size 49791
Traceback (most recent call last):
  File "/home/dmas/nxf_conda_env/HiCannotation-e79c6b7076dd13ce2817d3f6422eaed6/bin/fanc", line 127, in <module>
    Fanc()
  File "/home/dmas/nxf_conda_env/HiCannotation-e79c6b7076dd13ce2817d3f6422eaed6/bin/fanc", line 93, in __init__
    command([sys.argv[0]] + sys.argv[option_ix:], log_level=log_level, verbosity=verbosity)
  File "/home/dmas/nxf_conda_env/HiCannotation-e79c6b7076dd13ce2817d3f6422eaed6/lib/python3.8/site-packages/fanc/commands/fanc_commands.py", line 2881, in loops
    peaks = pk.call_peaks(matrix, chromosome_pairs=chromosome_pairs, file_name=o)
  File "/home/dmas/nxf_conda_env/HiCannotation-e79c6b7076dd13ce2817d3f6422eaed6/lib/python3.8/site-packages/fanc/peaks.py", line 1270, in call_peaks
    self._find_peaks_intra_matrix(m, intra_expected[chromosome1], c[start1:end1],
  File "/home/dmas/nxf_conda_env/HiCannotation-e79c6b7076dd13ce2817d3f6422eaed6/lib/python3.8/site-packages/fanc/peaks.py", line 1167, in _find_peaks_intra_matrix
    self._process_jobs(jobs, peak_info, observed_chunk_distribution)
  File "/home/dmas/nxf_conda_env/HiCannotation-e79c6b7076dd13ce2817d3f6422eaed6/lib/python3.8/site-packages/fanc/peaks.py", line 1056, in _process_jobs
    results = msgpack.loads(compressed_results, strict_map_key=False)
  File "/home/dmas/nxf_conda_env/HiCannotation-e79c6b7076dd13ce2817d3f6422eaed6/lib/python3.8/site-packages/msgpack_numpy.py", line 273, in unpackb
    return _unpackb(packed, **kwargs)
  File "msgpack/_unpacker.pyx", line 178, in msgpack._cmsgpack.unpackb
  File "msgpack/_unpacker.pyx", line 126, in msgpack._cmsgpack.get_data_from_buffer
TypeError: a bytes-like object is required, not 'IndexError'

Originally posted by @davidmasp in https://github.com/vaquerizaslab/fanc/issues/7#issuecomment-757043253

kaukrise commented 3 years ago

@davidmasp I opened a new issue from your comment, as it looks like an unrelated error. I am now downloading the 4DN file you are using to reproduce and debug the error. I'll let you know what I find.

davidmasp commented 3 years ago

Oh, okay. I could have done that too, sorry. I am not sure whether it's an issue or I am doing something wrong, because my current setup is a bit messy (a rather big conda environment).

Let me know if I can help in any way.

kaukrise commented 3 years ago

No problem, I don't think this is related to your setup. There seems to be a problem with properly retrieving the expected values from the Juicer file.
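
For illustration, the IndexError in the first traceback can be reproduced in miniature. This is only a sketch, not FAN-C's code: the `expected_lookup` helper and the values in `expected` are hypothetical, and only the `np.vectorize(lambda x: e[x])` pattern is taken from the traceback. It assumes the failure comes from an expected-value vector that is one entry shorter than the largest bin distance being looked up, which is what "index 49791 is out of bounds for axis 0 with size 49791" suggests.

```python
import numpy as np


def expected_lookup(expected, distances):
    """Map each bin distance to its expected value.

    Hypothetical helper that mirrors the lookup pattern seen in the
    traceback: np.vectorize(lambda x: e[x]).
    """
    lookup = np.vectorize(lambda x: expected[x])
    return lookup(distances)


# Made-up expected values for 5 distances; valid indices are 0..4.
n_bins = 5
expected = np.linspace(1.0, 0.2, n_bins)

# Distances inside the valid range work as intended.
ok = expected_lookup(expected, np.array([0, 2, 4]))

# A distance equal to len(expected) is one past the last valid index,
# which is the same off-by-one situation reported in the traceback.
try:
    expected_lookup(expected, np.array([0, n_bins]))
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
```

Comparing `distances.max()` against `len(expected)` before the lookup would surface the mismatch with a clearer message. Whether the fix in 0.9.11 works this way is not stated in this thread.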

kaukrise commented 3 years ago

fanc-0.9.11.tar.gz

Can you try this version? I think I found the issue - at least I don't get the crash with the same file from 4DN. Note that the number of threads you are providing (-t 5) is very small, and loop calling will probably take forever that way. We recommend running this on a computing cluster with as many cores as you can spare!

davidmasp commented 3 years ago

I will try, thanks a lot!

[re: threads] I am already using an HPC and running multiple instances at the same time. I used 5 threads because memory usage (RAM) ramps up with more; I assumed that was the normal behaviour. I can't handle more than 100G in my current setup, so I needed to use 5 threads at most. I didn't spend much time optimizing it, though, so these are rough estimates.