Closed Henry-Ding closed 1 year ago
It doesn't require a lot of memory. Still, how much memory does your working machine have?
start build block sequence and read base sequence calculate ed distance ed distance thread: 40 1949 pre merge matrix 249 generation cover distribution for cluster HOR thread: 1 get result Time: 11
Hi,
Thank you for your prompt reply.
Yes, I used the test sequence and it worked fine. But when I downloaded the complete CP068257.1 sequence from NCBI and ran HiCAT on it, I got that memory error.
hicat -i download.fasta -t ./testdata/AlphaSat.fa -th 52
I have 1 TB of memory.
Looking forward to your answer.
Best wishes,
ding
Hi ding, unfortunately, the current HiCAT cannot take a whole chromosome as input; it was designed for centromere regions only. I suggest you first reduce the size of the input sequence and then run HiCAT. If you already have a template sequence, you can use lastz to find the matching regions. If you do not have a template sequence, you can use TRF to first detect tandem repeats and obtain template sequences.
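Reducing the input to a single region can be done with any FASTA tool. As a minimal sketch in plain Python, assuming you already know the record ID and the 1-based coordinates of the region (for example from a lastz hit), something like the following would cut it out. The function names, the record ID and the coordinates in the usage comment are illustrative placeholders, not part of HiCAT.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the record ID.
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def extract_region(lines: Iterable[str], record_id: str, start: int, end: int) -> str:
    """Return the 1-based, inclusive interval [start, end] of one record."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    for name, seq in read_fasta(lines):
        if name == record_id:
            if end > len(seq):
                raise ValueError(f"end {end} exceeds record length {len(seq)}")
            return seq[start - 1:end]
    raise KeyError(record_id)


def format_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one record as FASTA text with wrapped sequence lines."""
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{name}\n{body}\n"


# Hypothetical usage; the coordinates below are placeholders, not real
# centromere boundaries:
#
# with open("download.fasta") as fh:
#     region = extract_region(fh, "CP068257.1", 1_000_001, 2_000_000)
# with open("region.fasta", "w") as out:
#     out.write(format_fasta("CP068257.1:1000001-2000000", region))
```

The resulting `region.fasta` could then be passed to `hicat -i` in place of the whole chromosome.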
Hi, I am trying to use TRF. Could you share how you filter and select the TRF results? Looking forward to your answer. Best wishes, ding
I am not sure why you are using TRF. For the human genome, we used the active HOR regions defined in "Complete genomic and epigenetic maps of human centromeres".
HiCAT can be used on any tandem repeat region, but it cannot decide which one is the centromere. That has to be provided by the user.
Hi, thank you for your prompt reply. I am trying HiCAT on my own data. Should I use published centromeric sequences as template sequences, or the TRF results? I used lastz to align published centromere sequences against the chromosome sequences, and for some chromosomes the alignment returned no hits. If I use the TRF results, how do I confirm that a sequence is the centromere sequence I need? Looking forward to your answer. Best wishes, ding
If previous studies have determined the centromere sequence, you can use their results. If not, I suggest you perform CENH3 ChIP-seq (CENP-A for human) to determine the functional centromere sequence. In most species, as far as I know, the centromere sequence is the largest tandem repeat sequence, but ChIP-seq is what determines the functional centromere sequence.
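If you only want a first candidate list before any ChIP-seq validation, the "largest tandem repeat" heuristic above could be sketched as follows. This is not HiCAT's own filtering. It assumes TRF's `.dat` output with 15 whitespace-separated fields per record (start, end, period size, copy number, consensus size, percent matches, percent indels, score, A, C, G, T, entropy, consensus, repeat sequence), and the names `TandemRepeat`, `parse_trf_dat`, `largest_repeats` and the `min_period` threshold are made up for illustration.

```python
from typing import Iterable, List, NamedTuple


class TandemRepeat(NamedTuple):
    start: int      # 1-based start of the repeat array
    end: int        # 1-based inclusive end of the repeat array
    period: int     # repeat unit length reported by TRF
    copies: float   # number of copies of the unit
    consensus: str  # consensus sequence of the unit

    @property
    def span(self) -> int:
        """Length in bp covered by the whole repeat array."""
        return self.end - self.start + 1


def parse_trf_dat(lines: Iterable[str]) -> List[TandemRepeat]:
    """Parse record lines, skipping headers and anything that is not 15 fields."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 15 or not fields[0].isdigit():
            continue
        records.append(
            TandemRepeat(
                start=int(fields[0]),
                end=int(fields[1]),
                period=int(fields[2]),
                copies=float(fields[3]),
                consensus=fields[13],
            )
        )
    return records


def largest_repeats(records: Iterable[TandemRepeat],
                    min_period: int = 1,
                    top: int = 5) -> List[TandemRepeat]:
    """Return the `top` arrays with the longest span and period >= min_period."""
    kept = [r for r in records if r.period >= min_period]
    return sorted(kept, key=lambda r: r.span, reverse=True)[:top]
```

TRF can report several overlapping records for the same locus with different period sizes, which this sketch does not merge. The consensus of the top-ranked array is only a candidate template; as noted above, the repeat that is functionally centromeric still has to be confirmed by ChIP-seq or previous studies.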
Hi, I ran into an out-of-memory problem when using this software. Is the memory requirement of the HiCAT_HOR.py step unusually high? I installed the software with conda, and it ran on the data in testdata without error. The following is the error message: `ed distance thread: 52
Traceback (most recent call last):
File "/miniconda3/envs/mamba/envs/hicat/bin/HiCAT_HOR.py", line 1591, in <module>
main()
File "/miniconda3/envs/mamba/envs/hicat/bin/HiCAT_HOR.py", line 1497, in main
edit_distance_matrix, block_name_index = calculateED(block_sequence, base_sequence,thread)
File "/miniconda3/envs/mamba/envs/hicat/bin/HiCAT_HOR.py", line 68, in calculateED
res = Parallel(n_jobs=thread)(delayed(ed_distance_apply_apply)(data, i) for i in split_in)
File "/miniconda3/envs/mamba/envs/hicat/lib/python3.10/site-packages/joblib/parallel.py", line 1056, in __call__
self.retrieve()
File "/miniconda3/envs/mamba/envs/hicat/lib/python3.10/site-packages/joblib/parallel.py", line 935, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/miniconda3/envs/mamba/envs/hicat/lib/python3.10/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
return future.result(timeout=timeout)
File "/miniconda3/envs/mamba/envs/hicat/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/miniconda3/envs/mamba/envs/hicat/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.
The exit codes of the workers are {SIGKILL(-9)}`
Looking forward to your answer.
Best wishes,
ding