auberginekenobi closed this issue 4 years ago
Thanks for the detailed issue. Indeed it looks like a sort issue ... again.
In mapped_2hic_fragments.sh, could you try adding -S 80% to the sort command?
This sets the size of the memory buffer that sort uses.
Note that in this case, the sort command may spill to temporary files in your TMP_DIR folder.
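A minimal sketch of the suggestion above, assuming GNU coreutils sort. The toy input and paths are hypothetical stand-ins for a validPairs file, and the 64M buffer is a placeholder. The sketch passes an absolute size to -S instead of a percentage. As an assumption not confirmed in this thread, a percentage may be computed from the host's physical memory rather than from a pod's memory limit, in which case an absolute value below the limit would be the safer choice.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical working directory and toy input; not real HiC-Pro output.
workdir=$(mktemp -d)
mkdir -p "$workdir/tmp"
printf 'r1\tchr10\t500\t+\tchr2\t100\n'  > "$workdir/pairs.txt"
printf 'r2\tchr2\t300\t+\tchr10\t50\n'  >> "$workdir/pairs.txt"
printf 'r3\tchr2\t40\t-\tchr2\t900\n'   >> "$workdir/pairs.txt"

buffer_size="64M"   # placeholder; pick a value below the memory limit

# -S : size of the in-memory buffer
# -T : directory for the temporary files written once the buffer is full
# LC_ALL=C is this sketch's choice; the logged command sets LANG=en instead.
LC_ALL=C sort -T "$workdir/tmp" -S "$buffer_size" \
    -k2,2V -k3,3n -k5,5V -k6,6n \
    -o "$workdir/pairs.sorted.txt" "$workdir/pairs.txt"

cat "$workdir/pairs.sorted.txt"
```

The keys are the same as in the HiC-Pro command: version sort on the two chromosome columns, numeric sort on the two position columns.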
Hi Nicolas, Alas, same error:
## Sorting valid interaction file ...
LANG=en; sort -T tmp -S 80% -k2,2V -k3,3n -k5,5V -k6,6n -o hic_results/data/rawdata/MB277_grch38_1kgmaj.bwt2pairs.validPairs hic_results/data/rawdata/MB277_grch38_1kgmaj.bwt2pairs.validPairs
/datasets/home/home-00/11/211/ochapman/bin/HiC-Pro_2.11.3-beta/scripts/hic.inc.sh: line 86: 1173 Killed sort -T tmp -S 80% -k2,2V -k3,3n -k5,5V -k6,6n -o hic_results/data/rawdata/MB277_grch38_1kgmaj.bwt2pairs.validPairs hic_results/data/rawdata/MB277_grch38_1kgmaj.bwt2pairs.validPairs
I have been able to get around the issue by feeding HiC-Pro 128G of RAM, but this doesn't seem like a general solution...
In theory, -T and -S are the best ways to improve Unix sort performance. Unfortunately, I do not have any other ideas, sorry. How many reads do you have in your FASTQ files?
Note that with HiC-Pro you can also split the reads into chunks. All chunks are processed in parallel until the final merge that builds the maps. So you will still have a big sort at the end, but the processing of each chunk should be much faster.
N
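To illustrate the chunk-then-merge idea in general terms, here is a sketch assuming GNU coreutils sort. It is not HiC-Pro's own code, and whether HiC-Pro's final step works this way is not stated in this thread. File names and the 64M buffer are hypothetical. Each chunk is sorted separately, then sort -m merges the sorted chunks by reading them sequentially instead of sorting everything again.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
mkdir -p "$workdir/tmp"
keys=(-k2,2V -k3,3n -k5,5V -k6,6n)   # same keys as the sort command in the log

# Two toy chunks standing in for per-chunk pair files (hypothetical names).
printf 'a1\tchr2\t300\t+\tchr10\t50\na2\tchr1\t10\t+\tchr1\t90\n'  > "$workdir/chunk0.txt"
printf 'b1\tchr10\t5\t-\tchr10\t70\nb2\tchr1\t200\t+\tchr2\t15\n' > "$workdir/chunk1.txt"

# Sort each chunk on its own, so each sort works on a smaller input.
for f in "$workdir"/chunk?.txt; do
    LC_ALL=C sort -T "$workdir/tmp" -S 64M "${keys[@]}" \
        -o "${f%.txt}.sorted.txt" "$f"
done

# -m merges inputs that are already sorted, without re-sorting them.
LC_ALL=C sort -m -T "$workdir/tmp" "${keys[@]}" \
    -o "$workdir/merged.txt" "$workdir"/chunk?.sorted.txt

cat "$workdir/merged.txt"
```

The merge must use the same keys and locale as the per-chunk sorts, otherwise the merged output is not guaranteed to be ordered.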
Hi Nicolas,
I presume that the mapped_2hic_fragments.log file reports the number of records to be sorted. If so, 2,160,500,000 reads. I had hoped to avoid using the -p option because I'm running on Kubernetes, not an HPC scheduler, and I didn't see documentation for chunks in a non-cluster configuration.
You can still split the reads into chunks without the cluster option to avoid the sort memory issue. The only drawback is that the read chunks will be run one by one, which may take a lot of time. Best
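As a rough sketch of what splitting reads into chunks means, assuming GNU coreutils split and a FASTQ file with exactly four lines per read. This is a generic illustration, not HiC-Pro's own splitting tooling, and the file names and chunk size are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Toy FASTQ with 5 reads, 4 lines per read (hypothetical file name).
for i in 1 2 3 4 5; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > "$workdir/sample_R1.fastq"

reads_per_chunk=2                           # placeholder; real runs would use millions
lines_per_chunk=$((reads_per_chunk * 4))    # one FASTQ record spans 4 lines

# -l : lines per output file, -d : numeric suffixes (00, 01, ...)
# --additional-suffix keeps the .fastq extension on each chunk.
split -l "$lines_per_chunk" -d --additional-suffix=.fastq \
    "$workdir/sample_R1.fastq" "$workdir/sample_R1_chunk"

ls "$workdir"
```

For paired-end data, both mate files would need to be split with the same chunk size so that read pairs stay aligned across chunks.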
Issue: For some of my HiC datasets, mapping reliably finishes but downstream hic processing fails. Version: HiC-Pro 2.11.3-beta
Repro:
Issue reproduced in HiC-Pro non-parallel mode, using Kubernetes pods: 20 CPUs x 64 GB RAM and 2 CPUs x 8 GB RAM.
Comments: My limited expertise says this looks like a memory allocation issue with sort, which gets killed according to the logfiles. For 2 samples, on the 20 CPU x 64 GB configuration both return a "Killed" message; on the 2 CPU x 8 GB configuration, one now returns a "read failed: <...> cannot allocate memory". Both samples are comparable in size to others that processed successfully. Logs abbreviated below. Thoughts?
Logs: