simon-anders / htseq

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.
https://htseq.readthedocs.io/en/release_0.11.1/
GNU General Public License v3.0

Htseq-count killed #105

Closed zyh4482 closed 3 years ago

zyh4482 commented 3 years ago

When I use htseq-count to process my BAM files, I run into the following issue:

66200000 alignment record pairs processed
killed

My environment is Ubuntu 20.04 running on VMware Workstation 16, with 8 TB storage and 32 GB RAM.

I have 140 BAM files to count. I used a for loop to produce the output:

for i in $(seq 1 140); do
  htseq-count --format bam --order pos --stranded no --minaqual 10 \
    --type exon --idattr gene_id --mode union \
    /home/tomas/project/bam/a/a${i}.bam \
    /home/tomas/project/ref/gencode.v38.annotation.gtf \
    > /home/tomas/project/expression/a/a${i}_gene.tsv
done

Most of them work. But there are 14 "killed" failures, resulting in 14 empty (0-byte) .tsv files.
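The failed runs can be identified by their empty output files. A minimal sketch in Python, assuming the outputs follow the `a${i}_gene.tsv` naming used in the loop; the function name `find_empty_outputs` is hypothetical and not part of HTSeq:

```python
from pathlib import Path


def find_empty_outputs(out_dir, pattern="*_gene.tsv"):
    """Return the sorted paths of zero-byte count tables in out_dir.

    The default pattern is an assumption based on the file names
    used in the loop; adjust it for a different layout.
    """
    return sorted(
        path for path in Path(out_dir).glob(pattern)
        if path.stat().st_size == 0
    )
```

For example, `find_empty_outputs("/home/tomas/project/expression/a")` would list the tables that need to be regenerated.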

I'm not sure whether RAM is the problem, because I opened 4 terminals and ran the jobs in parallel (e.g. 1-40, 41-70, 71-100, 101-140). But I checked the memory during that time, and there was no warning of memory exhaustion.

May I ask why this happens and how to deal with it? Thank you.

iosonofabio commented 3 years ago

Memory exhausted: the Python interpreter gives up before even throwing an error. Get a bigger machine or, since htseq-count is trivially parallelizable, split the BAM files.
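One way to act on this without a bigger machine is to run fewer jobs at once and record which ones were killed. The following is a rough sketch under stated assumptions, not an HTSeq feature: it does not split the BAM files themselves, it only limits concurrency and reports runs that died from SIGKILL. The helper names (`build_command`, `run_to_file`, `run_all`) are hypothetical, the htseq-count options are copied from the loop in the question, and the negative return code convention holds on POSIX systems only.

```python
import signal
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(bam, gtf):
    """Build the htseq-count call with the options from the question."""
    return ["htseq-count", "--format", "bam", "--order", "pos",
            "--stranded", "no", "--minaqual", "10", "--type", "exon",
            "--idattr", "gene_id", "--mode", "union", bam, gtf]


def run_to_file(cmd, out_path):
    """Run cmd, redirect its stdout to out_path, return the exit code."""
    with open(out_path, "w") as out:
        return subprocess.run(cmd, stdout=out).returncode


def was_killed(returncode):
    # On POSIX, subprocess reports death by signal N as return code -N.
    return returncode == -signal.SIGKILL


def run_all(jobs, max_workers=1):
    """Run (cmd, out_path) jobs, at most max_workers at a time.

    Returns the output paths of the jobs that were killed by SIGKILL,
    so they can be rerun with fewer workers.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(lambda job: run_to_file(*job), jobs))
    return [out for (_, out), code in zip(jobs, codes) if was_killed(code)]
```

A possible use: build one job per sample with `build_command`, call `run_all(jobs, max_workers=2)`, then call `run_all` again with `max_workers=1` on whatever it returns. The right worker count depends on how much memory each BAM file needs, which this sketch does not measure.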