Closed mt1022 closed 6 years ago
As I have no experience in Java, I re-wrote sam2tsv in Python with pysam. The Python version can be found here: https://gist.github.com/mt1022/737ef20f43d5acd4bc75dba0be8f334b. It produces output similar to sam2tsv in jvarkit, with some differences in the output for soft-clipping and deletions in reads.
You can increase the memory using the -Xmx option of java: https://stackoverflow.com/questions/14763079/
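For example (the jar path and the 8G heap size below are placeholders, not values from this thread):

```shell
# Raise the JVM heap cap via -Xmx; pick a size that fits the node's
# per-process memory budget (8G here is purely illustrative)
java -Xmx8G -jar /path/to/sam2tsv.jar -r hg38.fa input.bam > output.tsv
```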
But anyway, I'm surprised sam2tsv produced such an error, because it uses a simple streaming process (the information of each SAM record is printed as soon as it is read). What was your exact command line, please?
I used almost the default settings.
My command line looks like: sam2tsv -r hg38.fa $i | awk '$5 != "."' >${i}.pos
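The awk filter keeps only rows whose fifth column is not "."; a minimal sketch on made-up tab-separated rows (the column layout here is illustrative, not sam2tsv's exact output):

```shell
# Toy rows standing in for sam2tsv output; awk drops any row whose
# 5th field is "." and passes the rest through unchanged
printf 'r1\t0\tchr1\t100\tA\nr2\t0\tchr1\t101\t.\nr3\t0\tchr1\t102\tC\n' \
  | awk '$5 != "."'
# prints the r1 and r3 lines only
```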
Here, sam2tsv is the path to a bash script:
#!/bin/bash
java -Djvarkit.log.name=sam2tsv -Dfile.encoding=UTF8 -Xmx2G -cp "..." com.github.lindenb.jvarkit.tools.sam2tsv.Sam2Tsv $*
...
The quoted classpath contains htsjdk-2.9.1.jar, commons-logging-1.1.1.jar, ... (many other .jar files) and sam2tsv.jar.
Can you please test this without the bash script (the manual doesn't mention it...)?
this should be:
java -jar /path/to/sam2tsv.jar -r hg38.fa $i | awk '$5 != "."' >${i}.pos
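For hundreds of BAM files, the same direct invocation can be wrapped in a loop; a sketch, assuming the jar path, heap size, and file locations shown here (all placeholders):

```shell
#!/bin/bash
# Run sam2tsv directly from the jar for each BAM, bypassing any wrapper
# script; -Xmx caps each JVM so 20 concurrent jobs stay within the node's RAM
for i in *.bam; do
    java -Xmx4G -jar /path/to/sam2tsv.jar -r hg38.fa "$i" \
        | awk '$5 != "."' > "${i}.pos"
done
```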
OK. I'll test it when there is an idle fat node.
Hi, Dr. Lindenbaum,
I got time to test the java -jar /path/to/sam2tsv.jar
command today. It worked as expected and no "memory insufficient" error occurred. It seems that the error was caused by the bash script.
Subject of the issue
I have hundreds of BAM files to process. I submitted all the jobs to a node with 1T of memory, and 20 sam2tsv processes were executed simultaneously. However, some output files were empty due to an insufficient-memory error.
Your environment
sam2tsv --version
${JAVA_HOME}: /apps/bioapps/jdk-sun/
Steps to reproduce
Expected behaviour
As I guess sam2tsv processes the BAM line by line and the reference is 3.1G (human genome), I think 1T of memory is enough for 20 such processes (expected to consume about 62G of memory).
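The arithmetic behind that estimate, assuming each of the 20 processes holds one full 3.1G copy of the reference:

```shell
# 20 concurrent processes x 3.1G reference each = total expected footprint
awk 'BEGIN { printf "%.0fG total\n", 20 * 3.1 }'
# prints: 62G total
```

That is a small fraction of the node's 1T, which is why the failures were unexpected.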
Actual behaviour
However, some commands produced the expected results and some failed with the following error: