tecangenomics / nudup

NuDup -- Marks/removes duplicate molecules based on the molecular tagging technology used in Tecan products.
http://www.tecangenomics.com
GNU Lesser General Public License v3.0

Broken pipe? #10

Open FabianGrammes opened 7 years ago

FabianGrammes commented 7 years ago

Hi, when I try to run nudup.py I get the following error message:

2017-03-17 13:11:44,039 [     INFO] - Deduplicating NuGEN paired end reads...
2017-03-17 13:11:44,361 [     INFO] - Using molecular tag sequence from Index FASTQ read
2017-03-17 13:11:44,362 [     INFO] - Appending molecular tag sequence to SAM/BAM read name
tee: /tmp/nudup__aJ1wL/named_pipe: Broken pipe

The script progresses from there but the resulting .bam files are empty. I'm running it using anaconda/1.9.1 and samtools/1.3. Any help is welcome.
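For context, `tee` reports "Broken pipe" when it writes to a pipe that no longer has any reader, which suggests the process consuming the named pipe exited early. A minimal sketch of that mechanism, assuming a POSIX system; the function name and the scratch directory are hypothetical, and this is not nudup's own code:

```python
import errno
import os
import tempfile


def write_after_reader_exit():
    """Write to a FIFO whose only reader has closed; return the errno raised."""
    workdir = tempfile.mkdtemp(prefix="fifo_demo_")  # hypothetical scratch dir
    fifo = os.path.join(workdir, "named_pipe")
    os.mkfifo(fifo)
    # Open the read end non-blocking so one process can hold both ends.
    reader = os.open(fifo, os.O_RDONLY | os.O_NONBLOCK)
    writer = os.open(fifo, os.O_WRONLY)
    os.close(reader)  # the consumer goes away before the producer is done
    try:
        os.write(writer, b"read1\n")
        return None
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(writer)
        os.remove(fifo)
        os.rmdir(workdir)


print(write_after_reader_exit() == errno.EPIPE)
```

If the same thing happens inside the pipeline, the useful question is why the reading side (here most likely a samtools process) stopped, since the `tee` message is only a symptom.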

shuelga commented 7 years ago

Hi! When we see issues like this it usually has to do with the /tmp directory being used. Do you have sufficient space there? You can use the -T option to reset the location of the processing directory to a location that may have more space.
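A quick way to check the space question is to ask Python which temporary directory it resolves and how much room is left there. This is only a sketch: `free_gib` is a hypothetical helper, and it assumes the directory Python reports is the one nudup would use when -T is not given.

```python
import shutil
import tempfile


def free_gib(path):
    """Return the free space at `path` in GiB."""
    usage = shutil.disk_usage(path)
    return usage.free / 1024 ** 3


# Report the default temp dir and its free space.
tmp = tempfile.gettempdir()
print("temp dir: %s, free: %.1f GiB" % (tmp, free_gib(tmp)))
```

The same helper can be pointed at whatever directory you pass with -T, for example `free_gib("/scratch")`.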

peterwc commented 6 years ago

Hello,

I am not sure this is exactly the same problem, but I believe it is similar. I am getting this error output:

2018-01-26 14:28:08,973 [     INFO] - Processing sorted SAM/BAM with molecular tag sequence in read name (assumes sorted)
samtools view: writing to standard output failed: Broken pipe
samtools view: error closing standard output: -1
2018-01-26 14:28:10,736 [    ERROR] - 

I have tried it with and without the -T option. I have plenty of disk space and RAM.

FabianGrammes commented 6 years ago

Hi @peterwc, I never got nudup.py to run on my cluster either, and I also have plenty of RAM and disk space. It did not work even with small test files. Anyway, in case you haven't found a solution, I've written my own script which you can try: https://gitlab.com/fabian.grammes/RRBS_kit ; script rrbs_dedup.py

sklages commented 6 years ago

Just another platform? "You need to sign in or sign up before continuing." ... :-(

peterwc commented 6 years ago

Hey @FabianGrammes, I believe I followed the comments from one of the other threads on here. It works after a bit. Overall, it's a great program and runs well on Linux machines.

Peter

cjfields commented 5 years ago

Hi all, I am seeing a similar issue on our cluster, though I am using a local scratch directory as the tmp directory. In my case the named pipe hangs indefinitely while adding the molecular tag to the read name. I also see several defunct samtools processes in the background. Using top:

top - 20:18:58 up 29 days,  6:54,  1 user,  load average: 14.69, 15.27, 14.74
Tasks: 321 total,  15 running, 302 sleeping,   0 stopped,   4 zombie
%Cpu(s): 57.7 us,  0.9 sy,  0.0 ni, 41.3 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 39622716+total, 37905289+free,  4447912 used, 12726356 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 38831289+avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
2435148 cjfields  20   0  168184   2508   1600 R   0.3  0.0   0:00.05 top -u cjfields
2420613 cjfields  20   0  125980   2428   1708 S   0.0  0.0   0:00.01 /usr/bin/bash
2433627 cjfields  20   0  193756  10008   3608 S   0.0  0.0   0:00.08 python /home/groups/hpcbio/apps/NuGen/nudup/nudup.py -T /scratch -f DMVF015_SubQ_AGTGAG_L00M_R2_001.fastq -o DMVF015_SubQ_AG+
2433699 cjfields  20   0       0      0      0 Z   0.0  0.0   0:00.00 [samtools] <defunct>
2433700 cjfields  20   0       0      0      0 Z   0.0  0.0   0:00.00 [samtools] <defunct>
2434124 cjfields  20   0  192216   2672   1088 S   0.0  0.0   0:00.00 sshd: cjfields@pts/3
2434125 cjfields  20   0  125820   2180   1676 S   0.0  0.0   0:00.00 -system-specific
2435140 cjfields  20   0       0      0      0 Z   0.0  0.0   0:00.00 [samtools] <defunct>
2435141 cjfields  20   0       0      0      0 Z   0.0  0.0   0:00.00 [samtools] <defunct>
2435142 cjfields  20   0  107952    380    276 S   0.0  0.0   0:00.00 cat /scratch/nudup_X1lDc6/named_pipe /scratch/nudup_qjxxlN/named_pipe
2435143 cjfields  20   0  107932    636    512 S   0.0  0.0   0:00.00 tee /scratch/nudup_H5q7vH/named_pipe

If I kill the process it exits as below:

$ python $NUGEN_NODUP_HOME/nudup.py -T /scratch -f DMVF015_SubQ_AGTGAG_L00M_R2_001.fastq -o DMVF015_SubQ_AGTGAG.nugen_dedup DMVF015_SubQ_AGTGAG.sam_stripped.sam
2018-09-18 18:50:43,447 [     INFO] - Deduplicating NuGEN single end reads...
2018-09-18 18:50:55,173 [     INFO] - Using molecular tag sequence from Index FASTQ read
2018-09-18 18:50:55,173 [     INFO] - Appending molecular tag sequence to SAM/BAM read name
^CTraceback (most recent call last):
  File "/home/groups/hpcbio/apps/NuGen/nudup/nudup.py", line 1110, in <module>
    w.main(umi_start=args.start, umi_length=args.length)
  File "/home/groups/hpcbio/apps/NuGen/nudup/nudup.py", line 1003, in main
    w = self.process_unsynced_sam(umi_start, umi_length)
  File "/home/groups/hpcbio/apps/NuGen/nudup/nudup.py", line 939, in process_unsynced_sam
    logger.debug('Add molecular tag sequence to sam: %s', umi_nohead_sam)
  File "/home/groups/hpcbio/apps/NuGen/nudup/nudup.py", line 170, in __exit__
    time.sleep(self._POLL_TIME)
KeyboardInterrupt

shuelga commented 5 years ago

For this kind of error, the problem is the tmp directory. Passing -T $TMPDIR to redirect the tmp directory to $TMPDIR is not enough on its own. The issue can be fixed by adding the following line to your ~/.bash_profile file: export TMPDIR="somewhere_you_redirect_the_tmp_directory"

Originally posted by @boro2013 in https://github.com/nugentechnologies/nudup/issues/19#issuecomment-458194227
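One possible reason the environment variable matters: Python's standard `tempfile` module consults `TMPDIR` when it picks a default temporary directory, so any step that does not receive the -T value explicitly would still fall back to it. Whether nudup or its subprocesses actually do this is an assumption, not something confirmed in this thread. The sketch below only shows the lookup itself; `resolved_tempdir` is a hypothetical helper.

```python
import os
import tempfile


def resolved_tempdir(tmpdir):
    """Return the temp dir Python picks when TMPDIR is set to `tmpdir`."""
    old = os.environ.get("TMPDIR")
    os.environ["TMPDIR"] = tmpdir
    tempfile.tempdir = None  # drop the cached value so the lookup runs again
    try:
        return tempfile.gettempdir()
    finally:
        # Restore the previous environment and clear the cache again.
        if old is None:
            del os.environ["TMPDIR"]
        else:
            os.environ["TMPDIR"] = old
        tempfile.tempdir = None


scratch = tempfile.mkdtemp(prefix="scratch_demo_")  # stand-in for a real scratch dir
print(resolved_tempdir(scratch))
os.rmdir(scratch)
```

Note that the directory named in `TMPDIR` has to exist and be writable; otherwise `tempfile` silently moves on to its next candidate, such as /tmp.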