Closed KunFang93 closed 2 years ago
Hi,
We never ran into this problem. It seems to be an issue with mixcr, about which we cannot do much.
Which kind of filesystem is /scratch/u/kfang/ChenHZ_lab/Neoantigen/test2/work using?
Hi
Our cluster admin doesn't know how to solve this issue either... The filesystem is NFS. Thanks~
Is the NFS lock daemon running on the system? Usually it should be, but maybe you can check this as well.
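A minimal sketch of how the two checks raised so far (filesystem type and lock daemon) could be done from the shell. It assumes GNU coreutils `stat` and that `rpcinfo` is installed; the helper names `fs_type` and `has_nlockmgr` are made up for illustration. Note that with NFSv4 locking is part of the protocol itself, so a missing `nlockmgr` entry is not necessarily a problem there.

```shell
#!/usr/bin/env bash

# Print the filesystem type of the given directory (e.g. "nfs").
# Assumes GNU coreutils stat.
fs_type() {
    stat -f -c %T -- "$1"
}

# Read `rpcinfo -p` output on stdin and succeed if the NFS lock
# manager (registered under the name "nlockmgr") is listed.
has_nlockmgr() {
    grep -qw nlockmgr
}

# Usage:
#   fs_type /scratch/u/kfang/ChenHZ_lab/Neoantigen/test2/work
#   rpcinfo -p | has_nlockmgr && echo "lock manager registered"
```

The lock-manager check reads from stdin rather than calling `rpcinfo` itself, so the same function can also be pointed at saved output from another node.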
You can also try to run the mixcr process manually to see if it was only a transient problem by doing the following:
cd /scratch/u/kfang/ChenHZ_lab/Neoantigen/test2/work/c3/5a4b303cc544c8f790079d0754d082
bash .command.run
If that works, you may resume nextNEOpi with -resume
If you cannot fix the issue, you can also skip the TCR stuff by using --TCR false
HTH
Thanks for your suggestion! Will try.
Hi,
I tried skipping the TCR stuff by using --TCR false. The pipeline works fine initially but gets stuck in MarkDuplicates, just like in issue #17. Is there anything I could do to solve the problem? Thanks for your help!
Best, Kun
Hi, this is strange.
What happens when you cd to the work directory of the MarkDuplicates process and run the .command.run script manually?
First use ctrl+c to stop the pipeline, then look into the .nextflow.log file and get the work dir for the MarkDuplicates process. You might want to look for something like TaskHandler[id: 70; name: MarkDuplicates and note down the directory listed after workDir:
Then cd into this directory and run bash .command.run. You can monitor the activity with top.
Can you also send the output of ls -la in that workDir?
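The log lookup described above can be sketched as a small shell helper. This is only an illustration: the function name `task_workdirs` is hypothetical, and it assumes the log lines look roughly like `TaskHandler[id: 70; name: MarkDuplicates (...); ...; workDir: /path]`, which may differ between Nextflow versions and executors.

```shell
#!/usr/bin/env bash

# Print the work directory of every TaskHandler entry whose name starts
# with the given process name.
# Assumed line shape (may vary):
#   TaskHandler[id: 70; name: MarkDuplicates (...); ...; workDir: /path]
task_workdirs() {
    local process_name=$1 log_file=$2
    grep -F "name: ${process_name}" "$log_file" \
        | sed -n 's/.*workDir: \([^] ;]*\).*/\1/p' \
        | sort -u
}

# Usage:
#   task_workdirs MarkDuplicates .nextflow.log
```

If several directories are printed (for example after retries), the most recently modified one is probably the one to inspect.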
Hi,
Thanks for your reply! This is the output of ls -la in the workDir:
(base) [kun@g1400png-ap01lp 1f1771d1843dfa04c9ab2159038b5a]$ ls -la
total 48
drwxrwxr-x 2 kun kun 4096 Nov 8 14:35 .
drwxrwxr-x 3 kun kun 4096 Oct 26 12:00 ..
-rw-rw-r-- 1 kun kun 0 Nov 8 14:35 .command.begin
-rw-rw-r-- 1 kun kun 946 Nov 8 14:35 .command.err
-rw-rw-r-- 1 kun kun 1490 Nov 8 14:32 .command.log
-rw-rw-r-- 1 kun kun 0 Nov 8 14:35 .command.out
-rw-rw-r-- 1 kun kun 11019 Oct 26 12:13 .command.run
-rw-rw-r-- 1 kun kun 650 Oct 26 12:13 .command.sh
-rw-rw-r-- 1 kun kun 0 Nov 8 14:35 .command.trace
lrwxrwxrwx 1 kun kun 97 Nov 8 14:35 GRCh38.d1.vd1.dict -> /data/kun/software/nextNEOpi/resources/references/hg38/gdc/GRCh38.d1.vd1/fasta/GRCh38.d1.vd1.dict
lrwxrwxrwx 1 kun kun 95 Nov 8 14:35 GRCh38.d1.vd1.fa -> /data/kun/software/nextNEOpi/resources/references/hg38/gdc/GRCh38.d1.vd1/fasta/GRCh38.d1.vd1.fa
lrwxrwxrwx 1 kun kun 99 Nov 8 14:35 GRCh38.d1.vd1.fa.fai -> /data/kun/software/nextNEOpi/resources/references/hg38/gdc/GRCh38.d1.vd1/fasta/GRCh38.d1.vd1.fa.fai
lrwxrwxrwx 1 kun kun 138 Nov 8 14:35 Patient353_T1star_normal_DNA_aligned_uBAM_merged.bam -> /data/kun/ChenHZ_lab/Neoantigens/patient353/T1/work/bb/6282fe6f5845e1a2dc962465ab05c4/Patient353_T1star_normal_DNA_aligned_uBAM_merged.bam
When I try to run bash .command.run, the screen freezes with the following output:
(base) [kun@g1400png-ap01lp 1f1771d1843dfa04c9ab2159038b5a]$ bash .command.run
sambamba 0.7.1
by Artem Tarasov and Pjotr Prins (C) 2012-2019
LDC 1.20.0 / DMD v2.090.1 / LLVM7.0.0 / bootstrap LDC - the LLVM D compiler (0.17.6)
finding positions of the duplicate reads in the file...
22:35:42.344 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/conda/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Nov 08 22:35:42 UTC 2022] SetNmMdAndUqTags --INPUT /dev/stdin --OUTPUT Patient353_T1star_normal_DNA_aligned_sort_mkdp.bam --TMP_DIR /tmp/Kun/nextNEOpi --VALIDATION_STRINGENCY LENIENT --MAX_RECORDS_IN_RAM 4194304 --CREATE_INDEX true --REFERENCE_SEQUENCE GRCh38.d1.vd1.fa --IS_BISULFITE_SEQUENCE false --SET_ONLY_UQ false --VERBOSITY INFO --QUIET false --COMPRESSION_LEVEL 2 --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
When I use top, I only see a java process:
top - 14:58:45 up 326 days, 5:51, 3 users, load average: 5.13, 4.99, 4.96
Tasks: 601 total, 1 running, 600 sleeping, 0 stopped, 0 zombie
%Cpu(s): 9.8 us, 0.8 sy, 0.0 ni, 89.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 39465558+total, 1187048 free, 14884032 used, 37858451+buff/cache
KiB Swap: 2094076 total, 1415656 free, 678420 used. 37851708+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
67928 yufan 20 0 5604544 5.2g 1176 S 399.3 1.4 2578:19 bwa
240190 yufan 20 0 36.2g 3.3g 19228 S 107.6 0.9 1085:30 java
5215 gdm 20 0 806208 87512 704 S 2.6 0.0 6112:34 gsd-color
272272 kun 20 0 162548 2816 1588 R 0.7 0.0 0:00.48 top
248733 kun 20 0 70.8g 373588 15280 S 0.3 0.1 0:12.51 java
1 root 20 0 192024 3364 1632 S 0.0 0.0 11:28.58 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:39.49 kthreadd
However, I checked with ps -ax. It looks like several commands have been started:
248492 pts/0 S+ 0:00 bash .command.run
248513 pts/0 S+ 0:00 tee .command.out
248514 pts/0 S+ 0:00 tee .command.err
248515 pts/0 S+ 0:00 bash .command.run
248518 pts/0 Sl+ 0:00 Singularity runtime parent
248539 pts/0 S+ 0:00 /bin/bash /data/kun/ChenHZ_lab/Neoantigens/patient353/T1/work/89/1f1771d1843dfa04c9ab2159038b5a/.command.run nxf_trace
248551 ? S< 0:00 [loop0]
248575 pts/0 S+ 0:00 /bin/bash -ue /data/kun/ChenHZ_lab/Neoantigens/patient353/T1/work/89/1f1771d1843dfa04c9ab2159038b5a/.command.sh
248577 pts/0 S+ 0:06 /bin/bash /data/kun/ChenHZ_lab/Neoantigens/patient353/T1/work/89/1f1771d1843dfa04c9ab2159038b5a/.command.run nxf_trace
248584 pts/0 Sl+ 6:39 sambamba markdup -t 20 --tmpdir /tmp/Kun/nextNEOpi --hash-table-size=1048576 --overflow-list-size=1000000 --io-buffer-size=1024 Patient353_T1s
248585 pts/0 S+ 0:00 samtools sort -@20 -m 8G -O BAM -l 0 /dev/stdin
248586 pts/0 S+ 0:00 python /opt/conda/bin/gatk --java-options -Xmx64G SetNmMdAndUqTags --TMP_DIR /tmp/Kun/nextNEOpi -R GRCh38.d1.vd1.fa -I /dev/stdin -O Patient35
248733 pts/0 Sl+ 0:12 java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.co
I then used the following code to check whether the other PIDs are running:
if ps -p "$1" > /dev/null
then
    echo "$1 is running"
    # Do something knowing the PID exists, i.e. the process with PID $1 is running
fi
and found that 248584, 248585, and 248586 are running.
Weird... Please let me know if any more information is needed. Thanks for your help!
Hmm...
Can you check if /tmp is running out of space while the MarkDuplicates process is running?
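One way to watch the free space while the process runs, as a rough sketch: the helper names `free_kib` and `watch_free_space` are hypothetical, and the temp directory in the usage line is the one that appears in the sambamba and GATK command lines above.

```shell
#!/usr/bin/env bash

# Print the free space, in KiB, of the filesystem holding the given directory.
# `df -P` keeps each filesystem on a single line, so the value is in column 4.
free_kib() {
    df -Pk -- "$1" | awk 'NR == 2 { print $4 }'
}

# Print a timestamped reading every few seconds until interrupted (ctrl+c).
watch_free_space() {
    local dir=$1 interval=${2:-10}
    while true; do
        printf '%s %s KiB free in %s\n' "$(date +%T)" "$(free_kib "$dir")" "$dir"
        sleep "$interval"
    done
}

# Usage (run in a second terminal while MarkDuplicates is running):
#   watch_free_space /tmp/Kun/nextNEOpi 10
```

A reading that drops steadily toward zero before the process stalls would point to the temp directory as the bottleneck.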
I think the problem could be related to a memory limit. Can you please post the contents of /data/kun/ChenHZ_lab/Neoantigens/patient353/T1/work/89/1f1771d1843dfa04c9ab2159038b5a/.error.log?
Try to reserve more memory in slurm for the process by setting something like:
withName:MarkDuplicates {
    cpus = 4
    memory = "96 GB"
}
in conf/process.config
Sorry for the late reply. I don't see .error.log in the folder
(base) [kun@g1400png-ap01lp 1f1771d1843dfa04c9ab2159038b5a]$ less .
./ ../ .command.begin .command.err .command.log .command.out .command.run .command.sh .command.trace .exitcode
OK, I will try it with the modified config file. Since we have found an alternative way to predict neoantigens for now, I will try your suggestion and report the results later in case others run into the same problem. Thanks again for your time and help!
Hi,
After solving the patch error (issue #14) by installing patch on our compute node, the pipeline ran into a new error.
Do you have any idea how to solve this problem?
Best, Kun