icbi-lab / nextNEOpi

nextNEOpi: a comprehensive pipeline for computational neoantigen prediction

Error in SetNmMdAndUqTags: Input must be coordinate-sorted #41

Closed gri11 closed 2 months ago

gri11 commented 11 months ago

I am trying to run nextNEOpi with the test data. The MarkDuplicates process fails, and the pipeline has been stuck in this loop for a long time (3-4 days):

~> TaskHandler[id: 14; name: MarkDuplicates (test_sample : normal_DNA); status: NEW; exit: -; error: -; workDir: /home/ubuntu/nextNEOpi.1.4.0/work/bd/678fd86a7fdaa95335401fad874cdd]
Aug-26 03:27:54.564 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 1 -- submitted tasks are shown below
~> TaskHandler[id: 13; name: MarkDuplicates (test_sample : tumor_DNA); status: RUNNING; exit: -; error: -; workDir: /home/ubuntu/nextNEOpi.1.4.0/work/ee/3588ef274fc34327ab2ee6a459d07a]

I also found this error in the MarkDuplicates workDir:

INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred

sambamba 0.8.1
 by Artem Tarasov and Pjotr Prins (C) 2012-2021
    LDC 1.20.0 / DMD v2.090.1 / LLVM7.0.0 / bootstrap LDC - the LLVM D compiler (0.17.6)

finding positions of the duplicate reads in the file...
15:17:17.421 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/gatk/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Aug 22 15:17:17 UTC 2023] SetNmMdAndUqTags --INPUT /dev/stdin --OUTPUT test_sample_tumor_DNA_aligned_sort_mkdp.bam --TMP_DIR /tmp/ubuntu/nextNEOpi --VALIDATION_STRINGENCY LENIENT --MAX_RECORDS_IN_RAM 4194304 --CREATE_INDEX true --REFERENCE_SEQUENCE GRCh38.d1.vd1.fa --IS_BISULFITE_SEQUENCE false --SET_ONLY_UQ false --VERBOSITY INFO --QUIET false --COMPRESSION_LEVEL 2 --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Tue Aug 22 15:17:17 UTC 2023] Executing as mambauser@ip-172-31-34-222 on Linux 5.19.0-1025-aws amd64; OpenJDK 64-Bit Server VM 17.0.7+7-Debian-1deb11u1; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.4.0.0
  sorted 1159001 end pairs
     and 15 single ends (among them 0 unmatched pairs)
  collecting indices of duplicate reads...   done in 1083 ms
  found 74860 duplicates
collected list of positions in 0 min 18 sec
marking duplicates...
collected list of positions in 0 min 32 sec
samtools sort: couldn't allocate memory for bam_mem
[Tue Aug 22 15:17:45 UTC 2023] picard.sam.SetNmMdAndUqTags done. Elapsed time: 0.48 minutes.
Runtime.totalMemory()=335544320
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
htsjdk.samtools.SAMException: Input must be coordinate-sorted for this program to run. Found: unsorted
    at picard.sam.SetNmMdAndUqTags.doWork(SetNmMdAndUqTags.java:125)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:289)
    at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /opt/gatk/gatk-package-4.4.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx64G -jar /opt/gatk/gatk-package-4.4.0.0-local.jar SetNmMdAndUqTags --TMP_DIR /tmp/ubuntu/nextNEOpi -R GRCh38.d1.vd1.fa -I /dev/stdin -O test_sample_tumor_DNA_aligned_sort_mkdp.bam --CREATE_INDEX true --MAX_RECORDS_IN_RAM 4194304 --VALIDATION_STRINGENCY LENIENT
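
For context on the exception above: SetNmMdAndUqTags checks the sort order declared in the input header (the SO field of the @HD line) rather than inspecting the reads, and stops if it is not "coordinate". The Python below is a minimal sketch of that check, assuming plain-text SAM header lines as input; the function names are hypothetical and this is not Picard's code. Treating a missing SO tag as "unsorted" is our understanding of htsjdk's behaviour, labelled as an assumption in the code.

```python
# Hypothetical helpers (not Picard's code): read the sort order declared in a
# SAM header and reject anything that is not coordinate-sorted.


def header_sort_order(header_text: str) -> str:
    """Return the SO value of the @HD line in plain-text SAM header lines."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Fields after the record type are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                tag, _, value = field.partition(":")
                if tag == "SO":
                    return value
            break  # a header has at most one @HD line
    # Assumption: like htsjdk, treat a missing @HD line or SO tag as "unsorted".
    return "unsorted"


def require_coordinate_sorted(header_text: str) -> None:
    """Raise ValueError with a Picard-like message unless SO is coordinate."""
    order = header_sort_order(header_text)
    if order != "coordinate":
        raise ValueError(
            "Input must be coordinate-sorted for this program to run. "
            f"Found: {order}"
        )


if __name__ == "__main__":
    ok = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n"
    bad = "@HD\tVN:1.6\tSO:unsorted\n@SQ\tSN:chr1\tLN:248956422\n"
    require_coordinate_sorted(ok)  # passes silently
    try:
        require_coordinate_sorted(bad)
    except ValueError as err:
        print(err)  # Input must be coordinate-sorted ... Found: unsorted
```

On a real file, samtools view -H file.bam prints the header text this sketch expects; if its @HD line does not say SO:coordinate, SetNmMdAndUqTags refuses the input.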

Is this a memory problem, meaning I have to upgrade the computer's memory? Or is there another problem I should fix?

Spec:
CPU: 4 cores
RAM: 16 GB
Storage: 1 TB

Environment:
OS: Linux 5.19.0-1025-aws amd64
Java version: OpenJDK 64-Bit Server VM 17.0.7+7-Debian-1deb11u1

riederd commented 10 months ago

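Yes, you are right: it seems you are running out of memory. The line "samtools sort: couldn't allocate memory for bam_mem" in your log shows that the sort step feeding SetNmMdAndUqTags failed, so SetNmMdAndUqTags did not receive a coordinate-sorted BAM. As stated in the README, we recommend a system with at least 64 GB of RAM. It might also work with 32 GB, but you will need to adjust some memory-related settings in conf/params.conf, e.g.: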

  // Java settings: please adjust to your memory available
  JAVA_Xmx = "-Xmx32G"

  // samtools memory: please adjust to your memory available
  STperThreadMem = "2G"

  // sambamba settings: please adjust to your memory available
  SB_hash_table_size = "1048576"
  SB_overflow_list_size = "1000000"
  SB_io_buffer_size = "1024"
  SB_sort_mem = "16G"
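
To see why these settings matter on a 16 GB machine, the Python below adds up a rough worst-case memory ceiling for the piped MarkDuplicates step. It is a hypothetical model, not nextNEOpi's code, and rests on stated assumptions: the three tools run at the same time, STperThreadMem is what samtools sort gets as its per-thread -m value, sambamba markdup holds two I/O buffers of SB_io_buffer_size megabytes each, and the JVM grows to its full -Xmx heap. The sort thread counts in the usage example are guesses.

```python
# Rough worst-case memory ceiling for the piped MarkDuplicates step.
# Hypothetical model, not nextNEOpi's code. Assumptions:
#   - sambamba markdup, samtools sort and the GATK JVM run at the same time;
#   - STperThreadMem is passed to samtools sort -m (memory per sort thread);
#   - sambamba markdup holds two I/O buffers of SB_io_buffer_size MB each;
#   - the JVM grows to its full -Xmx heap (off-heap overhead is ignored).

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}
GIB = UNITS["G"]


def to_bytes(size: str) -> int:
    """Convert a size string such as '2G', '512M' or '1024' to bytes."""
    size = size.strip().upper()
    if size and size[-1] in UNITS:
        return int(float(size[:-1]) * UNITS[size[-1]])
    return int(size)


def peak_bytes(java_xmx: str, st_per_thread_mem: str,
               sort_threads: int, sb_io_buffer_size: str) -> int:
    """Sum the memory each tool in the pipe may claim at once."""
    heap = java_xmx.strip()
    if heap.startswith("-Xmx"):
        heap = heap[len("-Xmx"):]
    java = to_bytes(heap)
    samtools = sort_threads * to_bytes(st_per_thread_mem)
    sambamba = 2 * int(sb_io_buffer_size) * UNITS["M"]
    return java + samtools + sambamba


if __name__ == "__main__":
    # Values from the conf/params.conf excerpt; thread counts are guesses.
    settings = dict(java_xmx="-Xmx32G", st_per_thread_mem="2G",
                    sb_io_buffer_size="1024")
    for ram_gib, threads in [(16, 4), (64, 16)]:
        need = peak_bytes(sort_threads=threads, **settings) / GIB
        verdict = "fits" if need <= ram_gib else "exceeds RAM"
        print(f"{ram_gib} GiB RAM, {threads} sort threads: "
              f"ceiling ~{need:.0f} GiB ({verdict})")
```

Because the JVM commits heap lazily (the log reports Runtime.totalMemory()=335544320, about 320 MiB), real usage is usually well below this ceiling. The point of the sketch is only that lowering JAVA_Xmx, STperThreadMem or the number of sort threads each shrinks the total.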

riederd commented 9 months ago

Is the problem still occurring, or can I close the issue?

gri11 commented 9 months ago

I'm still looking for appropriate memory settings for my server (16 cores, 64 GB RAM).

riederd commented 8 months ago

Ping

riederd commented 2 months ago

No feedback; assuming it is solved.