micahvista / VACmap

VACmap: a long-read aligner specifically designed for complex structural variation discovery
GNU General Public License v3.0
25 stars, 0 forks

Couple issues when running VACmap #5

Closed: jamesc99 closed this issue 1 month ago

jamesc99 commented 2 months ago

Hi there,

I have been using VACmap for several weeks, and it performs well at detecting complex SVs. However, I keep running into several issues, and I hope fixing them will help make VACmap even better!

1. Issue 1: very large alignment files and long mapping time. VACmap usually took 2-3 times longer than minimap2 on the same data and generated extremely large intermediate and alignment files (SAM and BAM); for example, 166 fastq.gz files produced a 3 TB SAM file. This is understandable, since I know VACmap adds a non-linear step to help split reads, but it may be related to the second issue below.

2. Issue 2: possible 'multiprocessing' module issue during mapping (urgent). This happened repeatedly when I was aligning high-coverage long-read (LR) data. A typical error:

INFO: 08/12/2024 06:39:09 PM 36 / sec in the last 46 minutes, 30 / sec AVG
Process Process-11:
Process Process-7:
Process Process-8:
Process Process-10:
Process Process-4:
Process Process-6:
Process Process-5:
Process Process-9:
Process Process-3:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/site-packages/VACmap-1.0-py3.10-linux-x86_64.egg/vacmap/mammap_sensitive.py", line 11758, in get_list_of_readmap_stdout
    cooked_queue.put(a_list)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/site-packages/VACmap-1.0-py3.10-linux-x86_64.egg/vacmap/mammap_sensitive.py", line 11758, in get_list_of_readmap_stdout
    cooked_queue.put(a_list)
  File "<string>", line 2, in put
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "<string>", line 2, in put
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/managers.py", line 817, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/site-packages/VACmap-1.0-py3.10-linux-x86_64.egg/vacmap/mammap_sensitive.py", line 11758, in get_list_of_readmap_stdout
    cooked_queue.put(a_list)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/managers.py", line 817, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/site-packages/VACmap-1.0-py3.10-linux-x86_64.egg/vacmap/mammap_sensitive.py", line 11758, in get_list_of_readmap_stdout
    cooked_queue.put(a_list)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "<string>", line 2, in put
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "<string>", line 2, in put
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/site-packages/VACmap-1.0-py3.10-linux-x86_64.egg/vacmap/mammap_sensitive.py", line 11758, in get_list_of_readmap_stdout
    cooked_queue.put(a_list)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 410, in _send_bytes
    self._send(buf)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/site-packages/VACmap-1.0-py3.10-linux-x86_64.egg/vacmap/mammap_sensitive.py", line 11758, in get_list_of_readmap_stdout
    cooked_queue.put(a_list)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/managers.py", line 817, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/site-packages/VACmap-1.0-py3.10-linux-x86_64.egg/vacmap/mammap_sensitive.py", line 11758, in get_list_of_readmap_stdout
    cooked_queue.put(a_list)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 410, in _send_bytes
    self._send(buf)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/managers.py", line 817, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "<string>", line 2, in put
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
  File "<string>", line 2, in put
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/site-packages/VACmap-1.0-py3.10-linux-x86_64.egg/vacmap/mammap_sensitive.py", line 11758, in get_list_of_readmap_stdout
    cooked_queue.put(a_list)
  File "<string>", line 2, in put
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/managers.py", line 817, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/managers.py", line 817, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 410, in _send_bytes
    self._send(buf)
BrokenPipeError: [Errno 32] Broken pipe
  File "<string>", line 2, in put
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/managers.py", line 817, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 410, in _send_bytes
    self._send(buf)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/managers.py", line 817, in _callmethod
    conn.send((self._id, methodname, args, kwds))
BrokenPipeError: [Errno 32] Broken pipe
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 410, in _send_bytes
    self._send(buf)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 410, in _send_bytes
    self._send(buf)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 410, in _send_bytes
    self._send(buf)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 410, in _send_bytes
    self._send(buf)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
BrokenPipeError: [Errno 32] Broken pipe
BrokenPipeError: [Errno 32] Broken pipe
BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/site-packages/VACmap-1.0-py3.10-linux-x86_64.egg/vacmap/mammap_sensitive.py", line 5612, in stdout_writer
    a_list = cooked_queue.get()
  File "<string>", line 2, in get
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/managers.py", line 818, in _callmethod
    kind, result = conn.recv()
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/vacmap_env/lib/python3.10/multiprocessing/connection.py", line 384, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

I have already set ulimit -n 4096 and increased the memory to 128 GB (though I don't think memory is the issue), but I am still having this problem.
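One thing worth confirming is that the raised limits actually apply to the process running VACmap, because a `ulimit` set in one shell does not carry over to a batch job launched from another. Below is a minimal sketch using only Python's standard `resource` and `subprocess` modules (POSIX only); the helper names `report_limits` and `run_and_report` are hypothetical and not part of VACmap.

```python
import resource
import subprocess
import sys


def report_limits() -> dict:
    """Return this process's soft/hard limits for open files and address space.

    A value equal to resource.RLIM_INFINITY means "unlimited".
    """
    nofile_soft, nofile_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    as_soft, as_hard = resource.getrlimit(resource.RLIMIT_AS)
    return {
        "nofile_soft": nofile_soft,
        "nofile_hard": nofile_hard,
        "as_soft": as_soft,
        "as_hard": as_hard,
    }


def run_and_report(cmd: list) -> tuple:
    """Run cmd to completion; return (exit code, peak RSS of the largest waited-for descendant).

    On Linux ru_maxrss is in KiB (on macOS it is in bytes). A negative exit code
    means the child died from a signal, e.g. -9 for SIGKILL, which is what the
    Linux OOM killer sends.
    """
    completed = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak


if __name__ == "__main__":
    for name, value in report_limits().items():
        shown = "unlimited" if value == resource.RLIM_INFINITY else value
        print(f"{name}: {shown}")
    # Optionally wrap a command, e.g. `python check_limits.py vacmap -ref ... -t 8`
    if len(sys.argv) > 1:
        code, rss = run_and_report(sys.argv[1:])
        print(f"exit code: {code}, peak child RSS: {rss}")
```

Running this inside the same job script that launches VACmap shows the limits the aligner really inherits, and the peak RSS indicates whether memory was close to the 128 GB allocation.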

3. Issue 3: failed to add the RG tag to the BAM file (needed for pbsv calling). Following up on my question in #3, I ran the latest version of VACmap with the --rg-id and --rg-sm options: vacmap -ref ${ref38} -read ${fastqfile} -mode S --MD -t 8 --rg-id ${rg_id} --rg-sm ${rg_sm} > ${fastq_basename}.sam

However, when I checked the header of the BAM file with samtools view -H, there was no @RG line in it:

/hgsc_software/samtools/samtools-1.9/bin/samtools view -H ~/ryan_scratch_ln/benchmark_inv/rawdata/na19238_trio/na19240_mother/ont/vacmap/NA19240_ONT_vacmap.sorted.bam
@HD     VN:1.0  SO:coordinate
@SQ     SN:chr1 LN:248956422
@SQ     SN:chr2 LN:242193529
@SQ     SN:chr3 LN:198295559
@SQ     SN:chr4 LN:190214555
@SQ     SN:chr5 LN:181538259
@SQ     SN:chr6 LN:170805979
@SQ     SN:chr7 LN:159345973
@SQ     SN:chr8 LN:145138636
@SQ     SN:chr9 LN:138394717
@SQ     SN:chr10        LN:133797422
@SQ     SN:chr11        LN:135086622
@SQ     SN:chr12        LN:133275309
@SQ     SN:chr13        LN:114364328
@SQ     SN:chr14        LN:107043718
@SQ     SN:chr15        LN:101991189
@SQ     SN:chr16        LN:90338345
@SQ     SN:chr17        LN:83257441
@SQ     SN:chr18        LN:80373285
@SQ     SN:chr19        LN:58617616
@SQ     SN:chr20        LN:64444167
@SQ     SN:chr21        LN:46709983
@SQ     SN:chr22        LN:50818468
@SQ     SN:chrX LN:156040895
@SQ     SN:chrY LN:57227415
@SQ     SN:chrM LN:16569
@PG     PN:VACmap       ID:VACmap       VN:1.0  CL:vacmap -ref /stornext/snfs4/next-gen/scratch/ryan/reference/human-grch38.fasta -read ../rawdata/20230328_GM19240_UL_eee-prom1-2G-PAK58474_guppy-5.0.11-sup-prom_fastq_pass.fastq.gz -mode S --MD -t 8 --rg-id 20230328_GM19240_UL_eee-prom1-2G-PAK58474_guppy-5.0.11-sup-prom_fastq_pass --rg-sm NA19240_ONT_vacmap
@PG     PN:VACmap       ID:VACmap-29FA8FDB      VN:1.0  CL:vacmap -ref /stornext/snfs4/next-gen/scratch/ryan/reference/human-grch38.fasta -read ../rawdata/20230328_GM19240_UL_eee-prom1-2H-PAK84678_guppy-5.0.11-sup-prom_fastq_pass.fastq.gz -mode S --MD -t 8 --rg-id 20230328_GM19240_UL_eee-prom1-2H-PAK84678_guppy-5.0.11-sup-prom_fastq_pass --rg-sm NA19240_ONT_vacmap
@PG     PN:VACmap       ID:VACmap-72996C29      VN:1.0  CL:vacmap -ref /stornext/snfs4/next-gen/scratch/ryan/reference/human-grch38.fasta -read ../rawdata/20230405_GM19240_UL_eee-prom1-2F-PAK69241_guppy-5.0.11-sup-prom_fastq_pass.fastq.gz -mode S --MD -t 8 --rg-id 20230405_GM19240_UL_eee-prom1-2F-PAK69241_guppy-5.0.11-sup-prom_fastq_pass --rg-sm NA19240_ONT_vacmap
@PG     PN:VACmap       ID:VACmap-79BD64BF      VN:1.0  CL:vacmap -ref /stornext/snfs4/next-gen/scratch/ryan/reference/human-grch38.fasta -read ../rawdata/20230405_GM19240_UL_eee-prom1-2G-PAK68964_guppy-5.0.11-sup-prom_fastq_pass.fastq.gz -mode S --MD -t 8 --rg-id 20230405_GM19240_UL_eee-prom1-2G-PAK68964_guppy-5.0.11-sup-prom_fastq_pass --rg-sm NA19240_ONT_vacmap
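To check for this across many BAM files, a small Python sketch can parse the text printed by `samtools view -H` and list the IDs of any `@RG` lines, and build an `@RG` line in the TAB-delimited `TAG:VALUE` form the SAM specification uses. The function names are hypothetical, and the sketch assumes `samtools` is on `PATH`.

```python
import subprocess


def bam_header(bam_path: str) -> str:
    """Return the SAM header text of a BAM file via `samtools view -H`."""
    result = subprocess.run(["samtools", "view", "-H", bam_path],
                            capture_output=True, text=True, check=True)
    return result.stdout


def read_group_ids(header_text: str) -> list:
    """Return the ID of every @RG line in a SAM header (empty list if none)."""
    ids = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields after the record type are TAB-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            if field.startswith("ID:"):
                ids.append(field[3:])
    return ids


def make_rg_line(rg_id: str, sample: str) -> str:
    """Build a minimal @RG header line carrying the required ID plus an SM tag."""
    if "\t" in rg_id or "\t" in sample:
        raise ValueError("read-group values must not contain TAB characters")
    return f"@RG\tID:{rg_id}\tSM:{sample}"
```

For example, `read_group_ids(bam_header("NA19240_ONT_vacmap.sorted.bam"))` returning an empty list confirms the missing header. If so, `samtools addreplacerg -r` accepts an `@RG` line as text and may serve as a stopgap; whether the result satisfies pbsv is untested here.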
micahvista commented 2 months ago

Dear Ryan,

Thank you for reporting these issues. Yes, VACmap is currently slower than minimap2. However, I have recently implemented some performance improvements that reduce the running time. Please try the latest version of VACmap, which should be 40% faster than the previous version. I will continue to work on enhancing the speed, as there is still plenty of room for improvement. Regarding issue 2, I have modified the original multiprocessing implementation and am currently testing it. I anticipate updating the code within the next day or two. For issue 3, I have not been able to reproduce the error. Could you please try the latest code and check if the issue persists?
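For the file-size part of issue 1, a general workaround (not something proposed in this thread) is to avoid writing the multi-terabyte intermediate SAM at all by streaming VACmap's standard output straight into `samtools sort`, which writes compressed, coordinate-sorted BAM. The sketch below assumes `vacmap` and `samtools` are on `PATH` and reuses only VACmap options that appear in this thread; the helper names are hypothetical, and the sort thread count (`-@`) and per-thread memory (`-m`) are placeholder values to tune.

```python
import subprocess


def build_commands(ref: str, reads: str, out_bam: str, threads: int = 8,
                   sort_threads: int = 4, sort_mem: str = "2G") -> tuple:
    """Return (vacmap_cmd, sort_cmd) for a `vacmap | samtools sort` pipeline."""
    vacmap_cmd = ["vacmap", "-ref", ref, "-read", reads, "-mode", "S",
                  "--MD", "-t", str(threads)]
    # A trailing "-" tells samtools sort to read SAM/BAM from standard input.
    sort_cmd = ["samtools", "sort", "-@", str(sort_threads), "-m", sort_mem,
                "-o", out_bam, "-"]
    return vacmap_cmd, sort_cmd


def run_pipeline(vacmap_cmd: list, sort_cmd: list) -> None:
    """Run vacmap_cmd | sort_cmd without materialising an intermediate SAM file."""
    aligner = subprocess.Popen(vacmap_cmd, stdout=subprocess.PIPE)
    sorter = subprocess.Popen(sort_cmd, stdin=aligner.stdout)
    # Close the parent's copy of the pipe so the aligner receives SIGPIPE
    # if samtools exits early, instead of blocking forever.
    aligner.stdout.close()
    sort_rc = sorter.wait()
    align_rc = aligner.wait()
    if align_rc != 0 or sort_rc != 0:
        raise RuntimeError(f"pipeline failed: vacmap={align_rc}, samtools sort={sort_rc}")
```

This follows the standard-library pattern for replacing a shell pipeline; the equivalent one-liner would be `vacmap ... | samtools sort -o out.bam -`. It does not shrink the alignments themselves, but it skips the uncompressed SAM on disk.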

Thank you very much!

Best regards, Hongyu Ding

micahvista commented 2 months ago

Dear Ryan,

I am not sure about the cause of issue 2. Could you try the latest version of VACmap and check whether the issue persists? The latest version fixes a memory issue that caused a surge in memory usage.
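For readers hitting the same traceback: the workers fail in `cooked_queue.put()` inside `multiprocessing/managers.py` and the writer fails in `cooked_queue.get()`, which is consistent with the Manager server process behind a proxied queue dying (for example, being killed when memory runs out) so that every connection to it breaks at once. That is an inference from the traceback, not something confirmed in this thread. A common mitigation for the memory side is a bounded queue, so mapping workers block instead of piling up results faster than the writer can drain them. The sketch below uses plain `multiprocessing.Queue` objects with `maxsize`; `map_read` is a hypothetical stand-in for the real alignment work, and this is not VACmap's implementation.

```python
import multiprocessing as mp
import threading

# Assumes a POSIX system with the "fork" start method, as in the Linux run above.
CTX = mp.get_context("fork")
SENTINEL = None


def map_read(read: str) -> str:
    """Hypothetical stand-in for aligning one read and formatting its SAM line."""
    return f"{read}\tmapped"


def worker(raw_queue, cooked_queue) -> None:
    # put() blocks while cooked_queue is full, capping how many finished
    # results can accumulate in memory ahead of the writer.
    for read in iter(raw_queue.get, SENTINEL):
        cooked_queue.put(map_read(read))
    cooked_queue.put(SENTINEL)  # tell the writer this worker is done


def feed(raw_queue, reads, n_workers: int) -> None:
    for read in reads:
        raw_queue.put(read)
    for _ in range(n_workers):
        raw_queue.put(SENTINEL)  # one stop signal per worker


def run(reads, n_workers: int = 4, max_pending: int = 64) -> list:
    raw_queue = CTX.Queue(maxsize=max_pending)
    cooked_queue = CTX.Queue(maxsize=max_pending)
    workers = [CTX.Process(target=worker, args=(raw_queue, cooked_queue))
               for _ in range(n_workers)]
    for p in workers:
        p.start()  # fork before starting any extra threads
    # Feed from a thread so the main process can drain results concurrently;
    # feeding and draining from one thread could deadlock with bounded queues.
    feeder = threading.Thread(target=feed, args=(raw_queue, reads, n_workers))
    feeder.start()

    results, finished = [], 0
    while finished < n_workers:
        item = cooked_queue.get()
        if item is SENTINEL:
            finished += 1
        else:
            results.append(item)  # a real writer would write to stdout here
    feeder.join()
    for p in workers:
        p.join()
    return results
```

Results arrive in completion order rather than input order. A bounded queue only limits backlog growth; it does not address other possible causes of the broken pipes.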

Thank you very much!

Best, Hongyu

jamesc99 commented 2 months ago

Thanks for your quick response and hard work!

I am not sure whether you have updated VACmap again; I am rerunning my data with the version I downloaded about 12 hours ago and will post the results.

Kind regards, Ryan

micahvista commented 2 months ago

The current version should be fine; I uploaded the code three days ago. I am not sure what causes issue 2, so if you find anything, please let me know. Thank you!

Best, Hongyu

micahvista commented 1 month ago

Dear Ryan,

I wanted to update you on the recent improvements I’ve made to VACmap, particularly regarding output size reduction and runtime optimization.

To help reduce the output file size, I’ve introduced two new options:

--H (hard-clipping): uses hard-clipping instead of soft-clipping for clipped sequences.

--Q (ignore base quality): ignores the base qualities in the input file.

Using these options should reduce the output file size by roughly 2-5 times. Note, however, that --H has a potential side effect on split-read event detection. For instance, Sniffles2, which reads BAM files through pysam, may run into problems: hard-clipping in CIGAR strings can yield incorrect query alignment positions, which can prevent Sniffles2 from correctly inferring the type of a structural variant (SV) (https://github.com/fritzsedlazeck/Sniffles/blob/a4af9926a4ec8278d28ea6d9382b15908ed51488/src/sniffles/leadprov.py#L269).
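To see why hard-clipping can confuse split-read callers, consider where a supplementary alignment starts on the original read. Under the SAM specification, soft-clipped bases (S) stay in SEQ while hard-clipped bases (H) are removed from it, which is where the size saving comes from. An offset counted only from soft clips (pysam's `query_alignment_start` is such an index into the stored sequence) therefore drops to 0 once the leading clip becomes hard. The pure-Python sketch below only illustrates that arithmetic; it is not Sniffles2's or VACmap's code, and the function names are hypothetical.

```python
import re

CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list:
    """Split a CIGAR string into (length, op) tuples."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def soft_clip_start(cigar: str) -> int:
    """Index of the first aligned base within SEQ (counts leading S only)."""
    start = 0
    for length, op in parse_cigar(cigar):
        if op == "H":
            continue  # hard-clipped bases are not stored in SEQ
        if op == "S":
            start += length
            continue
        break
    return start


def read_start(cigar: str, is_reverse: bool = False) -> int:
    """Offset of the first aligned base on the original read (counts S and H)."""
    ops = parse_cigar(cigar)
    if is_reverse:
        # SEQ of a reverse-strand record is reverse-complemented, so the
        # read's original start corresponds to the right end of the CIGAR.
        ops = ops[::-1]
    offset = 0
    for length, op in ops:
        if op not in ("S", "H"):
            break
        offset += length
    return offset
```

With a supplementary alignment of `2000S3000M`, both functions return 2000; written as `2000H3000M`, `soft_clip_start` returns 0 while `read_start` still returns 2000. A caller relying on the soft-clip-only offset would thus misplace hard-clipped segments along the read.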

In addition to the file size reduction options, I’ve also optimized VACmap’s runtime on HiFi data. The latest version is approximately 45% faster than the previous one.

I will continue to work on further improvements, and I appreciate your continued support.

Thank you.

Best regards, Hongyu