jamesc99 closed this issue 1 month ago
Dear Ryan,
Thank you for reporting these issues. Yes, VACmap is currently slower than minimap2. However, I have recently implemented some performance improvements that reduce the running time. Please try the latest version of VACmap, which should be 40% faster than the previous version. I will continue to work on enhancing the speed, as there is still plenty of room for improvement. Regarding issue 2, I have modified the original multiprocessing implementation and am currently testing it. I anticipate updating the code within the next day or two. For issue 3, I have not been able to reproduce the error. Could you please try the latest code and check if the issue persists?
Thank you very much!
Best regards, Hongyu Ding
Dear Ryan,
I am not sure about the cause of issue 2. Could you try the latest version of VACmap and check whether the issue persists? The latest version fixes a memory issue that caused a surge in memory usage.
Thank you very much!
Best Hongyu
Thanks for your quick response and hard work!
I am not sure whether you have updated VACmap again. I am rerunning my data with the version I downloaded around 12 hours ago and will update you with the results.
Kind regards, Ryan
The current version should be fine; I uploaded the code three days ago. I am not sure what causes issue 2, so if you find anything, please let me know. Thank you!
Best Hongyu
Dear Ryan,
I wanted to update you on the recent improvements I’ve made to VACmap, particularly regarding output size reduction and runtime optimization.
To help reduce the output file size, I’ve introduced two new options:
--H (Hard-clipping): This option uses hard-clipping instead of soft-clipping for clipped sequences.
--Q (Ignore Base Quality): This option ignores base quality in the input file.
By using these options, you can expect to reduce the output file size by approximately 2-5 times. However, please note that the --H option has a potential side effect on split-read event detection. For instance, Sniffles2, which uses pysam to read BAM files, may encounter issues. Specifically, hard-clipping in CIGAR strings can produce incorrect query alignment positions, potentially preventing Sniffles2 from accurately inferring the type of structural variants (SVs). See https://github.com/fritzsedlazeck/Sniffles/blob/a4af9926a4ec8278d28ea6d9382b15908ed51488/src/sniffles/leadprov.py#L269
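To illustrate the hard-clipping side effect described above, here is a minimal sketch in plain Python (no pysam required). It is not code from VACmap or Sniffles2; the helper names `stored_query_start` and `original_read_start` are hypothetical. It assumes a forward-strand alignment and relies only on the SAM convention that soft-clipped bases remain in the stored sequence while hard-clipped bases are removed from it.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def stored_query_start(cigar):
    """Offset of the first aligned base within the sequence stored in the record.

    Hard-clipped bases are absent from the stored sequence, so only
    leading soft clips shift this offset.
    """
    start = 0
    for length, op in parse_cigar(cigar):
        if op == "H":
            continue  # not present in the stored sequence
        if op == "S":
            start += length
        else:
            break
    return start


def original_read_start(cigar):
    """Offset of the first aligned base within the full original read.

    Both hard and soft leading clips count here.
    """
    start = 0
    for length, op in parse_cigar(cigar):
        if op in ("H", "S"):
            start += length
        else:
            break
    return start


# The same split-read segment written with soft vs. hard clipping.
soft = "3000S500M"
hard = "3000H500M"
print(stored_query_start(soft), original_read_start(soft))
print(stored_query_start(hard), original_read_start(hard))
```

With soft-clipping the two offsets agree, but with hard-clipping the stored offset no longer reflects where the segment sits on the original read. A caller that orders split-read segments by the stored offset alone would therefore need to add the leading hard-clip length back in.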
In addition to the file size reduction options, I’ve also optimized VACmap’s runtime on HiFi data. The latest version is approximately 45% faster than the previous one.
I will continue to work on further improvements, and I appreciate your continued support.
Thank you.
Best regards, Hongyu
Hi there,
I have been using VACmap for weeks, and it performs well in detecting complex SVs. However, I keep encountering several issues, and I hope you can fix them to make VACmap better!
1. Issue 1: enormous alignment file size and long mapping time. VACmap usually takes 2-3 times longer than minimap2 on the same data and generates extremely large intermediate and alignment files (SAM and BAM) (for example, a 166 fastq.gz file generating a 3T SAM file). This is understandable, as I know VACmap adds a non-linear step to help split reads, but it may be related to the second issue listed below.
2. Issue 2: possible 'multiprocessing' module issue during mapping (urgent issue). This happens repeatedly when I try to align high-coverage long-read data. Typical error:
I have already set ulimit -n 4096 and increased memory to 128 GB (though I think memory is not the issue), but I am still having this problem.
3. Issue 3: failed to add RG tag to BAM file (for pbsv calling). As in my question in #3, I ran the latest version of VACmap with the --rg-id and --rg-sm options:
vacmap -ref ${ref38} -read ${fastqfile} -mode S --MD -t 8 --rg-id ${rg_id} --rg-sm ${rg_sm} > ${fastq_basename}.sam
However, when I checked the header of the BAM file with samtools view -H, there was no RG header in it.
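For anyone who wants to script the header check described above, here is a small sketch in plain Python. It is only an illustration: the function name `read_groups` is hypothetical, and it assumes the header text has already been captured, for example from the output of samtools view -H. It simply collects the fields of any @RG lines.

```python
def read_groups(header_text):
    """Return one dict of TAG -> value per @RG line in a SAM header."""
    groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(
            field.split(":", 1)
            for field in line.split("\t")[1:]
            if ":" in field
        )
        groups.append(fields)
    return groups


# Made-up header text for illustration only.
header_without_rg = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
header_with_rg = header_without_rg + "@RG\tID:run1\tSM:sample1\n"

print(read_groups(header_without_rg))
print(read_groups(header_with_rg))
```

An empty list corresponds to the situation reported here: no RG header is present. If the header is missing, one possible workaround until the mapper writes it is to add the read group afterwards with a tool such as samtools addreplacerg, though I have not verified that this satisfies pbsv for VACmap output.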