madagiurgiu25 / decoil-pre

Reconstruct ecDNA from long-read data using Decoil tool
BSD 3-Clause "New" or "Revised" License

Input VCF from somatic variant callers #16

Open eesiribloom opened 1 month ago

eesiribloom commented 1 month ago

As briefly mentioned in #14, both my own variant calling on our cancer dataset and the respective manuscripts indicate that long-read somatic variant callers such as nanomonsv outperform current germline callers like Sniffles and cuteSV and call fewer false positives.

I imagine many Decoil users will also be working with cancer data to detect ecDNA, so it would be helpful to be able to input VCFs from such callers.

nanomonsv has an example VCF here, which uses VCF format 4.3.

Other potential somatic SV callers: SAVANA, which appears to use VCF file format 4.2, and Severus, whose format is less clear to me.
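
For callers whose format version is unclear, the version can be read from the `##fileformat` meta line that opens a VCF header. The following is a minimal Python sketch; the function name `vcf_fileformat` is hypothetical and not part of Decoil or any of the callers mentioned.

```python
import gzip


def vcf_fileformat(path):
    """Return the value of the ##fileformat header line, or None if absent."""
    # Assumption: gzip/bgzip-compressed files are recognised by a .gz suffix.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # end of the meta-information lines
            if line.startswith("##fileformat="):
                return line.strip().split("=", 1)[1]
    return None
```

Calling `vcf_fileformat("nanomonsv.result.vcf")` on a file whose first line is `##fileformat=VCFv4.3` returns the string `VCFv4.3`.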

MJoseMo commented 2 weeks ago

Hi @eesiribloom, have you tried nanomonsv output in Decoil? I'm wondering whether you've encountered any issues.

eesiribloom commented 2 weeks ago

@MJoseMo Decoil doesn't seem happy with the VCF output directly from nanomonsv. I get the same "WARNING - Corrupted VCF line" message for every variant in the VCF file, and the output does not contain any detected ecDNA.

decoil reconstruct \
--bam ${BAM_INPUT} \
--vcf ${VCF_INPUT} \
--coverage ${COV_INPUT} \
--outputdir ${OUTPUT_FOLDER} \
--name ${NAME} \
--reference-genome ${GENOME} \
--annotation-gtf ${ANNO} \
--fragment-min-cov 40 \
--min-cov 40 \
--min-cov-alt 20 \
--min-vaf 0.01 \
--filter-score 40 \
--fragment-min-size 1000 \
--min-sv-len 5000

Set QUAL.MEAN_COVERAGE_WGS to  31.379556649380174
Set QUAL.MINIMAL_FRAGMENT_COVERAGE to  40
2024-09-30 16:22:39,846 - reconstruct.encode - INFO - 0. Start processing 
2024-09-30 16:22:39,846 - reconstruct.encode - INFO - nanomonsv.result.vcf
2024-09-30 16:22:39,846 - reconstruct.encode - INFO - 1. Get all breakpoints and parse SV information
2024-09-30 16:22:39,866 - reconstruct.encode - WARNING - Corrupted VCF line chr10   10389704    d_65    C   <DEL>   .   PASS    END=10389755;SVTYPE=DEL;SVLEN=-51   TR:VR   30:19   68:0
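
To see what the flagged record actually contains, it can be split into the standard VCF columns. The following is a minimal Python sketch, not Decoil's parser: the function name `parse_vcf_record` is hypothetical, the record is assumed to be tab-delimited as in the original file, and which fields Decoil expects is not stated in this thread.

```python
def parse_vcf_record(line):
    """Split one tab-delimited VCF data line into its standard columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(cols)}")

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info = {}
    if cols[7] != ".":
        for item in cols[7].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True  # flags carry no value

    # FORMAT names the per-sample keys; each later column is one sample.
    format_keys = cols[8].split(":") if len(cols) > 8 else []
    samples = [dict(zip(format_keys, s.split(":"))) for s in cols[9:]]

    return {
        "chrom": cols[0], "pos": int(cols[1]), "id": cols[2],
        "ref": cols[3], "alt": cols[4], "filter": cols[6],
        "info": info, "format_keys": format_keys, "samples": samples,
    }


# The record from the warning above, with tabs restored between columns.
record = parse_vcf_record(
    "chr10\t10389704\td_65\tC\t<DEL>\t.\tPASS\t"
    "END=10389755;SVTYPE=DEL;SVLEN=-51\tTR:VR\t30:19\t68:0"
)
print(record["info"])
print(record["format_keys"], record["samples"])
```

For this record the INFO keys are END, SVTYPE and SVLEN, and there are two sample columns carrying only the TR and VR keys. Whether any of this is what triggers the warning is not established in this thread.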

MJoseMo commented 1 week ago

Thanks @eesiribloom. I tried the software with the example data, and it completed quickly. Have you tested it with a real sample, though? I ran the decoil-pipeline command, and it has been running for over a day without generating any new files. How can I check the progress or find out which step is stuck?