Open eesiribloom opened 5 months ago
Dear @eesiribloom,
I understand your point about not downsampling, however, currently, all similar methods do internally downsampling. Additionally, downsampling to 5-10X will be equivalent with setting all the thresholds higher.
To check the description of the parameters please visit https://github.com/madagiurgiu25/decoil-pre/blob/main/docs/decoil_reconstruct.md. --min-cov
represents the minimal read depth DP
(from the VCF file), whereas --fragment-min-cov
represents the minimal mean coverage spanning a particular genome fragment. Yes, --min-cov-alt
represents the reads spanning the junction.
I think in your case you should try to run your data using a setup like: --fragment-min-cov 40 --min-cov 40 --min-cov-alt 20 --min-vaf 0.01 --filter-score 40 --fragment-min-size 1000 --min-sv-len 5000
. I would suggest to use the installation from source version to have access to the latest Decoil version.
Let me know if this works for you better.
thanks I will try. What do you think about using variant calls from a somatic SV caller such as nanomonsv which seem to outperform germline callers in cancer data
Dear @eesiribloom,
Thank you for the suggestion. If you have a VCF file from nanomonsv you could try to run decoil (installed from source), but it depends if the VCF is standardized:
decoil reconstruct -b <bamfile> -i <vcffile> -c <coveragefile> --outputdir <outputdir> --name <sample> -r <reference_genome> --fragment-min-cov 40 --min-cov 40 --min-cov-alt 20 --min-vaf 0.01 --filter-score 40 --fragment-min-size 1000 --min-sv-len 5000
Install from source decoil:
git clone https://github.com/madagiurgiu25/decoil-pre.git
cd decoil-pre
# create conda environment
mamba env create -f environment.yml
conda activate envdecoil
python -m pip install -r requirements.txt
python setup.py build install
# add decoil in $PATH
ROOT=`dirname $(which decoil)`
export PATH=$PATH:$ROOT
If you have a VCF file example from nanomonsv please send me and I can adjust the code to make Decoil compatible with the nanomonsv VCF output.
Dear @madagiurgiu25, I would like to ask if it is necessary to downsample the bam file of example, I've seen that its coberture is ~138
samtools depth -a map.bam | awk '{sum+=$3} END {print "Total coverage: ",sum/NR}'
Total coverage: 0.038994
samtools depth map.bam | awk '{sum+=$3} END {print "Total coverage: ",sum/NR}' Total coverage: 137.789
The map.bam
is an example and contains simulated data, i.e. reads cover only the simulated ecDNA structure. There is no WGS coverage. The flag -a
in samtools depth keeps all bins, inclusively those which are 0X covered. And this is how you also compute correct the WGS coverage. Without -a
flag, the bins with are 0X covered will be ignored for the calculation. This means the amplified region is 137X, but this is very small. The WGS mean coverage is 0.03X. The map.bam
does not require downsampling, this is a very small example.
Similar to #11 I'd like to make my decoil ecDNA calls more conservative. But most of my samples have ~25-30x and I don't want to downsample them and loose information unnecessarily.
With default parameters decoil is calling ecDNA in all my samples, which is unlikely. Visually inspecting these regions confirms that some are high-confidence and "look" like ecDNA and others do not have strong evidence.
I have tried increasing
--min-vaf
to 0.1 which did reduce the number of ecDNA slightlyGiven your samples were ~5-10x how should I adjust the parameters accordingly?
What is the difference between
--min-cov
and--fragment-min-cov
?does
--min-cov-alt
refer to the number of reads spanning junctions ?By default does --filter-score remove cycles with estimated copy number /proportion < WGS coverage and should I consider increasing this number ?