Discrepancy in ecDNA count between `decoil-pipeline -f sv-reconstruct` and `decoil reconstruct` using BigWig files

madagiurgiu25 / decoil-pre

Reconstruct ecDNA from long-read data using Decoil tool

BSD 3-Clause "New" or "Revised" License

8 stars 0 forks source link

Discrepancy in ecDNA count between `decoil-pipeline -f sv-reconstruct` and `decoil reconstruct` using BigWig files #19

Open RP-Silk opened 3 weeks ago

RP-Silk commented 3 weeks ago

I successfully processed one of our ONT samples using the Decoil pipeline. However, I've noticed an inconsistency in the output when running the same sample through two different commands:

When using decoil-pipeline -f sv-reconstruct, the summary.txt file reports a different number of ecDNA compared to running decoil reconstruct with a supplied BigWig file, despite running it with the clean.vcf generated using decoil-pipeline -f sv-reconstruct.
Interestingly, I can achieve the same ecDNA results from decoil reconstruct as the decoil-pipeline -f sv-reconstruct pipeline if I use the BigWig file generated by the pipeline itself (coverage.bw), instead of one that already exists.

Question: I'm trying to understand what the decoil-pipeline -f sv-reconstruct pipeline is doing to the BigWig file that isn't happening when running decoil reconstruct with a manually supplied BigWig file. Why are the ecDNA results different between these approaches?

madagiurgiu25 commented 2 weeks ago

Dear @RP-Silk,

Thank you for the question. I would need some more information in order to be able to help. Could you please provide the commands you used to run decoil-pipeline sv-reconstruct and decoil reconstruct with the parameter specification? Which decoil version did you use? Was the docker version? Could you provide how you compute yourself the BigWig? An example would be helpful.

The decoil-pipeline sv-reconstruct computes as follows the BigWig: https://github.com/madagiurgiu25/decoil-pre/blob/0af2e53dfd7229a90568cfcf7cda95b84f899e5e/decoil/cli/rules/sv.smk#L48.

RP-Silk commented 2 weeks ago

The BigWig file was generated as follows using deepTools, however I have normalised using RPKM, which is perhaps where we have differed and is the cause of the differences. Perhaps this isn't the best approach for normalisation?: bamCoverage -b <bam-file> --normalizeUsing RPKM -o <output-bw>
The Decoil pipeline was first run after downloading the pipeline from source as follows: decoil-pipeline -f sv-reconstruct --bam <bam-file> --reference-genome <ref-fasta> --annotation-gtf <anno-gtf> --outputdir <output-dir> --name <sample-id> This produced 12 ecDNAs.
The pipeline was then run with the VCF generated in 2 and the BigWig file generated in 1, but only using reconstruct: decoil reconstruct -b <bam-file> -i <vcf-generated-in-2> -c <bw-generated-in-1> --outputdir <output-dir> --name <sample-id> -r <ref-fasta> --annotation-gtf <anno-gtf> This produced 14 ecDNAs.
Finally, the pipeline was run in the same manner as in step 3, however this time using the the VCF generated in 2 and the BigWig file generated in 2, as follows: decoil reconstruct -b <bam-file> -i <vcf-generated-in-2> -c <bw-generated-in-2> --outputdir <output-dir> --name <sample-id> -r <ref-fasta> --annotation-gtf <anno-gtf> This produced 12 ecDNAs.

Again, it looks like it might be down to the differences in how the BigWig files were generated. Thanks for pointing out how Decoil generates it!