madagiurgiu25 / decoil-pre

Reconstruct ecDNA from long-read data using Decoil tool
BSD 3-Clause "New" or "Revised" License
8 stars 0 forks source link

Discrepancy in ecDNA count between `decoil-pipeline -f sv-reconstruct` and `decoil reconstruct` using BigWig files #19

Open RP-Silk opened 3 weeks ago

RP-Silk commented 3 weeks ago

I successfully processed one of our ONT samples using the Decoil pipeline. However, I've noticed an inconsistency in the output when running the same sample through two different commands:

Question: I'm trying to understand what the decoil-pipeline -f sv-reconstruct pipeline is doing to the BigWig file that isn't happening when running decoil reconstruct with a manually supplied BigWig file. Why are the ecDNA results different between these approaches?

madagiurgiu25 commented 2 weeks ago

Dear @RP-Silk,

Thank you for the question. I would need some more information in order to be able to help. Could you please provide the commands you used to run decoil-pipeline sv-reconstruct and decoil reconstruct with the parameter specification? Which decoil version did you use? Was the docker version? Could you provide how you compute yourself the BigWig? An example would be helpful.

The decoil-pipeline sv-reconstruct computes as follows the BigWig: https://github.com/madagiurgiu25/decoil-pre/blob/0af2e53dfd7229a90568cfcf7cda95b84f899e5e/decoil/cli/rules/sv.smk#L48.

RP-Silk commented 2 weeks ago
  1. The BigWig file was generated as follows using deepTools, however I have normalised using RPKM, which is perhaps where we have differed and is the cause of the differences. Perhaps this isn't the best approach for normalisation?: bamCoverage -b <bam-file> --normalizeUsing RPKM -o <output-bw>

  2. The Decoil pipeline was first run after downloading the pipeline from source as follows: decoil-pipeline -f sv-reconstruct --bam <bam-file> --reference-genome <ref-fasta> --annotation-gtf <anno-gtf> --outputdir <output-dir> --name <sample-id> This produced 12 ecDNAs.

  3. The pipeline was then run with the VCF generated in 2 and the BigWig file generated in 1, but only using reconstruct: decoil reconstruct -b <bam-file> -i <vcf-generated-in-2> -c <bw-generated-in-1> --outputdir <output-dir> --name <sample-id> -r <ref-fasta> --annotation-gtf <anno-gtf> This produced 14 ecDNAs.

  4. Finally, the pipeline was run in the same manner as in step 3, however this time using the the VCF generated in 2 and the BigWig file generated in 2, as follows: decoil reconstruct -b <bam-file> -i <vcf-generated-in-2> -c <bw-generated-in-2> --outputdir <output-dir> --name <sample-id> -r <ref-fasta> --annotation-gtf <anno-gtf> This produced 12 ecDNAs.

Again, it looks like it might be down to the differences in how the BigWig files were generated. Thanks for pointing out how Decoil generates it!