virajbdeshpande / AmpliconArchitect

AmpliconArchitect (AA) is a tool that identifies one or more connected genomic regions with simultaneous copy number amplification and elucidates the architecture of the amplicon. In the current version, AA takes as input next generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence and one or more regions of interest. Please "watch" this repository for improvements in runtime, accuracy and annotations for GRCh38 human reference genome coming up soon.

Feature request: force AA to use specific breakpoints #134

Closed: willhooper closed this issue 10 months ago

willhooper commented 1 year ago

Hi,

Thanks for developing this great tool. Would it be possible to add an argument to AmpliconArchitect to force it to use a specific set of SV breakpoints? I know there's an option to include the data in the output directory, but we've had some trouble getting this to work in modes other than --runmode=CLUSTERED.

Essentially, our use case is that we have a set of calls from another tool, with candidate intervals and breakpoints. We would like to use these without the constraint that all the input intervals be included in the same amplicon. For example, we believe the amplicon shown below should be ecDNA, but it is classified as a linear amplification because of the flanking intervals:

Thanks! Will

[Image: AA amplicon diagram of the candidate ecDNA region]

jluebeck commented 1 year ago

Hi Will,

Thanks for reaching out with this request! Could you please provide a bit of information on how the input BED file given to AA was generated here? Was AmpliconSuite used to wrap the process? What was the specific command used for this case?

It looks like there is a gap between the first two sections of the AA diagram. If the intervals provided are not complete and run mode EXPLORE is not used, there may be issues with bridging the gap.

We can certainly look into adding the requested features; however, at present we continue to recommend sticking to AmpliconSuite to ensure current best practices are used when running the tool. I agree it would be useful to give AA a "jump-start" with some externally generated SVs. Restricting individual AA amplicons to specific sets of intervals may be possible too, but we much prefer that AA be allowed to explore the connections those intervals make to the rest of the genome; otherwise we risk creating incomplete amplicons.

Thanks, Jens

willhooper commented 1 year ago

Thanks for the quick response! Briefly, we use another graph-based approach called JaBbA (https://github.com/mskilab/JaBbA) to call higher-order structural variants. For each complex structural variant call (e.g. a double-minute), we have a list of intervals and breakpoints comprising the event. The segmentation is SV breakpoint-informed, and our breakpoints are generated from a consensus of 4 callers (manta, svaba, lumpy, and gridss2).

We'd like to use AA to add confidence that our amplification calls are ecDNA, analogous to passing a short variant caller a list of predefined loci to genotype.

jluebeck commented 1 year ago

Hi Will,

That sounds great. I am indeed familiar with JaBbA and it is a very nice tool. After discussing with my adviser, we are happy to engage further on this.

If possible, some additional details on the command being used to invoke AA would help me figure out what changes must be made to support this. Any chance you can provide a testing dataset (BAM file + AA-formatted input files from JaBbA)?

Regarding the image you posted originally, it is hard for me to say for certain without seeing the input data, but my strong feeling is that it is classified as linear not because of the flat flanking regions on either side, but because of the gap between the circled interval endpoints:

[Image: AA diagram with the gap between interval endpoints circled]

For each individual ecDNA you would like to confirm, making a call to AA giving the regions and breakpoints would be the way to go (as opposed to giving it intervals and breakpoints for all ecDNAs at the same time). It would also be prudent to give some flanking regions on the left and right of the candidate ecDNA intervals so that AA can see the surrounding context.
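A minimal Python sketch of preparing one such per-candidate region file, assuming the candidate intervals are available as BED-style (chrom, start, end) tuples with 0-based, half-open coordinates. The function names, the 100 kb flank, and the example coordinates are hypothetical illustrations, not AA or AmpliconSuite defaults, and the sketch does not show how the resulting BED file is passed to AA.

```python
from typing import Dict, List, Tuple

# (chrom, start, end) with BED-style 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


def pad_and_merge(intervals: List[Interval], flank: int,
                  chrom_sizes: Dict[str, int]) -> List[Interval]:
    """Extend each interval by `flank` bp on both sides (clamped to the
    chromosome bounds), then merge intervals that overlap or touch."""
    padded = sorted(
        (c, max(0, s - flank), min(chrom_sizes[c], e + flank))
        for c, s, e in intervals
    )
    merged: List[Interval] = []
    for c, s, e in padded:
        if merged and merged[-1][0] == c and s <= merged[-1][2]:
            # Same chromosome and overlapping/adjacent: extend the last region.
            prev_c, prev_s, prev_e = merged[-1]
            merged[-1] = (prev_c, prev_s, max(prev_e, e))
        else:
            merged.append((c, s, e))
    return merged


def to_bed(intervals: List[Interval]) -> str:
    """Render intervals as tab-separated BED lines."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in intervals)


if __name__ == "__main__":
    # Hypothetical candidate ecDNA made of two nearby segments on chr8
    # (chromosome length taken from hg19).
    sizes = {"chr8": 146_364_022}
    candidate = [("chr8", 127_700_000, 128_000_000),
                 ("chr8", 128_050_000, 128_300_000)]
    print(to_bed(pad_and_merge(candidate, 100_000, sizes)), end="")
```

Because padding is applied before merging, two segments closer together than twice the flank end up in a single region, so the stretch between them is also handed to AA rather than left as a gap in the input.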

This will probably require a bit of back-and-forth as we figure out how to make this integration work properly, so please feel free to email additional questions: jluebeck [a] ucsd.edu. Before this kind of feature is released, we would like to see some testing data to ensure that using AA in this context is reasonable, since this is a highly customized way of using AmpliconArchitect. It does not match the way we would normally suggest using the tool, nor is it a usage we had in mind when designing AA. In particular, if you have additional 'problem cases' like the one above, those would be great for us to look into. Thanks much for understanding.

Jens

jluebeck commented 10 months ago

Support for externally-provided SV calls (VCF format) is added to AA v1.3.r6 available here: https://github.com/AmpliconSuite/AmpliconArchitect
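The thread does not say which VCF fields AA reads. As a hedged illustration only, the sketch below writes one junction as a mated pair of breakend (BND) records using the VCF 4.2 breakend notation, assuming each side of the junction is described by a 1-based position and the side on which the retained sequence lies. The `Breakend` class and helper names are hypothetical; orientation conventions differ between callers (JaBbA's strand encoding is not mapped here), and REF is written as `N` because no reference FASTA is read. Check the AA documentation for the fields it actually requires.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Breakend:
    chrom: str
    pos: int          # 1-based breakend base: last retained base if keeps_left, else first
    keeps_left: bool  # True if the retained sequence lies to the left of (ends at) pos


def bnd_alt(ref: str, this: Breakend, mate: Breakend) -> str:
    """Build a VCF 4.2 breakend ALT string for `this`, joined to `mate`.

    '[' means the mate's retained sequence extends to the right of its
    position, ']' means it extends to the left. The reference base goes
    before the bracketed mate when `this` keeps the sequence on its left.
    """
    bracket = "]" if mate.keeps_left else "["
    mate_str = f"{bracket}{mate.chrom}:{mate.pos}{bracket}"
    return f"{ref}{mate_str}" if this.keeps_left else f"{mate_str}{ref}"


def junction_to_vcf_lines(a: Breakend, b: Breakend, sv_id: str) -> List[str]:
    """Emit the two mated BND records (one per side) for a single junction."""
    lines = []
    for suffix, this, mate, mate_suffix in (("_1", a, b, "_2"), ("_2", b, a, "_1")):
        info = f"SVTYPE=BND;MATEID={sv_id}{mate_suffix}"
        lines.append("\t".join([this.chrom, str(this.pos), sv_id + suffix, "N",
                                bnd_alt("N", this, mate), ".", "PASS", info]))
    return lines


HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">',
    '##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakends">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]

if __name__ == "__main__":
    # Hypothetical circularizing junction: the end of a segment joined back to its start.
    end = Breakend("chr8", 128_300_000, keeps_left=True)
    start = Breakend("chr8", 127_700_001, keeps_left=False)
    print("\n".join(HEADER + junction_to_vcf_lines(end, start, "bnd1")))
```

For this circular junction the two ALT strings are `N[chr8:127700001[` and `]chr8:128300000]N`: the segment's last base is followed by sequence running rightward from its first base, as expected for a closed loop.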

willhooper commented 10 months ago

Amazing, thank you for adding this feature!