virajbdeshpande / AmpliconArchitect

AmpliconArchitect (AA) is a tool that identifies one or more connected genomic regions with simultaneous copy number amplification and elucidates the architecture of the amplicon. In the current version, AA takes as input next-generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence and one or more regions of interest. Please "watch" this repository for improvements in runtime, accuracy and annotations for the GRCh38 human reference genome, coming soon.

questions about --extendmode VIRAL #79

Closed shaka-emperor closed 4 years ago

shaka-emperor commented 4 years ago

Hi virajbdeshpande, I noticed in the 'help' documentation that --extendmode can be set to 'VIRAL', but there is no explanation. What does "VIRAL" mean? I also read your paper "Exploring the landscape of focal amplifications in cancer using AmpliconArchitect", published in Nat. Commun. In that paper, a chimeric reference genome was created by combining the human genome with the viral genome. I am not sure which genome I should use in $AA --ref --bam --bed --out: hg19full.fa, or the chimeric reference genome containing the viral sequence? Many thanks

virajbdeshpande commented 4 years ago

Hello @shaka-emperor ,

For aligning, provide the chimeric reference to the aligner. For running AA, use the reference provided in the data_repo (not the chimeric reference). The input bed should simply be the contig corresponding to the viral genome and --extendmode should be "VIRAL".

--extendmode VIRAL looks for all intervals connected to the virus. It reports the sites of integration in a new file called .integration_search.out. Since the viral copy number is commonly much larger than that of individual human intervals, in this mode AA relaxes the constraint on read support for edges connecting the viral genome to the human genome. Further, since all human intervals are connected to the viral genome, they are reported in a single amplicon with amplicon ID 0. AA also generates plots for each individual human interval together with the viral genome, labelled amplicon ID 1, 2, 3 and so on.
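
As a rough illustration of the inputs described above, here is a minimal Python sketch that writes a one-line BED entry spanning a whole viral contig and assembles the options named in this thread. It is not taken from the AA documentation: the helper names, the contig name `viral_contig`, its length, the file names and the `REF` placeholder are all hypothetical, and the actual contig name and length must match the chimeric reference the BAM was aligned to.

```python
import shlex
from typing import List


def viral_bed_line(contig: str, length: int) -> str:
    """Return one BED line covering an entire contig.

    BED coordinates are 0-based and half-open, so the whole contig is 0..length.
    """
    if length <= 0:
        raise ValueError("contig length must be positive")
    return f"{contig}\t0\t{length}"


def aa_viral_args(ref: str, bam: str, bed: str, out: str) -> List[str]:
    """Assemble only the options mentioned in this thread.

    The AA launcher itself is deliberately left out; prepend it as
    appropriate for the local installation.
    """
    return [
        "--ref", ref,
        "--bam", bam,
        "--bed", bed,
        "--out", out,
        "--extendmode", "VIRAL",
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(viral_bed_line("viral_contig", 8000))
    print(shlex.join(aa_viral_args("REF", "sample.bam", "viral.bed", "sample_out")))
```

The contig length can be read from the FASTA index (`.fai`) of the chimeric reference used for alignment, so the BED interval covers the viral genome end to end.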

shaka-emperor commented 4 years ago

Thanks a lot~~