virajbdeshpande / AmpliconArchitect

AmpliconArchitect (AA) is a tool to identify one or more connected genomic regions which have simultaneous copy number amplification and to elucidate the architecture of the amplicon. In the current version, AA takes as input next generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence and one or more regions of interest. Please "watch" this repository for improvements in runtime, accuracy and annotations for GRCh38 human reference genome coming up soon.

Get stuck at a random amplicon "#TIME 32219.598 Reconstructing amplicon30" #85

Closed shaka-emperor closed 3 years ago

shaka-emperor commented 3 years ago

Hi sir, I ran <AmpliconArchitect/docker/run_aa_docker.sh --bam HBV.bam --bed FREEC.bed --out AA_out> several times. Unfortunately it always got stuck at 'Reconstructing amplicon' (amplicon 30, 130, etc.) and made no further progress, without reporting any error. Once I waited as long as seven days... it really confused me. By the way, my dataset was generated by Circle-seq, so nearly all the reads came from ecDNA.

virajbdeshpande commented 3 years ago

Hi @shaka-emperor, AA might occasionally take an unrealistically long time if it lands on an amplicon with too many edges. There can be several reasons for an amplicon to have too many edges:

In this situation, if I were to make an educated guess, it is a combination of 3 and 4. In case you are looking for chimeric amplicons containing viral-human DNA, I would suggest providing a BED file containing a single interval which corresponds to the viral contig in the BAM file. (You can use samtools view -H BAMFILE to identify the contig name and length of the virus.) Make sure you have mapped the reads to a combined human-viral reference containing only the strain of the virus which is present in the sample.
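The single-interval BED file for the viral contig can be sketched in Python as follows. This is only an illustration, not part of AA: the function names and the contig name HBV_example are hypothetical, and the header text is assumed to be what samtools view -H prints (tab-separated @SQ lines with SN and LN tags).

```python
def parse_sq_lines(header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:VALUE.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def whole_contig_bed(header_text, contig):
    """Return one BED line (0-based, half-open) spanning the whole contig."""
    lengths = parse_sq_lines(header_text)
    if contig not in lengths:
        raise KeyError(f"contig {contig!r} not found in header")
    return f"{contig}\t0\t{lengths[contig]}"


# Hypothetical header; in practice this text would come from `samtools view -H`.
EXAMPLE_HEADER = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@SQ\tSN:HBV_example\tLN:3215\n"
)

if __name__ == "__main__":
    print(whole_contig_bed(EXAMPLE_HEADER, "HBV_example"))
```

The resulting line can be written to a file and passed via --bed in place of the CNV caller's output. The contig name must match the name in the BAM header exactly.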

In case you are looking for ecDNA not specific to the viral genome, AA is designed to reconstruct the structure of larger ecDNA, which we have observed to be specific to cancer samples. Circle-seq seems to be good at enriching very small ecDNA. In theory AA should be able to reconstruct smaller ecDNA. Since there can be a large number of very small ecDNA, some of these may land in repetitive regions. Further, the CNV calling tool might generate false amplification calls in repetitive regions. AA does not filter out repeat regions on its own and is forced to analyze them if they are included in the BED file. You may filter out these regions using the amplified_intervals.py script available in the git repo (please see the README). In case you are interested in reconstructing smaller ecDNA (perhaps only those with very high copy number), you can customize the size and relative copy number thresholds when running amplified_intervals.py.
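To make the idea of size and copy number thresholds concrete, here is a minimal Python sketch of that kind of filter applied to BED lines. It does not reproduce amplified_intervals.py and does no repeat filtering. The function name, the threshold values, and the assumption that the copy number sits in the last column are all hypothetical, so check your CNV caller's output format before adapting it.

```python
def filter_cnv_intervals(bed_lines, min_size, min_copy_number, cn_column=-1):
    """Keep BED lines that pass both a size and a copy number threshold.

    cn_column is the index of the copy number field (assumed: last column).
    """
    kept = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        copy_number = float(fields[cn_column])
        if end - start >= min_size and copy_number >= min_copy_number:
            kept.append(line)
    return kept


# Hypothetical intervals: chrom, start, end, copy number.
EXAMPLE_BED = [
    "chr1\t1000\t6000\t12.0",   # large enough and highly amplified
    "chr2\t500\t900\t30.0",     # amplified but shorter than min_size
    "chr3\t0\t200000\t2.1",     # long but below the copy number threshold
]

if __name__ == "__main__":
    for kept_line in filter_cnv_intervals(EXAMPLE_BED, min_size=1000, min_copy_number=5.0):
        print(kept_line)
```

Lowering min_size while raising min_copy_number is one way to keep only small, very high copy number candidates, which limits how many intervals AA is asked to reconstruct.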

Finally, make sure you are using the latest version of AA from the docker/github.

shaka-emperor commented 3 years ago

Thank you very much for this detailed guide~~~ I will keep trying AA