Gets stuck at a random amplicon: "#TIME 32219.598 Reconstructing amplicon30"

shaka-emperor commented 3 years ago

Hi, I ran `AmpliconArchitect/docker/run_aa_docker.sh --bam HBV.bam --bed FREEC.bed --out AA_out` several times. Unfortunately, it always gets stuck at "Reconstructing amplicon" (for example, amplicon 30 or 130) and makes no further progress, without reporting any error. Once I waited as long as seven days, which left me quite confused. Also, my dataset was generated by Circle-seq, so nearly all of the reads come from ecDNA.

virajbdeshpande commented 3 years ago

Hi @shaka-emperor, AA may occasionally take an unrealistically long time if it lands on an amplicon with too many edges in its breakpoint graph. There are several reasons an amplicon can end up with too many edges:

1. The amplicon genuinely has a lot of rearrangements.
2. The library prep introduces false palindromic rearrangements.
3. The amplicon contains an interval in a repetitive region, which then connects to other repeats in the genome, and AA is unable to filter out the edges between these repeats.
4. The amplicon has a very high depth of coverage compared with the rest of the genome.

In this situation, my educated guess is that it is a combination of 3 and 4. If you are looking for chimeric amplicons containing viral-human DNA, I would suggest providing a BED file containing a single interval that corresponds to the viral contig in the BAM file. (You can run `samtools view -H BAMFILE` to identify the name and length of the viral contig.) Make sure you have mapped the reads to a combined human-viral reference that contains only the strain of the virus present in the sample.
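A minimal sketch of building that single-interval BED file from the BAM header, assuming the viral contig is named `HBV` (a hypothetical name; use whatever `samtools view -H` actually reports). It reads the `@SQ` header lines, where `SN` is the contig name and `LN` its length, and emits one BED interval spanning the whole contig. BED coordinates are 0-based and half-open, so the interval runs from `0` to `LN`. The helper names `read_bam_header`, `sq_records`, and `viral_bed_line` are illustrative only and are not part of AA.

```python
import subprocess


def read_bam_header(bam_path):
    """Return the SAM-format header text of a BAM file via `samtools view -H`."""
    result = subprocess.run(
        ["samtools", "view", "-H", bam_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def sq_records(header_text):
    """Yield (contig_name, length) for every @SQ line in a SAM/BAM header."""
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # After the record type, header fields are TAB-separated TAG:VALUE pairs.
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        yield tags["SN"], int(tags["LN"])


def viral_bed_line(header_text, viral_contig):
    """Return one BED line covering the whole viral contig (0-based, half-open)."""
    for name, length in sq_records(header_text):
        if name == viral_contig:
            return f"{name}\t0\t{length}"
    raise ValueError(f"contig {viral_contig!r} not found in header")
```

For example, `viral_bed_line(read_bam_header("HBV.bam"), "HBV")` returns a single tab-separated line that can be written to the BED file passed to `--bed`. If the call raises `ValueError`, the contig name does not match the header, which is also a quick check that the reads were mapped to the combined human-viral reference.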

If you are looking for ecDNA that is not specific to the viral genome, note that AA is designed to reconstruct the structure of larger ecDNA, which we have observed to be specific to cancer samples. Circle-seq seems to be good at enriching very small ecDNA. In theory, AA should also be able to reconstruct smaller ecDNA. However, since there can be a large number of very small ecDNA, some of them may land in repetitive regions. Furthermore, the CNV calling tool might generate false amplification calls in repetitive regions. AA has no option to filter out repeat regions and is forced to analyze them if they are included in the BED file. You can filter out such regions using the amplified_intervals.py script available in the git repo (please see the README). If you are interested in reconstructing smaller ecDNA (perhaps only those with very high copy number), you can customize the size and relative copy number thresholds when running amplified_intervals.py.
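To make the idea of size and copy-number thresholds concrete, here is a minimal sketch of that kind of seed-interval filtering. It is not the logic of amplified_intervals.py and does not mirror its options: `Interval`, `filter_seed_intervals`, `min_size`, `min_cn`, and `max_repeat_frac` are hypothetical names, and the repeat intervals are assumed to come from a repeat annotation you supply as non-overlapping `(chrom, start, end)` tuples. For actual runs, use amplified_intervals.py as described in the README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based start (BED convention)
    end: int    # exclusive end
    cn: float   # copy number estimated by the CNV caller

    @property
    def size(self):
        return self.end - self.start


def overlap_bp(iv, rep):
    """Number of bases shared by an Interval and a (chrom, start, end) repeat."""
    chrom, start, end = rep
    if iv.chrom != chrom:
        return 0
    return max(0, min(iv.end, end) - max(iv.start, start))


def filter_seed_intervals(intervals, repeats, min_size, min_cn, max_repeat_frac=0.5):
    """Keep intervals that are large enough, amplified enough, and mostly non-repetitive.

    Hypothetical helper. Assumes `repeats` do not overlap one another, so the
    summed overlaps do not double-count bases.
    """
    kept = []
    for iv in intervals:
        # Size and copy-number thresholds drop small or weakly amplified calls.
        if iv.size <= 0 or iv.size < min_size or iv.cn < min_cn:
            continue
        # Drop calls that are mostly repeat sequence, since their edges to
        # other repeats can inflate the breakpoint graph.
        repeat_frac = sum(overlap_bp(iv, r) for r in repeats) / iv.size
        if repeat_frac > max_repeat_frac:
            continue
        kept.append(iv)
    return kept
```

Lowering `min_size` while raising `min_cn` corresponds to the suggestion of reconstructing only small ecDNA with very high copy number; the exact threshold values are a judgment call for your data.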

Finally, make sure you are using the latest version of AA from Docker or GitHub.

shaka-emperor commented 3 years ago

Thank you very much for this detailed guide! I will keep trying AA.

virajbdeshpande/AmpliconArchitect, issue #85