virajbdeshpande / AmpliconArchitect

AmpliconArchitect (AA) is a tool to identify one or more connected genomic regions which have simultaneous copy number amplification and elucidates the architecture of the amplicon. In the current version, AA takes as input next generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence and one or more regions of interest. Please "watch" this repository for improvements in runtime, accuracy and annotations for GRCh38 human reference genome coming up soon.

The bed file for AA is 0 size after running PrepareAA #88

Open tanzhengtang opened 3 years ago

tanzhengtang commented 3 years ago

Hello, I have a question about preparing the BED file for AA.

my command: python /home/tang/tools/PrepareAA/PrepareAA.py -s A549 -t 4 --cnvkit_dir /home/tang/tools/cnvkit/cnvkit.py --sorted_bam /home/tang/A549/align_data/low_depth_A549_hg19_align_sort_rmdup.bam --ref hg19 -o ./ --python3_path /home/tang/miniconda3/envs/ecDNA/bin/python

The log of the run: process.txt

The contents of low_depth_A549_hg19_align_sort_rmdup_CNV_GAIN.bed:

chr5 17515656 17600656 CNVkit 11.8918395674
chr8 86556450 86841451 CNVkit 10.8396028955
chr15 21885000 21940357 CNVkit 6.31892875621
chr15 22297017 22591206 CNVkit 6.0158141596
chr17 21301608 21361608 CNVkit 5.59799423044
chr17 21506608 21686654 CNVkit 5.226082864
chr19 48403231 48463235 CNVkit 6.90307902477
chr19 50593385 50643388 CNVkit 7.40679804001

The run does produce A549_AA_CNV_SEEDS.bed, but the file is empty. Is this expected? If not, what should I do to resolve it?

Thanks very much!

jluebeck commented 3 years ago

Hi,

The regions identified by CNVkit are filtered by the AA script 'amplified_intervals.py' to remove regions that are too small to be candidate AA amplicons, too low in CN, or, most importantly, those with large amounts of repetitive sequence content. These filtering steps are important for removing potential false-positive regions, as even a normal genome will often show areas of sharp CN increase when aligned back to the reference, for reasons related to the reference genome and the mapping of the reads. An empty seeds file therefore means that none of the CNVkit regions passed these filters.

Best, Jens
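To make the three filters described above concrete, here is a minimal Python sketch of that kind of seed filtering. It is not the code of 'amplified_intervals.py': the function names, the threshold values (`min_cn`, `min_size`, `max_excluded_fraction`), and the excluded region in the usage example are all assumptions chosen for illustration, and the real script relies on its own annotation data and settings.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int
    cn: float


def parse_bed(text: str) -> List[Interval]:
    """Parse 5-column records: chrom, start, end, source, copy number."""
    intervals = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        chrom, start, end, _source, cn = fields[:5]
        intervals.append(Interval(chrom, int(start), int(end), float(cn)))
    return intervals


def excluded_fraction(iv: Interval,
                      excluded: Sequence[Tuple[str, int, int]]) -> float:
    """Fraction of iv covered by excluded regions.

    Assumes the excluded regions do not overlap each other.
    """
    length = iv.end - iv.start
    if length <= 0:
        return 1.0
    covered = 0
    for chrom, start, end in excluded:
        if chrom == iv.chrom:
            covered += max(0, min(iv.end, end) - max(iv.start, start))
    return covered / length


def filter_seeds(intervals: Sequence[Interval],
                 excluded: Sequence[Tuple[str, int, int]],
                 min_cn: float = 4.5,            # placeholder threshold
                 min_size: int = 50000,          # placeholder threshold
                 max_excluded_fraction: float = 0.5  # placeholder threshold
                 ) -> List[Interval]:
    """Keep intervals that are large enough, amplified enough, and not
    dominated by excluded (e.g. repetitive) sequence."""
    kept = []
    for iv in intervals:
        if iv.end - iv.start < min_size:
            continue  # too small to be a candidate amplicon
        if iv.cn < min_cn:
            continue  # copy number too low
        if excluded_fraction(iv, excluded) > max_excluded_fraction:
            continue  # mostly excluded sequence
        kept.append(iv)
    return kept


if __name__ == "__main__":
    bed = ("chr5 17515656 17600656 CNVkit 11.8918395674\n"
           "chr19 48403231 48463235 CNVkit 6.90307902477\n")
    # Hypothetical excluded region, NOT a real repeat annotation.
    made_up_excluded = [("chr5", 17500000, 17600000)]
    for iv in filter_seeds(parse_bed(bed), made_up_excluded):
        print(iv.chrom, iv.start, iv.end, iv.cn)
```

A region can clear the size and CN thresholds and still be dropped by the last check, which is one way a non-empty CNV_GAIN file can yield an empty seeds file.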

tanzhengtang commented 3 years ago

OK, I get it. Thanks again!