virajbdeshpande / AmpliconArchitect

AmpliconArchitect (AA) is a tool to identify one or more connected genomic regions which have simultaneous copy number amplification and to elucidate the architecture of the amplicon. In the current version, AA takes as input next generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence and one or more regions of interest. Please "watch" this repository for improvements in runtime, accuracy and annotations for the GRCh38 human reference genome, coming soon.

DataRepo #119

Open al3xMlt opened 2 years ago

al3xMlt commented 2 years ago

Hello, thanks for AA! It works class! I was wondering whether I can use my own FASTA file (the one used to generate the BAM)? Could you also give a brief explanation of the data_repo folder structure? And should I modify the dummy_ploidy file if I have an aneuploid sample?

Best

Alex

jluebeck commented 2 years ago

Hi Alex,

You can use your own fasta file to generate the BAM, without any modifications to the AA data_repo, provided it is some version of hg19, GRCh37, GRCh38/hg38, or mm10 (mouse). If it is something else (a genome build not listed, or a different species), the process for generating the data repo becomes much more complicated, as there are many third-party annotation files that must be collected. We have found that those annotations are not always available for certain species/builds. Note that if your fasta uses accession numbers instead of chromosome names (e.g. CM000663.2 instead of chr1), that will cause a problem.
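A quick way to check the naming point above before aligning is to look at the FASTA header lines. The following is only a rough sketch, not part of AA: the function names and the regular expression are illustrative assumptions, and the check is a heuristic (it reports a problem only when no contig is called chr1 or 1 and at least one name looks like a GenBank/RefSeq accession).

```python
import re

# Assumption: accession-style names look like CM000663.2 or NC_000001.11.
# This pattern is a heuristic written for this sketch, not something AA defines.
ACCESSION_RE = re.compile(r"^(?:[A-Z]{1,2}\d{5,8}|[A-Z]{2}_\d+)\.\d+$")


def fasta_contig_names(lines):
    """Yield the contig name (first whitespace-delimited token) of each FASTA header."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def looks_accession_named(names):
    """Return True if no primary chromosome name is present and accession-like names are."""
    names = list(names)
    has_chr1 = "chr1" in names or "1" in names
    return (not has_chr1) and any(ACCESSION_RE.match(n) for n in names)


# In-memory demo; for a real file, pass an open file handle instead of a list.
demo = [
    ">CM000663.2 Homo sapiens chromosome 1",
    "ACGT",
    ">CM000664.2 Homo sapiens chromosome 2",
    "ACGT",
]
print(looks_accession_named(fasta_contig_names(demo)))
```

For a real reference, something like `with open("ref.fa") as fh: looks_accession_named(fasta_contig_names(fh))` would apply the same check, where `ref.fa` is a placeholder path.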

The dummy_ploidy file is only used if you are using PrepareAA with the Canvas CNV caller option to generate seed regions for AA. If you are starting with a BAM file, I recommend PrepareAA with the CNVKit caller option to identify seed regions.

I'll also note, since it comes up occasionally, that if you align to a fasta with viral sequences and want AA to explore them, you should add the viral sequence(s) of interest to the AA seeds file (*_AA_CNV_SEEDS.bed if using PrepareAA) before running AA.
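As a rough illustration of adding viral sequences to the seeds file, the sketch below builds one whole-contig interval per viral sequence from a samtools `.fai` index (whose first two tab-separated columns are contig name and length). Several things here are assumptions rather than documented AA behavior: that a three-column BED line (name, 0, length) is sufficient for a seed, the helper names, and the contig name `hypothetical_virus`. If your seeds file carries extra columns, match its existing layout instead.

```python
def viral_seed_lines(fai_lines, viral_names):
    """Build BED lines spanning each named contig, using lengths from a .fai index."""
    wanted = set(viral_names)
    out = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        # .fai columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
        if len(fields) >= 2 and fields[0] in wanted:
            # BED intervals are 0-based, half-open, so 0..LENGTH covers the contig.
            out.append(f"{fields[0]}\t0\t{int(fields[1])}")
    return out


def append_lines(path, new_lines):
    """Append lines to a text file, adding a newline first if the file lacks one."""
    try:
        with open(path) as fh:
            existing = fh.read()
    except FileNotFoundError:
        existing = ""
    with open(path, "a") as fh:
        if existing and not existing.endswith("\n"):
            fh.write("\n")
        for line in new_lines:
            fh.write(line + "\n")


# In-memory demo with a made-up index; "hypothetical_virus" is a placeholder name.
demo_fai = [
    "chr1\t248956422\t112\t70\t71\n",
    "hypothetical_virus\t7906\t3000\t70\t71\n",
]
for seed in viral_seed_lines(demo_fai, ["hypothetical_virus"]):
    print(seed)
```

With real files, the output of `viral_seed_lines` would be passed to `append_lines` together with the path of the seeds file, before AA is run.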

Thanks and please let me know if any additional issues or questions arise! Jens

panxiaoguang commented 2 years ago

Hi jluebeck, do you mean that if I want to run AA in viral mode, I should first align my fastq files to a modified reference containing both the human and viral genomes, and then add the viral genome interval to *_AA_CNV_SEEDS.bed? So I don't even need to change the genome in data_repo before running AA?

jluebeck commented 2 years ago

Hi Alex, that's correct!