virajbdeshpande / AmpliconArchitect

AmpliconArchitect (AA) is a tool to identify one or more connected genomic regions that have simultaneous copy number amplification and to elucidate the architecture of the amplicon. In the current version, AA takes as input next-generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence and one or more regions of interest. Please "watch" this repository for upcoming improvements in runtime, accuracy and annotations for the GRCh38 human reference genome.

output files #132

Open jennyp76 opened 1 year ago

jennyp76 commented 1 year ago

Hi, I ran PrepareAA on a tumor.sorted.bam file (WGS) with the command below, and no error message appeared in the log file.

python3 ~~~~~/PrepareAA.py \
        --output_directory ${AnalysisPath}/test \
        --nthreads 12 \
        --sample_name ${SAMPLE} \
        --cnvkit_dir ~~~~/3.8.1/bin/cnvkit.py \
        --bam ${INPUTPath}/${SAMPLE}.RGadded.marked.fixed.bam \
        --run_AA \
        --python3_path ~~~~~/python/3.8.1/bin/python3 \
        --ref mm10 \
        --aa_python_interpreter ~~~~~/python/3.8.1/bin/python3

However, I got empty .out files for all chromosomes, and the cnseg.txt files looked the same (ending in False False) for every interval, as shown below:

chr2:178313966-179419627.out --> #chr2_178313966_179419627_cnseg.txt
chr2 178313966 179419627 2.028693672800463 False False

chr4:119277947-119814993.out
chr7:35441461-35649219.out
chr11:90040217-90112738.out
chr13:90210907-90297149.out
chr14:49483083-49900565.out

Also, since our data come from a brain tumor, our team expected to detect EGFR amplification, but AA reported barely any amplification. Am I running AA properly? I attached the summary file and log file in case they are needed.

What causes the empty files, and is it okay to keep going with these results?

log_file.log summary.txt

jluebeck commented 1 year ago

Hi,

Thanks for reaching out about these results. The .out files will often be empty. They are primarily an intermediate file. It appears you are completely fine to continue with the results. If you would like to share the graph.txt, cycles.txt, *.png files with me I am happy to look them over to ensure everything appears to be in order with the outputs that matter most. You can contact me at jluebeck [at] eng.ucsd.edu.
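As a quick self-check before sharing files, a small script can separate the outputs Jens mentions (graph, cycles, PNG) from the intermediate .out files and count empty ones. This is a minimal sketch, not part of AA; it assumes (unconfirmed in this thread) that AA names its main outputs like `<sample>_amplicon<N>_graph.txt`, `<sample>_amplicon<N>_cycles.txt` and `<sample>_amplicon<N>.png`, and the function name `summarize_aa_outputs` is hypothetical.

```python
from pathlib import Path


def summarize_aa_outputs(output_dir):
    """Count key AA output files in a directory and how many are empty.

    Hypothetical helper. File name patterns are assumptions about AA's
    naming scheme: *_graph.txt, *_cycles.txt and *.png for the main
    outputs, *.out for intermediate per-region files.
    """
    out = Path(output_dir)
    patterns = {
        "graph": "*_graph.txt",
        "cycles": "*_cycles.txt",
        "png": "*.png",
        "intermediate_out": "*.out",
    }
    report = {}
    for label, pattern in patterns.items():
        files = sorted(out.glob(pattern))
        report[label] = {
            "total": len(files),
            # A zero-byte file is counted as "empty".
            "empty": sum(1 for f in files if f.stat().st_size == 0),
        }
    return report
```

Calling `summarize_aa_outputs("path/to/AA_output")` on a run like the one above would be expected to show many empty `intermediate_out` files, which per the reply is normal; empty graph or cycles files would be the thing worth asking about.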

One other check you can do is to take a look at the CNVKit results for the mouse Egfr locus, and see if there is any evidence of strong focal amplification. If this mouse model uses, for instance, a transgene version of human EGFR to induce the brain tumor, then this would not show up properly when using AA. Alternatively, if there is a very high degree of tumor impurity or heterogeneity, then this may also cause focal amplifications to not appear strongly.
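The CNVKit check suggested above can be scripted by reading the segment file and pulling out segments that overlap the Egfr locus. This is a sketch under stated assumptions: it expects a CNVKit `.cns` file (tab-separated, with a header containing at least `chromosome`, `start`, `end` and `log2`), and it converts the log2 copy ratio to an estimated copy number with `2 * 2**log2`, which assumes a diploid genome and a pure tumor sample. The function name is hypothetical, and the Egfr coordinates are left for you to fill in from the mm10 annotation.

```python
import csv


def segments_in_region(cns_path, chrom, start, end):
    """Return (start, end, log2, est_cn) for segments overlapping a region.

    Assumes a CNVKit .cns file with 'chromosome', 'start', 'end', 'log2'
    columns. est_cn = 2 * 2**log2 assumes diploid and 100% tumor purity;
    impurity pulls the observed value toward 2.
    """
    hits = []
    with open(cns_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if row["chromosome"] != chrom:
                continue
            seg_start, seg_end = int(row["start"]), int(row["end"])
            # Overlap test between [seg_start, seg_end) and [start, end).
            if seg_start < end and seg_end > start:
                log2 = float(row["log2"])
                hits.append((seg_start, seg_end, log2, 2 * 2 ** log2))
    return hits
```

A strongly focal amplification would appear as a short segment with a high estimated copy number relative to its neighbors. With impure samples the observed ratio is roughly `(p*C + (1-p)*2) / 2` for purity `p` and true copy number `C`, which is one way impurity can hide an amplification. Chromosome names ("chr11" vs "11") must match whatever the reference used.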

Thanks, Jens

jennyp76 commented 1 year ago

Thank you for your reply, Jens.

It's so good to hear that the files appear to be fine :) Just one thing: could you explain a bit more about why a transgene version of human EGFR would not show up properly when using AA?

Thank you in advance, Jen

jluebeck commented 1 year ago

Hi Jen,

Thank you for the excellent question. When AA is run with the mouse genome mm10 set as the reference, it assumes the tumor contains only mouse DNA. If a human transgene is present, what happens depends on how the sequence alignment step was done: either AA will not know there is a transgene copy of human EGFR during the copy number calling step, or, if the transgene sequence was not present in the reference during alignment, the aligner will try to map those human EGFR reads somewhere else in the mouse genome (possibly to the homologous mouse Egfr gene). AA is not inherently designed to handle tumors that include a transgene from a different species.
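For the second case (human EGFR reads forced onto the mouse Egfr locus), one rough heuristic, not something AA does, is to look at per-read mismatch rates in that region: reads from a human transgene aligned to the mouse homolog should carry noticeably more edits than native mouse reads. The sketch below works on plain SAM text lines (for example, the output of `samtools view` on the region) and uses the `NM` tag. The function names and the 5% threshold are hypothetical choices for illustration.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def mismatch_rate(sam_line):
    """Return NM edit distance per aligned query base for one SAM record.

    Returns None for unmapped reads or records without an NM tag.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 4 or cigar == "*":  # flag bit 4 = read unmapped
        return None
    # Query bases consumed by aligned (M, =, X) and inserted (I) operations.
    aligned = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=XI")
    nm = next((int(f[5:]) for f in fields[11:] if f.startswith("NM:i:")), None)
    if nm is None or aligned == 0:
        return None
    return nm / aligned


def high_mismatch_fraction(sam_lines, chrom, start, end, threshold=0.05):
    """Fraction of mapped reads starting in [start, end] (1-based POS)
    whose mismatch rate exceeds a hypothetical threshold."""
    rates = []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.split("\t")
        if fields[2] != chrom or not (start <= int(fields[3]) <= end):
            continue
        rate = mismatch_rate(line)
        if rate is not None:
            rates.append(rate)
    if not rates:
        return 0.0
    return sum(r > threshold for r in rates) / len(rates)
```

A high fraction at Egfr compared with a control locus would be consistent with cross-species reads piling up there, but it is only suggestive; divergent reads may also end up soft-clipped or unmapped instead.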

Thank you, Jens