Open lwlive opened 2 years ago
Hi,
(1) Some very complex samples may take a long time to finish (possibly up to 2 weeks), but that is very rare. Typical reasons for long runtimes involve how the seed regions (--bed file) were selected, or the presence of artifactual discordant reads arising from a poorly controlled insert size distribution during library prep.
Some questions to help determine if the seed file is the issue: Which CNV caller, copy number cutoff, and size cutoff were used for your bed file? How many distinct intervals does it contain, and how much of the genome do they cover? Did the bed file undergo filtering using amplified_intervals.py?
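One way to answer the interval-count and genome-coverage questions is a short script like the sketch below. It is not part of AmpliconArchitect; the function name `summarize_bed` and the `genome_size` argument are made up for illustration, and interval length is taken as `end - start`, which assumes standard BED coordinates.

```python
from collections import defaultdict


def summarize_bed(lines, genome_size=None):
    """Count intervals and total covered bases in BED-style lines.

    Overlapping intervals on the same contig are merged before summing,
    so shared bases are only counted once. `genome_size` is an optional,
    user-supplied total used to report a covered fraction.
    """
    by_chrom = defaultdict(list)
    n_intervals = 0
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        by_chrom[chrom].append((start, end))
        n_intervals += 1

    covered = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or touching: extend the current merged block.
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start

    fraction = covered / genome_size if genome_size else None
    return n_intervals, covered, fraction


if __name__ == "__main__":
    # Toy input; in practice pass an open file handle for the seed bed file.
    demo = ["chr1\t100\t200", "chr1\t150\t300", "chr2\t0\t50"]
    print(summarize_bed(demo, genome_size=1000))
```

A seed file with a very large number of intervals, or one covering a large share of the genome, would point toward seed selection as the cause of the long runtime.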
You may consider using the PrepareAA wrapper to standardize the selection of seed regions using our current best practices.
(2) Assuming you are starting with a bam file aligned to a reference that included HPV, you can simply add the viral genome to your seed --bed file. For example, if your viral genome had the name hpv16ref_1, then you would add the following entry to your bed file before running AA:
hpv16ref_1 1 7906
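If you would rather not type the contig length by hand, the entry can be derived from the reference's .fai index, whose first two tab-separated columns are the contig name and its length. The sketch below is only an illustration: `viral_seed_entries` is a hypothetical helper, not an AA function, and it follows the entry above in starting the interval at 1.

```python
def viral_seed_entries(fai_lines, contig_names):
    """Build whole-contig seed entries from .fai index lines.

    Each returned line is "name<TAB>1<TAB>length", mirroring the
    hand-written entry above. Contigs missing from the index are skipped.
    """
    wanted = set(contig_names)
    entries = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        # .fai columns: name, length, offset, linebases, linewidth
        if len(fields) >= 2 and fields[0] in wanted:
            entries.append(f"{fields[0]}\t1\t{int(fields[1])}")
    return entries


if __name__ == "__main__":
    # Toy index lines; in practice read them from the reference's .fai file.
    fai = [
        "chr1\t248956422\t112\t70\t71",
        "hpv16ref_1\t7906\t3151000000\t70\t71",
    ]
    for entry in viral_seed_entries(fai, ["hpv16ref_1"]):
        print(entry)
```

The resulting lines can then be appended to the seed bed file before running AA.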
Please let me know if you have other questions or run into other issues! Jens
Hi jluebeck,
Thanks for your kind and quick response, and sorry for my late reply. Your answers gave me some ideas.
(1) The long running time may be caused by the bed file. I produced a bed file (genome.bed) for the hg38 and HPV genomes, split it into short regions (python3 /cnvkit.py target --split --avg-size 5000 -o genome.splited.bed), and produced a .cnn file for running CNVkit. I then obtained the gained CNV regions with CNVkit (cnvkit.py batch) and filtered them with amplified_intervals.py (--gain 4 --cnsize_min 10000). I referred to the PrepareAA wrapper, but I could not use PrepareAA itself because I cannot produce an AA_DATA_REPO directory with HPV in it.
Should I filter the genome.bed file to remove the centromeres? And what was removed in *_cnvkit_filtered_ref.cnn?
(2) Should I add "hpv16ref_1 1 7906" to GRCh38_cnvkit_filtered_ref.cnn and then run CNVkit and AA?
Your responses really help me! Thanks again!
Hi,
file_list.txt, update the file names for the reference fasta and .fai. Releasing a viral version of the data repo is on my to-do list, but I haven't gotten to it just yet, apologies. I would also recommend using --cngain 4.5, since there are possible false-positive amplification areas between 4 and 4.5. You can of course try both and compare results.
Thanks, Jens
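To see what changes between the two gain cutoffs, one option is to apply both to the same CNV segments and list the regions that only pass the lower one. The sketch below is a simplified stand-in, not the logic of amplified_intervals.py or PrepareAA, which do additional filtering. The `(chrom, start, end, copy_number)` tuple layout, the function names, and the "at least" comparison are all assumptions made for illustration.

```python
def filter_gains(segments, min_cn, min_size=10000):
    """Keep segments with copy number >= min_cn and length >= min_size.

    `segments` is a list of (chrom, start, end, copy_number) tuples,
    a hypothetical layout chosen for this example.
    """
    return [
        (chrom, start, end, cn)
        for chrom, start, end, cn in segments
        if cn >= min_cn and (end - start) >= min_size
    ]


def only_at_lower_cutoff(segments, low=4.0, high=4.5, min_size=10000):
    """Return segments that pass the `low` gain cutoff but not the `high` one."""
    passing_high = set(filter_gains(segments, high, min_size))
    return [
        seg for seg in filter_gains(segments, low, min_size)
        if seg not in passing_high
    ]


if __name__ == "__main__":
    # Toy segments; real values would come from the CNV caller's output.
    segs = [
        ("chr1", 0, 50000, 4.2),
        ("chr1", 100000, 200000, 6.0),
        ("chr2", 0, 5000, 8.0),
    ]
    for seg in only_at_lower_cutoff(segs):
        print(seg)
```

Segments reported here fall in the 4 to 4.5 range and are the ones worth inspecting as possible false positives.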
Hi, Thanks for your kind response! I will follow your suggestions! Best wishes! Wei Liu
Hi,
I have run AmpliconArchitect with the following command: "python2 AmpliconArchitect.py --ref None --downsample 10 --bed WGS000100002tumor_AACNVbed.bed --bam WGS000100002tumor.dup.realign.bam --runmode FULL --out WGS000100002tumor.Amplicon", but I have encountered some problems. Could you please give me a hand? (1) Although I used --downsample to reduce the data, it has been running for one week and has not finished yet. Is there something wrong with my command? (2) I am wondering how to build an AA_DATA_REPO data set, because I have to add HPV. Waiting for your kind response! Thank you!