Open feihongloveworld opened 6 years ago
Hello,
Thanks for reporting the issue. The amplified_intervals.py script assumes the bed file is the format generated by the tool ReadDepth where the expected copy number is reported in column number 5. The bed file you are using contains the copy number if column.
As a result, you will need to add a column to your bed file, the value in column 4 is not really used, so it does not matter what value to enter.
So you can transform your bed file to the following: 12 66449999 66452000 1 8 18 19847999 20060000 1 13 18 20059999 20239000 1 11 18 20238999 20243000 1 8 18 20242999 20765000 1 8 20 33345999 33436000 1 6
hi sir: dose the “amplified_intervals.py” script only work on hg19? so chr column of my cnv.bed do not have string of "chr" will be error ?
The script should work without the 'chr' string, but there is a catch.
It filters out blacklisted intervals provided in the data_repo. I have not yet added the --ref option to this script, so it defaults the hg19 reference and will not be able to blacklist regions.
In your case, you are only looking at somatic amplifications which helps filter out the blacklisted regions. The example bed file you provided above does not contain any blacklisted regions, so the absence of the reference annotation will cause a problem.
Hi @virajbdeshpande , is it possible to provide an example bed file in the repo - to demonstrate what is needed for AA to run properly? A short description of how to generate such as BED file will also be super helpful!
Another related question is for the BED file, column 5 - expected copy number: my samples don't have many high copy number regions, the highest copy number region only has 3. Will this be enough for AA tu run? - also the regions/segment are provided from facets call, which can be a long interval.
Thanks
Hello Isaac,
That is a good suggestion. I can try to include an example as a tutorial for running AA. The bed file is essentially just a list of seed regions. Since you only have 3 regions, you may try to directly use these as the input bed file to AA. I would recommend checking that the entire chromosome or arm of the chromosome is not aneuploid.
Alternatively, I have included a script amplified_intervals.py
(see README and try the --help
option) which can perform additional processing but requires the BED file to have the CN listed in the 5th column. Since you are targetting regions of low amplification (CN >= 3), you can run amplified_intervals.py
with the option --gain 3
.
I have tested AA on seed intervals with CN >= 5 and size > 100kbp up to ~10Mbp. AA can still run on CN = 3, however, the accuracy may vary from sample to sample. For long seed intervals, I would recommend downsampling the BAM file to 5x-10x coverage.
hi sir: i got a error . can you help me?
----------------------------
head WGS_A_D_5715_T.somatic.CNV.txt.bed
chr start end cn
12 66449999 66452000 8 18 19847999 20060000 13 18 20059999 20239000 11 18 20238999 20243000 8 18 20242999 20765000 8 20 33345999 33436000 6
----------------------------
python amplified_intervals.py --bed WGS_A_D_5715_T.somatic.CNV.txt.bed Traceback (most recent call last): File "amplified_intervals.py", line 58, in
rdList = hg.interval_list([r for r in rdList0 if float(r.info[1]) > GAIN ])
IndexError: list index out of range