Open AndrewC160 opened 2 years ago
Hi, thanks for reaching out. This error sometimes happens when the chromosome naming scheme of the BAM file doesn't match what is expected by AA. For --ref hg19
(default) and hg38
, it assumes chromosomes beginning with "chr1, chr2,...", and for --ref GRCh37
it assumes they are as 1, 2, 3,... X, Y, MT
.
We should have fixed all incompatibilities with Isaac as of AA version 1.1.
Please let me know though if this doesn't resolve the issue! Jens
Hello Jens,
Thanks for the quick response! I was checking the BAM files and they appear to follow the same convention, but it occurs to me that the issue may be the reference genome (and maybe a very obvious mistake on my part): Is it required that alignment be to the provided reference, or should files that were aligned to a standard hg38 work? If the former is the case, then I believe that was my issue.
-Andrew
Hi Andrew,
It isn't required that it be that specific fasta, however I think if the bed file that goes into AA's --bed
argument (or in the amplified_intervals.py
script which is where it crashed) contains any of the following, it could create issues
chrom pos1 pos2 [optional column] CN_estimate
, where the estimated CN is in the last column.If issues continue and you are able/would like to, please also feel free to send over the header of the BAM file and the bed file you are passing into amplified_intervals.py and I can take a quick look!
Thanks and sorry for the troubles with getting it running, Jens
Update #2: Yep, the error was a rookie data cleaning mistake. Thanks for helping me trouble shoot!
Update: Now that I'm actually looking more closely I think the answer is pretty obvious: the folks who provided the files were mistaken, BAM is aligned to hg19. I imagine the first time something falls off of the genome there will be all sorts of errors that manifest in a python script...Apologies! I'm going to try running it with the correct reference now and I'll let you know (and I'll send a quick email to our collaborators :-) )
Hello Jens,
Thank you for looking into it! Attached is the BAM header as well as the BED file. I've used the same formatted BED file for the alignment that was performed with BWA by PrepareAA.py, so I think it should be OK, though for good measure I've done it with and without "chr" appended to the chromosomes. I've also tried filtering out all unmapped reads from the BAM as well as anything mapped to something other than a standard chromosome, but sadly to no avail.
Hi Andrew, glad that you were able to find the cause! Thank you for looking carefully at this. On my to-do list is to add a check to AA which ensures that the description of the reference in the header matches with what the user selected as --ref
since this kind of issue is pretty frequent - I will make it a higher priority now!
Jens
Hello,
In the "Creating the BAM file" section of the guide, it is mentioned that BAM files from Isaac are allowed. Are there any restrictions on this or settings to change? We have a number of Issac-generated BAM files that I would rather not regenerate with BWA, but at the moment I am only able to run PrepareAA.py with BAM files prepared by BWA via the PrepareAA.py script. When using an Isaac BAM files, I've been getting the error:
Traceback (most recent call last): File "/N/u/aclugston/src/AmpliconArchitect/src/amplified_intervals.py", line 108, in
bamFileb2b = b2b.bam_to_breakpoint(bamFile, coverage_stats=cstats)
File "/N/u/aclugston/src/AmpliconArchitect/src/bam_to_breakpoint.py", line 103, in init
self.median_coverage(window_list=coverage_windows)
File "/N/u/aclugston/src/AmpliconArchitect/src/bam_to_breakpoint.py", line 295, in median_coverage