virajbdeshpande / AmpliconArchitect

AmpliconArchitect (AA) is a tool to identify one or more connected genomic regions which have simultaneous copy number amplification and elucidates the architecture of the amplicon. In the current version, AA takes as input next generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence and one or more regions of interest. Please "watch" this repository for improvements in runtime, accuracy and annotations for GRCh38 human reference genome coming up soon.
Other
135 stars 43 forks source link

BAM files from Isaac aligner #113

Open AndrewC160 opened 2 years ago

AndrewC160 commented 2 years ago

Hello,

In the "Creating the BAM file" section of the guide, it is mentioned that BAM files from Isaac are allowed. Are there any restrictions on this or settings to change? We have a number of Issac-generated BAM files that I would rather not regenerate with BWA, but at the moment I am only able to run PrepareAA.py with BAM files prepared by BWA via the PrepareAA.py script. When using an Isaac BAM files, I've been getting the error:

Traceback (most recent call last): File "/N/u/aclugston/src/AmpliconArchitect/src/amplified_intervals.py", line 108, in bamFileb2b = b2b.bam_to_breakpoint(bamFile, coverage_stats=cstats) File "/N/u/aclugston/src/AmpliconArchitect/src/bam_to_breakpoint.py", line 103, in init self.median_coverage(window_list=coverage_windows) File "/N/u/aclugston/src/AmpliconArchitect/src/bam_to_breakpoint.py", line 295, in median_coverage

jluebeck commented 2 years ago

Hi, thanks for reaching out. This error sometimes happens when the chromosome naming scheme of the BAM file doesn't match what is expected by AA. For --ref hg19 (default) and hg38, it assumes chromosomes beginning with "chr1, chr2,...", and for --ref GRCh37 it assumes they are as 1, 2, 3,... X, Y, MT.

We should have fixed all incompatibilities with Isaac as of AA version 1.1.

Please let me know though if this doesn't resolve the issue! Jens

AndrewC160 commented 2 years ago

Hello Jens,

Thanks for the quick response! I was checking the BAM files and they appear to follow the same convention, but it occurs to me that the issue may be the reference genome (and maybe a very obvious mistake on my part): Is it required that alignment be to the provided reference, or should files that were aligned to a standard hg38 work? If the former is the case, then I believe that was my issue.

-Andrew

jluebeck commented 2 years ago

Hi Andrew,

It isn't required that it be that specific fasta, however I think if the bed file that goes into AA's --bed argument (or in the amplified_intervals.py script which is where it crashed) contains any of the following, it could create issues

If issues continue and you are able/would like to, please also feel free to send over the header of the BAM file and the bed file you are passing into amplified_intervals.py and I can take a quick look!

Thanks and sorry for the troubles with getting it running, Jens

AndrewC160 commented 2 years ago

Update #2: Yep, the error was a rookie data cleaning mistake. Thanks for helping me trouble shoot!

Update: Now that I'm actually looking more closely I think the answer is pretty obvious: the folks who provided the files were mistaken, BAM is aligned to hg19. I imagine the first time something falls off of the genome there will be all sorts of errors that manifest in a python script...Apologies! I'm going to try running it with the correct reference now and I'll let you know (and I'll send a quick email to our collaborators :-) )

Hello Jens,

Thank you for looking into it! Attached is the BAM header as well as the BED file. I've used the same formatted BED file for the alignment that was performed with BWA by PrepareAA.py, so I think it should be OK, though for good measure I've done it with and without "chr" appended to the chromosomes. I've also tried filtering out all unmapped reads from the BAM as well as anything mapped to something other than a standard chromosome, but sadly to no avail.

bam_header.txt PSS131R_cnv_calls_formatted.txt

jluebeck commented 2 years ago

Hi Andrew, glad that you were able to find the cause! Thank you for looking carefully at this. On my to-do list is to add a check to AA which ensures that the description of the reference in the header matches with what the user selected as --ref since this kind of issue is pretty frequent - I will make it a higher priority now!

Jens