Run error - Githubissues

trshyd commented 5 years ago

I have aligned a no-guide sample and a Cas9 targeted sample to hg38 using BBMap. Both of which went through MRE11 pull-down. I have sorted and indexed the bam files with samtools.

When running blender.pl I get the following errors:

[E::fai_build3_core] Failed to open the file chr1:883406-883407 [faidx] Could not build fai index chr1:883406-883407.fai [E::fai_build3_core] Failed to open the file chr1:883416-883417 [faidx] Could not build fai index chr1:883416-883417.fai [E::fai_build3_core] Failed to open the file chr1:883661-883662

And they just keep growing and growing; essentially flagging every single entry in the bam file.

Is this an artifact of using an aligner other than bowtie2? Any recommendations?

My chromosome names do follow the nomenclature of ">chr1" ">chr2" etc (where some references just default to ">1" ">2" etc with no "chr" text.

Hopefully a small format change can solve this, and not a complete re-align.

I have not yet let blender.pl complete its run fully. Although there is a fresh attempt running right now. I am saving the standard error to a separate file, which I'm happy to upload when the entire script-run is done; if that's useful.

I cannot yet comment on whether any positive results show up in the output text file, as I have not yet let it complete even chr1, let alone the entire bam file.

trshyd commented 5 years ago

An extra bit of information, I see from the blender.pl script that it learns the genome location from the @PG line of the bam header. Here is my @PG line from a BBMap produced BAM file:

@PG ID:BBMap PN:BBMap VN:38.26 CL:java -ea -Xmx300g align2.BBMap build=1 overwrite=true fastareadlen=500 -Xmx300g 32bit=t sortscafs=f covbinsize=50000 statsfile=HiFi_S3_R2.stats scafstats=HiFi_S3_R2.scaf covstats=HiFi_S3_R2.cov bincov=HiFi_S3_R2.bincov outu=HiFi_S3_R2.unmap.bam outm=HiFi_S3_R2.bam in=HiFi_S3_R1_001_val_1.fq.gz in2=HiFi_S3_R2_001_val_2.fq.gz

This must be the issue, since BBMap genome indexing is a bit different from bwa or bowtie2. It renames the fasta file and organizes the fasta and indexes into it's own subfolders. As opposed to bwa/bowtie which just fill up the working director with index files, while keeping the original fasta intact.

UPDATE:

Same issue is observed when the same input read files are re-aligned with bowtie2.

Here is the @pg line from the bowtie generated bam file:

@PG ID:bowtie2 PN:bowtie2 VN:2.3.5.1 CL:"/usr/local/bin/bowtie2-align-s --wrapper basic-0 -p 96 -X 2000 -x /work1/genome/hg38 -t -S HiFi_S3_bow.sam --met-file HiFi_S3_bow.metric -1 HiFi_S3_R1_001_val_1.fq.gz -2 HiFi_S3_R2_001_val_2.fq.gz"

The errors regarding faidx are exactly the same.

It is somehow not finding the hg38.fa file ? ?

trshyd commented 5 years ago

Sorry to keep replying to my own issue thread; not sure if that is a faux pas on GitHub or not?

Anyway, I've temporarily solved the issue for myself. The perl script is not grabbing my genome location from the @pg line of the bam header. So to just get things running with no errors, I editing the following lines:

$PG =samtools view -H $edited_bam | grep PG; if ($PG =~ /sampe\s+(.+?)\s+/) { $genome = $1; }

I did a symbolic link of my hg38 fasta file (and it's faidx) to the current working directory, and changed the above lines to this:

$genome = "./hg38.fa";

The analysis is quite slow, taking 24hrs+ just to get through chromosome1. But it is currently running with no error message; properly doing whatever 'samtools faidx' operation it needs to perform.

It's not an 'issue' so I won't create a new one, but it would be nice to have multithreading capabilities.

I suppose a workaround would be to chop the bam up into smaller pieces and process one-at-a-time?

staciawyman commented 5 years ago

Dear trshyd, First, apologies for the delayed response. I can't figure out why GitHub is not notifying me of posts here when I have it set to do so. Second, thank you for bringing this my attention and for debugging the problem! I use BWA for my alignment and had assumed that the genome was encoded the same way for other aligners. I will change it to accept the reference genome at the command line so this is not an error for people in the future.

Did you encounter any other problems? Best, Stacia

staciawyman commented 5 years ago

I have modified the code so that the reference genome is now given as a command line argument to BLENDER. Thanks for your help on this.

staciawyman / blender

Run error #2