Closed msnyder424 closed 2 years ago
Additional info: I originally noticed this issue when running in a Docker I pulled from quay:
docker pull quay.io/biocontainers/falco:0.3.0--h5aa19ff_1
I get the same error when I run falco installed with conda:
conda install -c bioconda falco
conda update falco # due to out of date version in conda. update installs 0.3.0
For the life of me I cannot get falco to read bam files when I build from the source code downloaded from the releases or from a git clone of the repo. I get this error every time:
Cannot recognize file format for file /test_bam/DS-376302.hg19.bam
htslib is installed and I ran the below after installing:
make HAVE_HTSLIB=1 all
make HAVE_HTSLIB=1 install
It happens with multiple different bam files I use as input.
Hello,
That's weird indeed. The "cannot recognize file format" error should only occur upon compilation without HTSLib. The conda recipe has HTSLib as a dependency, and the compile instructions should be followed with HTSLib enabled. If there is a problem with a path to HTSLib, the compilation should fail, and if it doesn't, I need to look into why that's happening.
Would you be able to provide a small BAM file in which I could try to reproduce the character encoding bug? At least from my source compile it seems to be working with BAM files but I can imagine there may be some problem with tabs being added to the QUAL string. Thank you!
Oh, one additional thing that may explain the problem in your first comment. The command
falco ${BAM_FILE} -o falco_fastqc
should be
falco -o falco_fastqc ${BAM_FILE}
The input file should be the last argument, after the output directory option; otherwise falco may interpret "-o" and "falco_fastqc" as other input files to process. Not sure if this changes the outcome. In fact, I'm surprised the command works as expected.
Thanks for the help!
I agree that the command you suggested actually follows the usage in the falco help menu. Mine was a holdover from an old WDL task that ran fastqc. But alas, changing the command did not do the trick.
I should not have muddied the waters talking about the "Cannot recognize file format" error. That only happens when I try to use falco installed from a repo clone or source code zip. I believe the Docker image uses an instance installed with conda. When I execute with that Docker or an instance installed with conda, the program recognizes the file format. We can keep the discussion to only the issue in the first comment.
Happy to send a small BAM for you to investigate. Can't seem to attach a zip to this comment. Where should I send it?
You can send it to desenabr[at]usc[dot]edu and I can look into it.
Thanks for sharing the file! I think I see the problem, and indeed it reflects a bigger issue with falco. The \t was because falco was reading the optional SAM/BAM tags after the quality line as part of the quality scores.
I did a bunch of rewriting on the SAM/BAM processing functions to address these issues. Would you be able to pull from the falco repo and re-test if it runs to completion on your BAM files? Thank you so much!!
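For anyone following along, here is a small illustration of the field layout involved. In SAM text, QUAL is the 11th tab-separated column and any optional tags follow from column 12 onward. This is only a sketch with a made-up record (the read name, sequence and tags are hypothetical, not taken from the reported file), and it is not falco's actual parsing code:

```shell
# Hypothetical SAM record: 11 mandatory columns followed by two optional tags.
record=$(printf 'read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tRG:Z:grp1')

# Column 11 only: the quality string without the trailing tags.
qual=$(printf '%s\n' "$record" | cut -f 11)

# Columns 11 to end: what a reader sees if it does not stop at the next TAB.
qual_and_tags=$(printf '%s\n' "$record" | cut -f 11-)

printf 'QUAL: %s\n' "$qual"
printf 'QUAL plus tags: %s\n' "$qual_and_tags"
```

On a real file, something like `samtools view DS-376333.hg19.bam | head -n 1 | cut -f 11` should print the quality string of the first record on its own.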
Thanks for the fix!
Here is what I ran to install from a clone:
git clone https://github.com/smithlabcode/falco.git
cd falco
sudo make all
sudo make install
sudo make HAVE_HTSLIB=1 all
sudo make HAVE_HTSLIB=1 install
When I clone the repo there is no configure. So I cannot run this part of the instructions:
./configure CXXFLAGS="-O3 -Wall" --enable-hts
I have htslib installed but I get this error every time I run falco with a bam file:
"Cannot recognize file format for file /home/dnanexus/DS-376333.hg19.bam"
I tried to create configure by running aclocal, autoconf, and automake --add-missing, but the last command throws this error:
"configure.ac:21: error: required file 'config.h.in' not found"
Not sure if I'm doing the right things here...
OK. After fighting with my htslib installation, I finally got it to work. Looks like the update worked! THANK YOU!
In the end I got it to work with:
git clone https://github.com/smithlabcode/falco.git
cd falco
sudo make HAVE_HTSLIB=1 all
sudo make HAVE_HTSLIB=1 install
But then I got this error from falco: "bin/falco: error while loading shared libraries: libhts.so.3: cannot open shared object file: No such file or directory". This missing dependency was in /usr/local/lib/libhts.so.3.
As a hack, I just linked all the libs in that dir to /usr/lib/.
I don't think that was the best way around that, but I'm not sure what else to do. Is it possible falco can and should be configured to look for that dependency in multiple locations?
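A less invasive workaround than linking the libraries into /usr/lib may be to tell the dynamic loader where to look. A sketch, assuming a Linux system and that libhts.so.3 sits in /usr/local/lib as described above (the variable name HTS_LIB_DIR is just for illustration):

```shell
# Assumption: htslib's shared libraries were installed to /usr/local/lib.
HTS_LIB_DIR=/usr/local/lib

# Prepend the directory to the loader search path for this shell session only.
export LD_LIBRARY_PATH="${HTS_LIB_DIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

printf 'LD_LIBRARY_PATH=%s\n' "$LD_LIBRARY_PATH"
```

This only affects the current shell. For a system-wide setting, running `sudo ldconfig` after installing htslib refreshes the loader cache, provided /usr/local/lib is listed in the loader configuration, which is the case on many distributions but not all.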
I installed htslib like so:
HTS_LIB_VERSION=1.15.1
wget https://github.com/samtools/htslib/releases/download/${HTS_LIB_VERSION}/htslib-${HTS_LIB_VERSION}.tar.bz2
tar xf htslib-${HTS_LIB_VERSION}.tar.bz2
cd htslib-${HTS_LIB_VERSION}
./configure
sudo make
sudo make install
And I figured out the fix to the htslib install.
./configure --prefix=/usr/
Case closed. Good to release I think!
Thanks for all your work! Any idea when we can expect a release?
Glad to know it's working!
Just for the record, if you clone from repo then these two commands should suffice
make HAVE_HTSLIB=1 all
make HAVE_HTSLIB=1 install
You don't need autotools to compile. I find it strange that the program compiles successfully but doesn't find HTSLib. If the -DUSE_HTS flag is there on compilation, it should either fail to compile if htslib is not on your $LIBRARY_PATH or it should compile successfully and identify BAM files. The behavior you describe is definitely puzzling but we can discuss this in another issue.
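One quick way to check whether a given falco binary was built against HTSLib is to look at its dynamic dependencies. This is a sketch, assuming a Linux system with `ldd` available and a dynamically linked htslib; the helper name `links_htslib` is made up for this example:

```shell
# Return success if the given binary is dynamically linked against libhts.
links_htslib() {
    ldd "$1" 2>/dev/null | grep -q 'libhts'
}

# Assumption: falco is on PATH; use the path to a local build (e.g. bin/falco) otherwise.
falco_bin=$(command -v falco || true)
if [ -n "$falco_bin" ] && links_htslib "$falco_bin"; then
    echo "falco is linked against htslib"
else
    echo "no dynamic htslib dependency found (or falco is not on PATH)"
fi
```

If htslib was linked statically, it will not show up in the `ldd` output, so a negative result here is only a hint and not proof that the build lacks BAM support.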
Is it ok to close this?
I’ll close it! Any timeline on a new release?
I get the error, "No known encoding with chars < 33. Yours was 9)" when I try to process a bam file with falco. Here is the call and stdout:
This is similar to issue #24 but not the same.
The 9 must be referring to the ASCII quality scores. 9 is a TAB (\t).
samtools view DS-376333.hg19.bam | grep -P "\t" | wc -l
shows me that every line in ${BAM_FILE} has a \t in it, which makes sense because BAMs are "tab delimited". So I'm not sure how to even find the offending \t that I imagine must be at the beginning, end, or middle of the quality scores. However, all my BAMs were created with the GATK best practices pipeline, so I don't see how they could be poorly formatted. Additionally, fastqc is able to process them without error, albeit very slowly.
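A more targeted check than grepping for any TAB is to look at the quality column alone. In SAM text, QUAL is the 11th tab-separated field, so the sketch below counts records whose QUAL contains a byte outside the printable range 33 to 126. The function name `count_bad_qual` and the two sample records are made up for illustration:

```shell
# Count SAM records (read from stdin) whose QUAL column contains a byte
# outside the printable range '!' (33) to '~' (126). Header lines are skipped.
count_bad_qual() {
    LC_ALL=C awk -F '\t' '$1 !~ /^@/ && $11 ~ /[^!-~]/ { n++ } END { print n + 0 }'
}

# Made-up records: the second one has a space (ASCII 32) inside QUAL.
tmp_sam=$(mktemp)
printf 'r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\n' > "$tmp_sam"
printf 'r2\t0\tchr1\t5\t60\t4M\t*\t0\t0\tACGT\tII I\tNM:i:0\n' >> "$tmp_sam"

count_bad_qual < "$tmp_sam"   # prints 1
```

On the real file this would be `samtools view DS-376333.hg19.bam | count_bad_qual`. A count of 0 would suggest the BAM itself is well formed and that the TAB is coming from how the reader delimits the quality field, not from the data.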
Thanks for any help!