Closed maximilianpress closed 5 years ago
@maximilianpress Thank you for using DFAST.
You are correct. All the necessary databases are automatically downloaded if you install DFAST via conda. Judging from the number of sequences loaded from the reference database (404804), the reference database is properly set up.
Another concern is that the numbers of contigs (27457) and predicted protein-coding genes (41533) seem too large for an assembled bacterial genome. Is it an assembly obtained from metagenomic data? I think DFAST can work with it, but an unexpected error may happen.
I still don't have any idea why this happened, but if you can share the genome data with me (please send it to ytanizaw@nig.ac.jp), I will look into it.
Yasuhiro
Hi Yasuhiro, I've run into the same problem and have been working to narrow it down. Here's what I've got so far:
I've gotten it to occur on a conda install, not on a "vanilla" install with minimal DBs installed
It only occurs when self.execute_blast(targets=proteins_with_transl_except) is triggered (e.g. more than one selenocysteine/pyrrolysine-containing CDS).
The problem appears to be that OUT/pseudogenedetection/reference.faa contains FASTA sequences that are not in self.references or self.candidates, BUT this is because the record is in references/candidates as X.1 instead of as X, which is what BLAST said it was. E.g. ./blast_result.out has
MGA_1068 WP_010895684 99.692 650 2 0 33 682 28 677 0.0 1358
but the actual record name is WP_010895684.1 in self.references.
I suspect this is something that blast fmtdb is doing to be helpful, but it should be an easy-ish fix. Not sure why it only occurs in conda installs though; maybe it has to do with the databases that are being pulled?
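To check whether a run is hitting this same mismatch, the subject IDs in the tabular BLAST output can be compared against the reference keys. This is only a rough sketch: it assumes whitespace-separated tabular output with the subject ID in the second column (as in the line quoted above) and a plain dict or list standing in for self.references. The function name is made up for illustration and is not part of DFAST.

```python
def find_unversioned_hits(blast_lines, reference_ids):
    """Map each subject ID missing from reference_ids to the versioned
    reference IDs (accession + ".N") that share its accession."""
    reference_ids = set(reference_ids)

    # Index the references by accession with any numeric ".N" suffix removed.
    by_accession = {}
    for ref_id in reference_ids:
        accession, sep, version = ref_id.rpartition(".")
        if not (sep and version.isdigit()):
            accession = ref_id  # no version suffix on this ID
        by_accession.setdefault(accession, []).append(ref_id)

    missing = {}
    for line in blast_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        s_id = fields[1]
        if s_id not in reference_ids:
            missing[s_id] = sorted(by_accession.get(s_id, []))
    return missing


hit = "MGA_1068 WP_010895684 99.692 650 2 0 33 682 28 677 0.0 1358"
print(find_unversioned_hits([hit], ["WP_010895684.1"]))
```

An empty list for a subject ID would mean the record is genuinely absent from the references, which would point to a different problem than the version suffix.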
This obviously isn't ideal, but replacing
    protein = self.references[s_id]
in PseudoGeneDetection.execute_blast._parse_result with:
    try:
        protein = self.references[s_id]
    except KeyError:
        for key in self.references.keys():
            if s_id in key:
                protein = self.references[key]
temporarily patches out the problem
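A slightly stricter variant of the same workaround, written as a standalone helper so it can be tried outside DFAST. It is a sketch, not the project's fix: lookup_reference is a hypothetical name, and references is assumed to be an ordinary dict keyed by sequence ID.

```python
def lookup_reference(references, s_id):
    """Return references[s_id]; if that key is missing, fall back to the
    single key of the form s_id + "." + digits (e.g. "X" -> "X.1")."""
    try:
        return references[s_id]
    except KeyError:
        pass

    prefix = s_id + "."
    matches = [
        key for key in references
        if key.startswith(prefix) and key[len(prefix):].isdigit()
    ]
    if len(matches) != 1:
        # Zero or several candidates: fail loudly rather than guess.
        raise KeyError(f"{s_id}: expected one versioned match, found {len(matches)}")
    return references[matches[0]]


refs = {"WP_010895684.1": "record for WP_010895684.1"}
print(lookup_reference(refs, "WP_010895684"))
```

Unlike the substring test (s_id in key), this only accepts keys that are the ID plus a numeric version suffix, so an ID that happens to be contained in a longer, unrelated ID is not picked up, and a missing record raises KeyError instead of leaving protein unset.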
@mnapolitano89 thanks for digging more into this!
@nigyta unfortunately the data are proprietary and I cannot share. But I would be interested to know if @mnapolitano89 's information is enough for a fix as I hope to use DFAST again. The assembly is unfortunately an isolate assembled with a metagenomic assembler (I am very aware of the potential problems here!), so it is unsurprising if it is acting weird. But it looks like the issue is something else.
@mnapolitano89 @maximilianpress Thank you very much for the detailed report.
The error might be due to the version difference of BLAST executables. Although I still cannot reproduce the error in my environment, the following procedure may fix the problem.
First, download BLAST executables (ver2.6) from the DFAST repository.
$ wget https://github.com/nigyta/dfast_core/raw/master/bin/Linux/blastdbcmd
$ wget https://github.com/nigyta/dfast_core/raw/master/bin/Linux/blastp
$ wget https://github.com/nigyta/dfast_core/raw/master/bin/Linux/makeblastdb
Then, after adding execute permission as follows
$ chmod a+x blastdbcmd blastp makeblastdb
move them to /home/USER/miniconda3/opt/dfast-X.X.X/bin/Linux/
$ mv blastdbcmd blastp makeblastdb /home/USER/miniconda3/opt/dfast-X.X.X/bin/Linux/
(The destination directory varies depending on where anaconda/miniconda is installed.) If you use Mac, please replace "Linux" in the above commands with "Darwin".
When running DFAST, it inserts /home/USER/miniconda3/opt/dfast-X.X.X/bin/Linux/ at the head of the PATH environment variable, so ver2.6 executables will be called preferentially.
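To see which executable wins once a directory is placed at the head of PATH, something like the following can be used. It is only an illustration of PATH precedence using Python's shutil.which; the helper name is made up, and it does not reproduce how DFAST itself modifies PATH.

```python
import os
import shutil


def resolve_with_prepended_dir(tool, extra_dir, base_path=None):
    """Return the path `tool` resolves to when extra_dir is searched first,
    or None if the tool is not found anywhere on the search path."""
    if base_path is None:
        base_path = os.environ.get("PATH", "")
    # Directories are searched left to right, so the prepended one wins.
    search_path = os.pathsep.join([extra_dir, base_path]) if base_path else extra_dir
    return shutil.which(tool, path=search_path)


# Example (substitute the real DFAST install directory for the placeholder):
# print(resolve_with_prepended_dir("blastp", "/home/USER/miniconda3/opt/dfast-X.X.X/bin/Linux/"))
```

If the returned path is not inside the DFAST bin directory, the copied executables are either missing or lack execute permission.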
I hope this helps.
Yasuhiro
Moving from v2.5 to v2.6 seems to solve it - is it possible to update the conda recipe to require/install v2.6?
Yes, but the build test fails for some reason, and I'm still working on it. Unfortunately, I'll be out next week, so it may take time until I fix this issue.
For now, a good option is to update blast by
conda update -c bioconda blast
@mnapolitano89 @maximilianpress
Sorry for the late response. I have updated the bioconda recipe. Now it requires NCBI-Blast v2.6 or later, so I think this issue will be fixed by conda update -c bioconda dfast or conda install -c bioconda dfast.
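For anyone wanting to confirm that the installed BLAST meets the 2.6 requirement, a small check along these lines may help. It assumes the version text contains a dotted number such as "blastp: 2.6.0+" (what blastp -version prints for BLAST+); the function names are illustrative only.

```python
import re


def parse_blast_version(version_output):
    """Extract (major, minor, patch) from text such as 'blastp: 2.6.0+'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_output, minimum=(2, 6, 0)):
    """True if the reported version is at least `minimum` (tuple comparison)."""
    return parse_blast_version(version_output) >= minimum


# To check the blastp currently on PATH:
# import subprocess
# out = subprocess.run(["blastp", "-version"], capture_output=True, text=True).stdout
# print(meets_minimum(out))
```

Comparing integer tuples rather than strings avoids treating 2.10 as older than 2.6.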
thanks!
I am running on Amazon Linux (CentOS) and encountered an error with a specific assembly. I checked other assemblies and they complete ok without this issue on the same instance.
It looks like something is missing from one of the reference databases but it's hard for me to figure out what the issue is. I have installed with conda as laid out in the readme. My understanding was that this includes all necessary databases. However, I also attempted to install the databases manually as outlined in the readme and the problem persisted.
Can you see if anything is wrong here in the log below? Thanks.