tseemann / mlst

Scan contig files against PubMLST typing schemes
GNU General Public License v2.0

mlst Docker container #33

Closed miguelpmachado closed 7 years ago

miguelpmachado commented 7 years ago

Hi,

I'm trying to build a Docker container with mlst, but it does not seem to be working properly. I'm using the BLAST+ v2.6.0 binaries and the mlst GitHub release v2.9 (wget https://github.com/tseemann/mlst/archive/2.9.tar.gz). With a .fna file it works (although sh: 1: file: not found appears):

mlst GCF_000007265.1_ASM726v1_genomic.fna

[12:54:23] Found 'blastn' => /NGStools/ncbi-blast-2.6.0+/bin/blastn
[12:54:23] Excluding 2 schemes: ecoli_2 abaumannii
sh: 1: file: not found
[12:54:23] Scanning: GCF_000007265.1_ASM726v1_genomic.fna [GCF_000007265.1_ASM726v1_genomic.fna]
[12:54:24] Found exact allele match sagalactiae.sdhA-2
[12:54:24] Found exact allele match sagalactiae.pheS-1
[12:54:24] Found exact allele match sagalactiae.atr-3
[12:54:24] Found exact allele match sagalactiae.glnA-2
[12:54:24] Found exact allele match sagalactiae.adhP-1
[12:54:24] Found exact allele match sagalactiae.tkt-9
[12:54:24] Found exact allele match sagalactiae.glcK-2
GCF_000007265.1_ASM726v1_genomic.fna    sagalactiae     110     adhP(1) pheS(1) atr(3)  glnA(2) sdhA(2) glcK(2) tkt(9)

If I compress the .fna file, it doesn't work:

mlst GCF_000007265.1_ASM726v1_genomic.fna.gz

[13:01:19] Found 'blastn' => /NGStools/ncbi-blast-2.6.0+/bin/blastn
[13:01:19] Excluding 2 schemes: abaumannii ecoli_2
sh: 1: file: not found
[13:01:19] Scanning: GCF_000007265.1_ASM726v1_genomic.fna.gz [GCF_000007265.1_ASM726v1_genomic.fna.gz]
Error: NCBI C++ Exception:
    T0 "/home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_350334_130.14.22.10_9008__PrepareRelease_Linux64-Centos_1481139955/c++/compilers/unix/../../src/objtools/readers/fasta.cpp", line 2428: Error: CFastaReader: Near line 1, there's a line that doesn't look like plausible data, but it's not marked as defline or comment. (m_Pos = 1)

GCF_000007265.1_ASM726v1_genomic.fna.gz -       -

I also tried a GenBank file (.gbff) and it worked (again with sh: 1: file: not found):

mlst GCF_000007265.1_ASM726v1_genomic.gbff

[13:06:47] Found 'blastn' => /NGStools/ncbi-blast-2.6.0+/bin/blastn
[13:06:47] Excluding 2 schemes: abaumannii ecoli_2
sh: 1: file: not found
[13:06:47] Converting to FASTA: GCF_000007265.1_ASM726v1_genomic.gbff
[13:06:48] Scanning: GCF_000007265.1_ASM726v1_genomic.gbff [/tmp/wwyiujD0sR]
[13:06:48] Found exact allele match sagalactiae.sdhA-2
[13:06:48] Found exact allele match sagalactiae.pheS-1
[13:06:48] Found exact allele match sagalactiae.atr-3
[13:06:48] Found exact allele match sagalactiae.glnA-2
[13:06:48] Found exact allele match sagalactiae.adhP-1
[13:06:48] Found exact allele match sagalactiae.tkt-9
[13:06:48] Found exact allele match sagalactiae.glcK-2
GCF_000007265.1_ASM726v1_genomic.gbff   sagalactiae     110     adhP(1) pheS(1) atr(3)  glnA(2) sdhA(2) glcK(2) tkt(9)
[13:06:48] Deleting temporary files: /tmp/wwyiujD0sR

But if I compress it, it stops again:

mlst GCF_000007265.1_ASM726v1_genomic.gbff.gz

[13:07:07] Found 'blastn' => /NGStools/ncbi-blast-2.6.0+/bin/blastn
[13:07:07] Excluding 2 schemes: abaumannii ecoli_2
sh: 1: file: not found
[13:07:07] Scanning: GCF_000007265.1_ASM726v1_genomic.gbff.gz [GCF_000007265.1_ASM726v1_genomic.gbff.gz]
Error: NCBI C++ Exception:
    T0 "/home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_350334_130.14.22.10_9008__PrepareRelease_Linux64-Centos_1481139955/c++/compilers/unix/../../src/objtools/readers/fasta.cpp", line 2428: Error: CFastaReader: Near line 1, there's a line that doesn't look like plausible data, but it's not marked as defline or comment. (m_Pos = 1)

GCF_000007265.1_ASM726v1_genomic.gbff.gz        -       -

I assume that the mlst script is somehow not decompressing the .fna.gz file. Can you help me get it working for compressed files and get rid of sh: 1: file: not found?

Thank you for your help.

Miguel

Slugger70 commented 7 years ago

Hi Miguel,

There is a docker container with mlst built into it currently in Biocontainers. It is version 2.9.

If you want it, run: docker pull quay.io/biocontainers/mlst:2.9--pl5.22.0_0

This Biocontainer was produced automatically via the BioConda recipe process and works fine.

Hope this helps.

miguelpmachado commented 7 years ago

Hi @Slugger70, Thanks for the tip. I think mlst is also part of the sangerpathogens/mlst_check container. But since I'm trying to include mlst in a pipeline for our wet lab users (black box type), it would be very nice if I managed to solve this problem. Thanks once more. Cheers, Miguel

tseemann commented 7 years ago

@miguelpmachado I thought rematch could do MLST? :)

To check if the file is gzipped, it runs the file --brief --dereference command.
To decompress, it needs the gzip -f -d -c command.

I have added checks for these two tools in 2.10-dev which is git master now.
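As an illustration only, a dependency check of this kind can be sketched in a few lines. This is a Python sketch, not the Perl code in mlst itself; the function name missing_tools is hypothetical, and the tool list simply mirrors the two commands named above.

```python
import shutil


def missing_tools(tools=("file", "gzip")):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper: the default list mirrors the two external
    commands mentioned in this thread.
    """
    # shutil.which returns None when no executable of that name is on PATH
    return [name for name in tools if shutil.which(name) is None]
```

A caller could run missing_tools() at start-up and stop with a clear message naming the missing programs, instead of letting the shell print sh: 1: file: not found part-way through a run.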

% file --version
file-5.11
magic file from /etc/magic:/usr/share/misc/magic

Do you need to apt-get install file? Or maybe your version doesn't support the --brief --dereference options?

I might switch to perl compression libraries.
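To show what an approach without external tools might look like, here is a minimal sketch in Python rather than Perl. It assumes gzip detection by the two-byte gzip magic number (0x1f 0x8b) and uses the standard gzip module for decompression. The helper names is_gzipped and open_maybe_gzipped are hypothetical and not part of mlst.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def open_maybe_gzipped(path):
    """Open `path` for text reading, decompressing on the fly if gzipped."""
    if is_gzipped(path):
        return gzip.open(path, "rt")
    return open(path, "r")
```

With this kind of check, neither the file program nor the gzip binary has to be installed in the container, which matters for cut-down Docker base images.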

miguelpmachado commented 7 years ago

Hi @tseemann, Well noted. But rematch works from the reads. Since mlst will be incorporated in our assembly pipeline, we can benefit from having the contigs. And the auto-detection mode is very useful as a final check.

The sh: 1: file: not found message was an error about the missing file program! I thought it was complaining about some missing file! My fault, I didn't know about the file program. After I installed it, mlst worked perfectly :)

Thanks a lot, and sorry for my ignorance. Cheers, Miguel

tseemann commented 7 years ago

@miguelpmachado I am very surprised that your Docker base image does not have the file command! But I guess they are heavily cut down images to save space.

Thank you for reporting this issue. I had forgotten I used the file command. I can now check it exists.

Calling MLST from FASTQ might be more reliable than from assembled contigs, and probably much faster.