hammerlab / prohlatype

Probabilistic HLA typing
Apache License 2.0

align2fasta error - "Couldn't extract sequence align date." #147

npavlovikj closed this issue 6 years ago

npavlovikj commented 6 years ago

Hi,

I tried using "prohlatype" 0.9.0 with the pre-compiled binaries and the Docker image. Although "align2fasta --help" works fine, when I run the test example using IMGT/HLA, I get the following error in both cases:

    opam@6362de72010a:~$ align2fasta imgthla/alignments/ -o imputed_hla_class_I
    align2fasta: internal error, uncaught exception:
    Prohlatype__MSA.Parser.Error("Couldn't extract sequence align date.")
    Raised at file "pervasives.ml", line 436, characters 14-31
    Called from file "src/lib/mSA.ml", line 359, characters 26-41
    Called from file "src/lib/mSA.ml", line 367, characters 6-21

Am I missing something in the setup?

I would appreciate any suggestions and help regarding this matter.

Thank you, Natasha

rleonid commented 6 years ago

Maybe there's something wrong with the docker image.

Does imgthla/alignments contain these files?

npavlovikj commented 6 years ago

Yes, the IMGT/HLA clone contains all the files from the repo, and I can open and modify them. I got the error above with both the Docker image and the Linux pre-compiled binaries.

rleonid commented 6 years ago

Now I know why: IMGT updated their alignment format so that those header lines start with a #. I assume that you don't want to submit a patch :smile: ? Don't worry, I'll submit a patch in a couple of days. If you need something to work right away, I'd recommend using an older version of IMGT.
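To check which header style a given IMGT/HLA checkout uses, something like the following Python sketch may help. It relies only on what is described in this thread (newer alignment files begin with `#`-prefixed lines). The names `has_hash_header` and `scan_alignments` are hypothetical helpers for illustration, not part of prohlatype, and the `*.txt` glob is an assumption about how the alignment files are named.

```python
from pathlib import Path


def has_hash_header(text):
    """Return True if the first non-blank line of `text` starts with '#'."""
    for line in text.splitlines():
        if line.strip():
            return line.lstrip().startswith("#")
    return False  # empty input: no header at all


def scan_alignments(directory):
    """Map each *.txt file name in `directory` to its header style flag.

    Assumption: alignment files live directly in `directory` and end in .txt.
    """
    result = {}
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        result[path.name] = has_hash_header(text)
    return result
```

Calling `scan_alignments("imgthla/alignments")` would then show at a glance whether a checkout carries the `#`-style headers that trigger the parser error.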

npavlovikj commented 6 years ago

Thanks @rleonid ! I just downloaded IMGT/HPA 3.1.0 which doesn't have the first few lines commented out with "#". However, I am still getting the same error as above, again for both the binaries and the Dockerfile :/

rleonid commented 6 years ago

I assume "HPA" was a typo?

3.1.0 is too far back. The last version that I tested prohlatype against was 3240. You can just check out that branch.

npavlovikj commented 6 years ago

Hah, yes, it was :) @rleonid , I am so sorry, but I keep getting the same error with 3240 too... Do you maybe have another test file I can try with "align2fasta"?

rleonid commented 6 years ago

I'm sorry. Yes, 3240 doesn't work (the release notice is written differently). 3310 seems to be the last branch that works. Again, apologies for all of the difficulties.

npavlovikj commented 6 years ago

Thanks @rleonid - that branch worked perfectly with both the binaries and the Docker image. And no apologies needed - I am glad we figured this out in a timely manner :)

rleonid commented 6 years ago

Glad that it works. I hope that you find prohlatype useful.

I'll keep the issue open until I fix it later this week.

rleonid commented 6 years ago

Fixed in master.

kloot commented 5 years ago

Hi @rleonid, I'm getting the same error, both with the binaries from the zip archives for 0.8 and 0.9 and with the Docker container built with run-example-docker.sh (again, both versions). The files that raise the exceptions are slightly different between the two versions. Is there another IMGT format change? Below is the top of imgthla/alignments/A_gen.txt. Thanks in advance for your help!

# file: A_gen.txt
# date: 2019-01-23
# version: IPD-IMGT/HLA 3.35.0
# origin: http://hla.alleles.org/wmda/A_gen.txt
# repository: https://raw.githubusercontent.com/ANHIG/IMGTHLA/Latest/alignments/A_gen.txt
# author: WHO, Steven G. E. Marsh (steven.marsh.ac.uk)

gDNA              -300
                  |
A*01:01:01:01     CAGGAGCAGA GGGGTCAGGG CGAAGTCCCA GGGCCCCAGG CGTGGCTCTC AGGGTCTCAG GCCCCGAAGG CGGTGTATGG ATTGGGGAGT CCCAGCCTTG

rleonid commented 5 years ago

Yea, looks like it.

Here is where I look for the release date. Apologies, but I don't have the resources to keep altering the prohlatype parser to maintain compatibility. Feel free to fork, and/or send a PR... or unfortunately, use an older version of IMGT. Or you could write a script to modify their headers to the old style; probably the easiest if you're not familiar with OCaml.
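A header-rewriting script along those lines could be sketched in Python as below. This is only a rough outline under stated assumptions: it treats the leading run of `#` lines (as in the A_gen.txt excerpt above) as the new-style header, parses the `# key: value` pairs, and substitutes a caller-supplied template. The old header layout is not shown anywhere in this thread, so the template is left as a parameter; `split_header`, `parse_header`, and `restyle` are hypothetical names, not prohlatype functions.

```python
def split_header(text):
    """Split `text` into (header_lines, rest); the header is the leading run of '#' lines."""
    lines = text.splitlines(keepends=True)
    i = 0
    while i < len(lines) and lines[i].startswith("#"):
        i += 1
    return lines[:i], "".join(lines[i:])


def parse_header(header_lines):
    """Turn '# key: value' lines into a dict, splitting on the first colon only."""
    meta = {}
    for line in header_lines:
        body = line.lstrip("#").strip()
        key, sep, value = body.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def restyle(text, template):
    """Replace a '#'-style header with `template` filled from its fields.

    `template` is a str.format template supplied by the caller, copied from a
    file that the parser still accepts; this sketch does not guess its layout.
    Text without a '#' header is returned unchanged.
    """
    header, rest = split_header(text)
    if not header:
        return text
    return template.format(**parse_header(header)) + rest
```

A field referenced in the template but missing from the file raises `KeyError`, which seems preferable to silently writing an incomplete header. The template itself would have to be taken from an IMGT branch that still parses (3310 was reported to work earlier in this thread).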

kloot commented 5 years ago

Hi @rleonid, thanks for your quick reply. Would it be possible for you to attach an IMGT file header that still works with v 0.9.0? I'd like to try writing a modifier script that restores the old style. Many thanks in advance!

rleonid commented 5 years ago

@kloot Can you post a stack trace? I'm unable to reproduce against master and the latest IMGT.

kloot commented 5 years ago

Hi @rleonid, thanks for your reply! Sorry for the poor formatting, can't get code format for large blocks to work...

for v 0.9.0

    ./align2fasta.exe imgthla/alignments -o results/imputed_hla_class_I
    align2fasta: internal error, uncaught exception:
    Prohlatype__MSA.Parser.Error("Couldn't extract sequence align date.")
    Raised at file "pervasives.ml", line 436, characters 14-31
    Called from file "src/lib/mSA.ml", line 359, characters 26-41
    Called from file "src/lib/mSA.ml", line 367, characters 6-21

running your script for v0.8.0 (added apt-get update as per issue #149)

    bash run-example-docker.sh
    0.8.0: Pulling from leonidr/prohlatype
    Digest: sha256:ef2476156d3e3141683166e787aef35274ebddb858706347affead03e309197a
    Status: Image is up to date for leonidr/prohlatype:0.8.0
    Sending build context to Docker daemon 25.6 kB
    Step 1/6 : FROM leonidr/prohlatype:0.8.0
     ---> 1b81ac969c3d
    Step 2/6 : RUN sudo apt-get update
     ---> Using cache
     ---> 2a53cab29cc7
    Step 3/6 : RUN sudo apt-get install -y bwa wget libarchive-dev libbz2-dev liblzma-dev
     ---> Using cache
     ---> 1a1ce8a5874e
    Step 4/6 : RUN wget https://github.com/samtools/samtools/releases/download/1.7/samtools-1.7.tar.bz2
     ---> Using cache
     ---> d0416916be14
    Step 5/6 : RUN tar xvfj samtools-1.7.tar.bz2
     ---> Using cache
     ---> 5d980f664700
    Step 6/6 : RUN bash -c "cd samtools-1.7 && ./configure && make && sudo make install"
     ---> Using cache
     ---> 113487abd18b
    Successfully built 113487abd18b
    IMGT HLA already there
    --2019-01-27 23:04:01-- https://raw.githubusercontent.com/hammerlab/prohlatype/master/tools/test_reads.fastq
    Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.164.133
    Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.164.133|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 22406 (22K) [text/plain]
    Saving to: ‘tmp/results/sample.fastq’

    tmp/results/sample.fastq 100%[===========================================================================================================>] 21.88K --.-KB/s in 0.004s

    2019-01-27 23:04:01 (4.86 MB/s) - ‘tmp/results/sample.fastq’ saved [22406/22406]

    /_/o< align2fasta --version
    0.8.0
    /_/o< multipar --version
    0.8.0
    /_/o< bwa 2>&1 | grep Version
    Version: 0.7.12-r1039
    /_/o< samtools 2>&1 | grep Version
    Version: 1.7 (using htslib 1.7)
    /_/o< find /imgthla -type f | wc -l
    487
    /__/o< align2fasta /imgthla/alignments -o /results/imputed_hla_class_I
    align2fasta: internal error, uncaught exception:
    Prohlatype__MSA.Parser.Error("Couldn't extract sequence align date.")
    Raised at file "src/lib/mSA.ml", line 127, characters 23-38
    Called from file "src/lib/mSA.ml", line 536, characters 14-52
    Re-raised at file "src/lib/mSA.ml", line 541, characters 6-13
    Called from file "src/lib/alleles.ml", line 703, characters 17-42
    Called from file "src/app/align2fasta.ml", line 105, characters 12-37
    Called from file "src/lib/util.ml", line 174, characters 17-24
    Called from file "src/app/align2fasta.ml", line 104, characters 8-123
    Called from file "src/cmdliner_term.ml", line 27, characters 19-24
    Called from file "src/cmdliner.ml", line 106, characters 32-39

rleonid commented 5 years ago

@kloot Can you check against master?

kloot commented 5 years ago

@rleonid, I assume you wanted me to build from source? Following the build instructions in Readme.md, the build fails with what look like opam-related problems. I'm not familiar with opam at all; I tried a few things but had no luck. Any chance you could update the Docker container?

    opam switch 4.05.0
    Run eval $(opam env) to update the current shell environment
    eval $(opam env)
    make setup
    opam install --deps-only ./prohlatype.opam
    [WARNING] Failed checks on prohlatype package definition from source at git+file:///home/kbl/APPS/prohlatype#master:
    error 57: Synopsis and description must not be both empty
    The following dependencies couldn't be met:

    No solution found, exiting
    Makefile:9: recipe for target 'setup' failed
    make: *** [setup] Error 20

    opam switch 4.06.0
    Run eval $(opam env) to update the current shell environment
    eval $(opam env)
    make setup
    opam install --deps-only ./prohlatype.opam
    [WARNING] Failed checks on prohlatype package definition from source at git+file:///home/kbl/APPS/prohlatype#master:
    error 57: Synopsis and description must not be both empty
    The following dependencies couldn't be met:

    No solution found, exiting
    Makefile:9: recipe for target 'setup' failed
    make: *** [setup] Error 20