nlpub / maltparser-docker

A convenient image with a properly built MaltParser for Russian.
https://hub.docker.com/r/nlpub/maltparser/
BSD 3-Clause "New" or "Revised" License

Troubles with using this container #1

Open mnvx opened 8 years ago

mnvx commented 8 years ago

I ran the following command and got this output:

root@edcd95fa6f1b:/malt# echo Данное программное обеспечение работает корректно | ./russian-malt.sh
    reading parameters ...
    tagging ...
     finished.
Guessing unknown lemmas in /tmp/kSP5Am64RF

Formats:
-c  $w\t$B\t$t\n    Output format
-B  $w  Computed base form output format.
-X- Not XML input.
-f/malt/treetagger/cmd/wform2011.ptn1    File with flex patterns.
-d  Dictionary: File not specified.
-v  Tag friends file: File not specified.
-x  Lexical type translation table: File not specified.
-z  Full form - Lemma type conversion table: File not specified.

Switches:
-t  Input has tags.
-q- Do not sort output.(default)
Input is not sorted before processing (no option -q and no $f field in -c<format> or -W<format> argument)
-s| Ambiguous output is '|'-separated (default)
-U- allow ambiguous flex rules
-u- allow ambiguous dictionary look-up
-H2 don't use lemma frequencies for disambigation
-l- lemmas are same case as full form
-m0 Reading unlimited number of words from input (default).
-eU Use Unicode Character encoding for case conversion.
-o  Output text: Using standard output.
-p  keep punctuation (default)

-i  Input text: Using standard input.

all words      0
unknown words  0 (100%%)
conflicting    0 (100%%)

-----------------------------------------------------------------------------
                          MaltParser 1.5                             
-----------------------------------------------------------------------------
         MALT (Models and Algorithms for Language Technology) Group          
             Vaxjo University and Uppsala University                         
                             Sweden                                          
-----------------------------------------------------------------------------

Started: Thu Jul 14 17:49:39 UTC 2016
  Transition system    : Arc-Eager
  Parser configuration : Nivre with RELAXED root handling
  Feature model        : rus-liblinear.xml
  Classifier           : liblinear
  Data Format          : /rus-test/conllx.xml
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000525cec000, 2945605632, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 2945605632 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /malt/hs_err_pid60.log
<s>
1   Данное    данный    P   P   P--nsna
1   программное  программный  A   A   Afpnsnf
1   обеспечение  обеспечение  N   N   Ncnsnn
1   работает    работать    V   V   Vmip3s-a-e
1   корректно  корректно  R   R   R

</s>
</text>

My questions are:

  1. Is the message "Guessing unknown lemmas in /tmp/kSP5Am64RF" normal?
  2. Is this output expected?

    all words      0
    unknown words  0 (100%%)
    conflicting    0 (100%%)
  3. Every line has the number 1. Is that correct?

    1   Данное    данный    P   P   P--nsna
    1   программное  программный  A   A   Afpnsnf
    1   обеспечение  обеспечение  N   N   Ncnsnn
    1   работает    работать    V   V   Vmip3s-a-e
    1   корректно  корректно  R   R   R

dustalov commented 8 years ago

Hello, this image integrates the available scripts to get the parser running with minimal changes to them. Due to lack of time, no additional patches are included, and the scripts are currently provided as is.

Regarding your questions: (1) yes, it is OK to use temporary files for intermediate operations; (2) no, it looks wrong, probably because of issues in how CSTLemma is invoked; and (3) no, the usable result is written to the tmpmalttext.parse file instead.

The corresponding article on NLPub in Russian demonstrates the “proper” way to use this software that works for me (I checked the approach just now).

$ docker pull nlpub/maltparser
$ docker run --rm -it nlpub/maltparser /bin/bash
root@2ac6d282032f:/malt# echo Данное программное обеспечение всё-таки работает | ./russian-malt.sh
root@2ac6d282032f:/malt# cat tmpmalttext.parse
1   Данное    данный    P   P   P--nsna 3   опред  _   _
2   программное  программный  A   A   Afpnsnf 3   опред  _   _
3   обеспечение  обеспечение  N   N   Ncnsnn  5   предик    _   _
4   всё-таки всё-таки N   N   Ncmsgn  3   1-компл    _   _
5   работает    работать    V   V   Vmip3s-a-e  0   ROOT    _   _
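
For anyone who wants to read tmpmalttext.parse programmatically, here is a minimal Python sketch. It assumes the file follows the CoNLL-X column order (ID, form, lemma, coarse tag, tag, features, head, relation, ...) as in the listing above, and it splits on any whitespace because the columns here are shown space-separated; real CoNLL-X files are normally tab-separated. The names `Token` and `parse_conllx` are hypothetical, not part of MaltParser or this image.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    index: int   # 1-based position in the sentence (ID column)
    form: str    # surface word form
    lemma: str   # lemma produced by the lemmatiser
    feats: str   # sixth column, e.g. "P--nsna"
    head: int    # index of the syntactic head, 0 for the root
    deprel: str  # dependency relation label


def parse_conllx(text: str) -> List[List[Token]]:
    """Group CoNLL-X-like lines into sentences separated by blank lines."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        # Skip lines without numeric ID and HEAD columns (for example,
        # tagger output that has not been parsed yet).
        if len(fields) < 8 or not fields[0].isdigit() or not fields[6].isdigit():
            continue
        current.append(
            Token(int(fields[0]), fields[1], fields[2],
                  fields[5], int(fields[6]), fields[7])
        )
    if current:
        sentences.append(current)
    return sentences


# Sample copied from the listing above.
SAMPLE = """\
1   Данное    данный    P   P   P--nsna 3   опред  _   _
2   программное  программный  A   A   Afpnsnf 3   опред  _   _
3   обеспечение  обеспечение  N   N   Ncnsnn  5   предик    _   _
4   всё-таки всё-таки N   N   Ncmsgn  3   1-компл    _   _
5   работает    работать    V   V   Vmip3s-a-e  0   ROOT    _   _
"""

sentence = parse_conllx(SAMPLE)[0]
root = next(token for token in sentence if token.head == 0)
for token in sentence:
    # Heads are 1-based indices into the same sentence; 0 marks the root.
    head_form = "ROOT" if token.head == 0 else sentence[token.head - 1].form
    print(f"{token.form} --{token.deprel}--> {head_form}")
```

Lines with fewer than eight columns, such as the six-column output shown in the first post, are skipped by this sketch, so an empty result suggests that the parsing step did not run.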

I would be very grateful for any improvements proposed. Probably, @versusvoid might be interested in this discussion.

mnvx commented 8 years ago

Thanks for the reply.

I tried that approach and fixed the install script.

I still have the same problems with CSTLemma:

    reading parameters ...
Value of <HANDLE> construct can be "0"; test with defined() at ./treetagger/cmd/lemmatiser.pl line 168.
    tagging ...
     finished.

...

all words      0
unknown words  0 (100%%)
conflicting    0 (100%%)

The result in tmpmalttext.parse seems correct:

1   Данное    данный    P   P   P--nsna 3   опред  _   _
2   программное  программный  A   A   Afpnsnf 3   опред  _   _
3   обеспечение  обеспечение  N   N   Ncnsnn  5   предик    _   _
4   всё-таки всё-таки R   R   R   5   обст    _   _
5   работает    работать    V   V   Vmip3s-a-e  0   ROOT    _   _

mnvx commented 8 years ago

Oh... You mean the Docker way. With Docker I get the same output as in my first post, and the tmpmalttext.parse file in the container is empty.

root@65ee6596db87:/malt# cat tmpmalttext.parse
root@65ee6596db87:/malt# 

Ubuntu 16.04 x64
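
The log in the first post also shows the JVM stopping because it could not map 2945605632 bytes, which may be related to the empty tmpmalttext.parse. Below is a minimal Python sketch, assuming the HotSpot message keeps the wording quoted in that log, that pulls the requested size out of the output so it can be compared with the memory available to the container. The helper name `requested_bytes` is hypothetical.

```python
import re
from typing import Optional

# Matches the HotSpot message quoted in the first post (wording assumed stable).
_MMAP_FAILURE = re.compile(r"failed to map (\d+) bytes")


def requested_bytes(log: str) -> Optional[int]:
    """Return the size of the failed JVM allocation, or None if not present."""
    match = _MMAP_FAILURE.search(log)
    return int(match.group(1)) if match else None


# Excerpt copied from the log in the first post.
LOG_EXCERPT = """\
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 2945605632 bytes for committing reserved memory.
"""

size = requested_bytes(LOG_EXCERPT)
if size is not None:
    # 2**30 bytes per GiB.
    print(f"JVM asked for {size} bytes (~{size / 2**30:.2f} GiB)")
```

If the machine or container has less free memory than that figure, the parser step cannot start, which would be consistent with an empty output file; this is only a guess from the log, not a confirmed diagnosis.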

dustalov commented 8 years ago

Well, this is something to investigate. Did you build the image on your own machine or pull the nlpub/maltparser image from the Docker Registry? I used the latter option. If you chose the former, some external links might be broken, resulting in a malfunctioning configuration.