statsmaths / cleanNLP

R package providing annotators and a normalized data model for natural language processing
GNU Lesser General Public License v2.1

`cnlp_init_udpipe(model_name = "russian-syntagrus")` doesn't work #85

Open Pozdniakov opened 3 months ago

Pozdniakov commented 3 months ago

Initializing the "russian-syntagrus" model through the udpipe backend doesn't work:

> cnlp_init_udpipe(model_name = "russian-syntagrus")
> cnlp_annotate("К нам едет ревизор")
Error: external pointer is not valid

Other models work fine:

> cnlp_annotate("К нам едет ревизор")
$token
# A tibble: 4 × 11
  doc_id   sid tid   token   token_with_ws lemma   upos  xpos  feats                tid_source relation
*  <int> <int> <chr> <chr>   <chr>         <chr>   <chr> <chr> <chr>                <chr>      <chr>   
1      1     1 1     К       "К "          к       ADP   IN    NA                   2          case    
2      1     1 2     нам     "нам "        мы      PRON  PRP   Case=Dat|Number=Plu… 3          obl     
3      1     1 3     едет    "едет "       ести    VERB  VBC   Aspect=Imp|Mood=Ind… 0          root    
4      1     1 4     ревизор "ревизор"     ревизор NOUN  NN    Animacy=Inan|Case=N… 3          nsubj   

$document
  doc_id
1      1

attr(,"class")
[1] "cnlp_annotation" "list"
> cnlp_init_udpipe(model_name="russian-taiga")
> cnlp_annotate("К нам едет ревизор")
$token
# A tibble: 4 × 11
  doc_id   sid tid   token   token_with_ws lemma   upos  xpos  feats                tid_source relation
*  <int> <int> <chr> <chr>   <chr>         <chr>   <chr> <chr> <chr>                <chr>      <chr>   
1      1     1 1     К       "К "          к       ADP   ADP   NA                   2          case    
2      1     1 2     нам     "нам "        мы      PRON  NA    Case=Dat|Number=Plu… 3          obl     
3      1     1 3     едет    "едет "       ехать   VERB  VERB  Aspect=Imp|Mood=Ind… 0          root    
4      1     1 4     ревизор "ревизор"     ревизор NOUN  NOUN  Animacy=Inan|Case=N… 3          nsubj   

$document
  doc_id
1      1

attr(,"class")
[1] "cnlp_annotation" "list" 

When I use udpipe directly, there are no such problems and syntagrus works well.

statsmaths commented 1 month ago

Thanks for opening the issue, and apologies for the delay in getting back to you. I just tried the same code, and both models run on my machine. You had it working with udpipe directly, and the error about external pointers points to the udpipe side, because cleanNLP doesn't generate its own external pointers when working with the udpipe backend. My best guess is that something got corrupted when cleanNLP downloaded the "russian-syntagrus" model. Re-installing cleanNLP will clear the cache of udpipe models, and I expect that will fix the issue. If not, please let me know some more details about your machine and the versions of R and udpipe that you're running.
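
For anyone following along, here is a minimal sketch of how the requested details could be collected, and how the model could be loaded with udpipe alone to rule cleanNLP in or out. The helper names `collect_versions` and `check_syntagrus` are hypothetical and not part of either package, and downloading into a temporary directory is an assumption made to avoid reusing any cached copy.

```r
# Hypothetical helper: gather R and package versions for a bug report.
collect_versions <- function(pkgs = c("cleanNLP", "udpipe")) {
  versions <- vapply(pkgs, function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
      as.character(utils::packageVersion(pkg))
    } else {
      "not installed"
    }
  }, character(1))
  c(R = R.version.string, versions)
}

# Hypothetical helper: download a fresh copy of the model and annotate
# with udpipe directly, bypassing cleanNLP. Requires network access.
check_syntagrus <- function(text = "К нам едет ревизор") {
  dl <- udpipe::udpipe_download_model(
    language = "russian-syntagrus",
    model_dir = tempdir()  # assumption: a fresh download avoids a stale cache
  )
  model <- udpipe::udpipe_load_model(file = dl$file_model)
  anno <- as.data.frame(udpipe::udpipe_annotate(model, x = text))
  anno[, c("token", "lemma", "upos")]
}

print(collect_versions())
# check_syntagrus()  # uncomment to run the direct udpipe check
```

If `check_syntagrus()` succeeds while `cnlp_annotate()` still fails, that would be consistent with a problem in the copy of the model that cleanNLP downloaded rather than in udpipe itself.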