Closed Rand0m-Guy closed 4 years ago
Are you using Stanford.NLP.CoreNLP or Stanford.NLP.POSTagger? (What version?)
Do you have the file C:/Users/myUser/Desktop/UnityProyect/Assets/Plugins/spanish-ud.tagger?

1.- I'm using Stanford.NLP.CoreNLP (3.9.2) downloaded from NuGet, which means that in the Unity Assets folder I'm referencing these Stanford dll's:
And these IKVM dll's:
2.- Technically speaking... no. (Lemme explain lol) That was the path I used when I was working with the original Stanford CoreNLP in Java. What I have is the Spanish model provided by Stanford. I tried referencing it directly (maybe I did it the wrong way), and I also have it converted to a dll via IKVM.
3.- NuGet
4.- Sure:
    using System.Collections;
    using System.Collections.Generic;
    using edu.stanford.nlp.ling;
    using edu.stanford.nlp.pipeline;
    using edu.stanford.nlp.tagger.maxent;
    using java.io;
    using java.util;
    using UnityEngine;
    using System;
    using System.IO;

    public class Test : MonoBehaviour
    {
        StanfordCoreNLP scnlp;
        Properties props;
        string propname = "tokenize, ssplit, pos, lemma, ner";

        void Start()
        {
            props = new Properties();
            props.setProperty("annotators", propname);
            scnlp = new StanfordCoreNLP(props);
            string text = "Hola, esto es un texto en español."; //Spanish text
            POS(text);
        }

        void Update()
        {
        }

        public void POS(string text)
        {
            var tagger = new MaxentTagger(@"C:/Users/myUser/Desktop/UnityProyect/Assets/Plugins/spanish-ud.tagger");
            var sentences = MaxentTagger.tokenizeText(new java.io.StringReader(text)).toArray();
            foreach (java.util.ArrayList sentence in sentences)
            {
                var taggedSentence = tagger.tagSentence(sentence);
                Debug.Log(SentenceUtils.listToString(taggedSentence, false));
            }
        }
    }
Option 1
If you want to use
    var tagger = new MaxentTagger(@"C:/Users/myUser/Desktop/UnityProyect/Assets/Plugins/spanish-ud.tagger");
then the file C:/Users/myUser/Desktop/UnityProyect/Assets/Plugins/spanish-ud.tagger must be physically available at this path. Just unzip the Java *.jar with the Spanish model and provide the full path to the spanish-ud.tagger file. Also, in this flow you do not need the lines
    props = new Properties();
    props.setProperty("annotators", propname);
    scnlp = new StanfordCoreNLP(props);
because you do not use scnlp.
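A minimal sketch of Option 1, assuming the Spanish models jar has already been unzipped and spanish-ud.tagger copied to the path from the question. The class name SpanishTaggerTest and the File.Exists guard are illustrative additions, not part of the original code. The guard only makes a missing file obvious before MaxentTagger tries to load it.

```csharp
using System.IO;
using edu.stanford.nlp.ling;
using edu.stanford.nlp.tagger.maxent;
using UnityEngine;

// Hypothetical class name, used only for this illustration.
public class SpanishTaggerTest : MonoBehaviour
{
    // Assumption: the tagger file was extracted from the Spanish models jar to this path.
    const string ModelPath = @"C:/Users/myUser/Desktop/UnityProyect/Assets/Plugins/spanish-ud.tagger";

    void Start()
    {
        // Fail early with a readable message if the model is not on disk.
        if (!File.Exists(ModelPath))
        {
            Debug.LogError("Tagger model not found at: " + ModelPath);
            return;
        }

        // No StanfordCoreNLP pipeline is needed in this flow.
        var tagger = new MaxentTagger(ModelPath);
        var reader = new java.io.StringReader("Hola, esto es un texto en español.");
        var sentences = MaxentTagger.tokenizeText(reader).toArray();

        foreach (java.util.List sentence in sentences)
        {
            var taggedSentence = tagger.tagSentence(sentence);
            Debug.Log(SentenceUtils.listToString(taggedSentence, false));
        }
    }
}
```

java.io.StringReader is fully qualified here instead of importing java.io, so that File keeps resolving to System.IO.File.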
Option 2
You can go with StanfordCoreNLP and build an annotators pipeline, then process your text and receive an Annotation with all the metadata extracted by the annotators in your pipeline. Here is an example of how you can manually extract data from different kinds of annotations.
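As a rough illustration of Option 2, the sketch below builds a small pipeline and reads the tokens back out of the Annotation. It makes several assumptions that are not confirmed in this thread: the class name SpanishPipelineTest is hypothetical, the pos.model property is pointed at the same unzipped spanish-ud.tagger file, tokenize.language is set to es, and only the tokenize, ssplit and pos annotators are requested. Treat it as a starting point to adapt, not as the library's reference usage.

```csharp
using edu.stanford.nlp.ling;
using edu.stanford.nlp.pipeline;
using edu.stanford.nlp.util;
using java.util;
using UnityEngine;

// Hypothetical class name, used only for this illustration.
public class SpanishPipelineTest : MonoBehaviour
{
    // Assumption: same extracted tagger file as in Option 1.
    const string ModelPath = @"C:/Users/myUser/Desktop/UnityProyect/Assets/Plugins/spanish-ud.tagger";

    void Start()
    {
        // Build the annotators pipeline.
        var props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos");
        props.setProperty("tokenize.language", "es");
        props.setProperty("pos.model", ModelPath);
        var pipeline = new StanfordCoreNLP(props);

        // Process the text; the Annotation collects what each annotator produced.
        var annotation = new Annotation("Hola, esto es un texto en español.");
        pipeline.annotate(annotation);

        // Annotation keys are passed as java.lang.Class instances.
        var sentencesKey = new CoreAnnotations.SentencesAnnotation().getClass();
        var tokensKey = new CoreAnnotations.TokensAnnotation().getClass();

        var sentences = annotation.get(sentencesKey) as java.util.List;
        foreach (CoreMap sentence in sentences.toArray())
        {
            var tokens = sentence.get(tokensKey) as java.util.List;
            foreach (CoreLabel token in tokens.toArray())
            {
                // word() is the token text, tag() the part-of-speech tag.
                Debug.Log(token.word() + "/" + token.tag());
            }
        }
    }
}
```

The lists are converted with toArray() before iterating, mirroring the pattern already used with tokenizeText in the question's code.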
You are honestly the best. Thank you so so much!
While checking the documentation, I noticed that the .NET port of the POS tagger only supports a couple of languages. I need it for a non-supported language (Spanish, specifically). For some weeks now I have been trying to use a dll version of the jar file for that language supplied by Stanford CoreNLP, but it does not seem to work. When using
var tagger = new MaxentTagger(@"C:/Users/myUser/Desktop/UnityProyect/Assets/Plugins/spanish-ud.tagger");
it throws an IOException. As an experiment, I also tried pointing it at the original jar file, only for it to throw the same IOException. I also tried using "props.setProperty(...)", but the same error was thrown.
How should I use a Stanford CoreNLP language model with the .NET version?