ufal / udpipe

UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files
Mozilla Public License 2.0

C++ Finding lemma of a word using UDPipe #62

Closed sb-b closed 6 years ago

sb-b commented 6 years ago

Hi,

I want to find the lemma of a word inside my C++ code. Is it possible to do this using UDPipe?

Thanks.

sb-b commented 6 years ago

The code snippet that uses udpipe looks like this:

  unique_ptr<model> model(model::load("home/betul/udpipe/udpipe-models-2017/models/turkish-ud-2.0-conll17-170315.udpipe"));

  pipeline pline(model.get(), "generic_tokenizer", pipeline::DEFAULT, pipeline::DEFAULT, "conllu");

  std::string error;
  istringstream str("Ben eve gittim.");  // A Turkish sentence meaning "I went home".
  std::istream input(str.rdbuf());  // Trying to convert istringstream to istream object

  if (!pline.process(input, cout, error))
      cerr << "error:" << error << endl;

When I run my code, I get a segmentation fault.

foxik commented 6 years ago

The code actually works for me. BTW, creating the std::istream input is redundant; you can pass the istringstream directly as the first parameter of process.
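
To illustrate why no conversion is needed: std::istringstream derives from std::istream, so it binds directly to a std::istream& parameter. The sketch below uses a hypothetical function, count_tokens, as a stand-in for any function with such a parameter; it is not part of UDPipe.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in (not a UDPipe function) for anything that accepts
// a std::istream&, like the first parameter of process in the thread.
std::size_t count_tokens(std::istream& in) {
  std::size_t count = 0;
  std::string token;
  // operator>> splits on whitespace; count each extracted token.
  while (in >> token) ++count;
  return count;
}
```

With std::istringstream text("Ben eve gittim."); the call count_tokens(text) compiles as is, without wrapping text.rdbuf() in a second stream.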

Note that UDPipe can return only a disambiguated lemma. If you have only a single word, you would probably like to get all of its possible lemmas (since it is not obvious which one is correct), but you can treat the word as a one-word sentence and run the tagger. To do that, you can run for example:

  sentence s;
  s.add_word("Ben");

  string error;
  if (!model->tag(s, model::DEFAULT, error))
    cerr << "error:" << error << endl;
  else
    cout << s.words[1].lemma;
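
One thing that may be worth ruling out when a snippet like this crashes: as far as I know, model::load returns a null pointer when the model file cannot be loaded, and calling model->tag on a null pointer is undefined behaviour that commonly shows up as a segmentation fault. The path in the first snippet has no leading slash, so it is resolved relative to the working directory; whether that is the actual cause here is only a guess. Below is a minimal sketch of a guard, using a hypothetical helper named require_loaded that is not part of UDPipe.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of UDPipe): returns the pointee, or throws
// if the pointer is null, so a failed load is reported instead of crashing.
template <typename T>
T& require_loaded(const std::unique_ptr<T>& ptr, const std::string& what) {
  if (!ptr) throw std::runtime_error("failed to load " + what);
  return *ptr;
}

// Intended use with the snippets in this thread (assumes udpipe.h and the
// names used above; path is whatever file name is passed to model::load):
//   std::unique_ptr<model> m(model::load(path));
//   model& loaded = require_loaded(m, path);
//   loaded.tag(s, model::DEFAULT, error);
```

A plain if (!model) check right after model::load would do the same job; the helper only makes the failure hard to ignore.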

sb-b commented 6 years ago

Your example also compiles successfully but fails with a segmentation fault at run time. I couldn't figure out the reason behind this error. Do you have any idea why I see this segmentation fault?

Thank you for your help.

Betül

foxik commented 6 years ago

You need to provide more details: what you compiled and how, and which system and compiler you have. Also, when you compile a debug build and run it in a debugger, do you get a stack trace?

sb-b commented 6 years ago

I have solved my problem by running a Unix command that calls the UDPipe web service from my C++ program. It works well now.

Thank you.
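
For readers who want to try the same workaround, here is a minimal sketch of running a shell command from C++ and capturing its standard output with POSIX popen. The exact command used above is not shown in the thread, so run_command is a hypothetical helper and the command string is left to the caller.

```cpp
#include <array>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical helper: runs `command` through the shell (POSIX popen) and
// returns everything the command writes to standard output.
std::string run_command(const std::string& command) {
  // The unique_ptr closes the pipe with pclose on every exit path.
  std::unique_ptr<FILE, int (*)(FILE*)> pipe(popen(command.c_str(), "r"), pclose);
  if (!pipe) throw std::runtime_error("popen failed for: " + command);

  std::string output;
  std::array<char, 4096> buffer;
  std::size_t n;
  // Read raw chunks until end of file.
  while ((n = std::fread(buffer.data(), 1, buffer.size(), pipe.get())) > 0)
    output.append(buffer.data(), n);
  return output;
}
```

The returned text would then need to be parsed by the caller, for example to pick the lemma column out of CoNLL-U output. Since the command goes through the shell, any user-supplied text placed in it should be quoted carefully.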
