voikko / corevoikko

Libvoikko and essential linguistic resources

Get all base-form words from WORDBASES #47

Closed. artemyarulin closed this issue 3 years ago

artemyarulin commented 3 years ago

Hi, thanks for the great library. Is there a way to get all the base forms listed in WORDBASES using the C++ API?

Given lastenkauppa, I can get a WORDBASES string like +lasten(lapsi)+kauppa(kauppa), but what I want is [lapsi, kauppa]. I can parse this string manually by extracting everything between the parentheses, but I wonder if there is a better way.

hatapitk commented 3 years ago

Hi! Parsing WORDBASES is indeed the way to go. It would be nice to have a direct API for this, but since the C++ API is the core API with strict backwards compatibility requirements, I try to keep it as minimal as possible. An API for your use case would easily become complicated by all the options needed to configure how word derivation is handled. For example, given WORDBASES=+itsenäisy(itsenäinen)+ys(+ys)+päivä(päivä), would you like to get [itsenäinen, päivä] or [itsenäisyys, päivä]? That likely depends on the use case, so we would need to support both. Thus I consider it better to have this one API and let users parse its output in the way that best suits their needs.

Of course, a wrapper library to offer such functionality would be fine.
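
For readers who land here, a minimal sketch of such a parsing helper might look like the following. It is not part of libvoikko: the names baseFormsFromWordbases and DerivationMode are hypothetical, and the segment format (+surface(base), with a base starting with + marking a derivational suffix) is inferred only from the two examples in this thread. The JoinSurface mode simply concatenates surface text, so it presumably matches the dictionary form only when the word is uninflected.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not a libvoikko API.
// RootOnly:    "+itsenäisy(itsenäinen)+ys(+ys)" yields "itsenäinen"
// JoinSurface: the same input yields "itsenäisyys" (surface texts glued together)
enum class DerivationMode { RootOnly, JoinSurface };

std::vector<std::string> baseFormsFromWordbases(
        const std::string& wordbases,
        DerivationMode mode = DerivationMode::RootOnly) {
    std::vector<std::string> result;
    std::string prevSurface;  // surface text behind the last entry in result
    std::size_t pos = 0;

    while (pos < wordbases.size()) {
        // Assumption: every segment starts with '+'.
        if (wordbases[pos] != '+') {
            ++pos;
            continue;
        }
        const std::size_t open = wordbases.find('(', pos + 1);
        if (open == std::string::npos) {
            break;  // no further segment carries a base form
        }
        // Assumption: a segment without parentheses has no base form; skip it.
        const std::size_t nextPlus = wordbases.find('+', pos + 1);
        if (nextPlus != std::string::npos && nextPlus < open) {
            pos = nextPlus;
            continue;
        }
        const std::size_t close = wordbases.find(')', open + 1);
        if (close == std::string::npos) {
            break;  // malformed input: unbalanced parenthesis
        }

        const std::string surface = wordbases.substr(pos + 1, open - pos - 1);
        const std::string base = wordbases.substr(open + 1, close - open - 1);

        if (!base.empty() && base[0] == '+') {
            // Derivational suffix such as "+ys(+ys)".
            if (mode == DerivationMode::JoinSurface && !result.empty()) {
                prevSurface += surface;
                result.back() = prevSurface;
            }
        } else if (!base.empty()) {
            result.push_back(base);
            prevSurface = surface;
        }
        pos = close + 1;
    }
    return result;
}
```

The parser works byte by byte, which is safe for UTF-8 input because the delimiters +, ( and ) are ASCII. Called on +lasten(lapsi)+kauppa(kauppa) it returns lapsi and kauppa; the mode argument selects between the two interpretations of the itsenäisyyspäivä example above.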

artemyarulin commented 3 years ago

Makes total sense, thanks again for such a great library!