Closed kirillkh closed 7 years ago
@kirillkh Unfortunately, I can't publish the licensing terms because they are unknown to me too. I was given the all-clear to put the BGU lexicon on GitHub by @rtsarfaty; you would have to contact her about the licensing terms, as this is a potentially thorny issue.
I will provide documentation for the format.
I suppose these files must be very hard to create, if you had to bundle them under such terms rather than generate new ones from scratch?
@kirillkh Of course they are hard to generate :-) Someone has curated over 500,000 words, each with its morphological properties, or generated them with their correct morphological properties, while also accounting for a lot of homomorphisms. Especially if the lexicon was quality-reviewed, it is not trivial to create an equivalent from scratch.
@habeanf I wonder what minor changes you made to the original BGU lexicon to better accommodate the tasks at hand... This might give some clues as to interesting properties of the original lexicon (which I believe is only available by request from the original authors). Would this be easy to comment on?
@matanster @kirillkh Earlier this week there was a meeting with Alon Itai, head of MILA and one of the curators of the original lexicon from which the BGU lexicon was generated. The issue of licensing for this file is under discussion. The problem is that funding for this resource, as well as other MILA resources, came from the Israeli Government's Ministry of Science (משרד המדע). At the time (circa 2003), the government required that any resources resulting from projects it funded would cost money for commercial entities but would be freely available for academic research. These days there are discussions about "opening up" the licensing so that the resources will be commercial-friendly, probably CC-BY-SA (like MIT/Apache). Honestly, if you use the resources I think no one will come looking for you, but I don't have the right to guarantee this. Edited: Exact license names
@kirillkh If you can pay for licensing, you will have to reach out to Alon Itai at MILA. For a hefty sum, MILA will give you the right to use the lexicon for commercial purposes (like a parser). In any case you will not be granted the right to publish it with an open license.
I've looked at it again. It seems that the included Hebrew treebank may also carry license terms more restrictive than the library's own license.
Maybe I am not aware of some relaxation of the license terms that seem to apply to that treebank, or to the updated version of it embedded in this repo.
Apologies for waking up this dead thread.
@matanster I was instructed by my advisor at the time (@rtsarfaty) to publish the treebank and lexicon as part of the GitHub repository for the parser. If anyone wants to know the licensing particulars of the treebank and/or lexicon, they can reach out to her or MILA.
Of course!
Hello! This looks like a potentially very useful tool. However, it is unusable as long as its licensing terms are unclear. Please publish the license for the BGU Lexicon files, or at least document their format and how to create alternative data files.