sergey-tihon / OpenNLP.NET

OpenNLP for .NET
Apache License 2.0

Unable to convert model .bin to .nbin, vice versa #20

Closed vineet-singh26 closed 1 year ago

vineet-singh26 commented 1 year ago

Hi folks, I need some help or guidance converting the models from https://opennlp.sourceforge.net/models-1.5/ to .nbin format, and I am stuck. I tried the model converter shared at https://www.codeproject.com/articles/12109/statistical-parsing-of-english-sentences?display=print&fid=229482&df=90&mpp=25&sort=Position&view=Normal&spc=Relaxed&fr=101&prof=True but had no success.

Please, any help here would be really appreciated. Thank you!!

Error: Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection JavaBinaryGisModelReader

sergey-tihon commented 1 year ago

What is .nbin, and why did you decide to convert?

If you use this project (OpenNLP.NET), it is perfectly fine to use the original models without any conversion. You can check the code samples in the tests: https://github.com/sergey-tihon/OpenNLP.NET/blob/master/tests/OpenNLP.NET.Tests/Tests.cs#L117-L128

vineet-singh26 commented 1 year ago

I was looking for a POS tagging model when I came across your project. Amazing work you've done here. But the project depends on IKVM, which we can't introduce into our codebase for security reasons. I'd have loved to use your project, but unfortunately I had to look for alternatives, and I came across https://github.com/AlexPoint/OpenNlp which uses .nbin models.

Can you please help me here, @sergey-tihon?

sergey-tihon commented 1 year ago

I see that you already created an issue in the other repo: https://github.com/AlexPoint/OpenNlp/issues/36

Sorry, I know nothing about .nbin format.

Closing this issue because it is not related to this repo.

sergey-tihon commented 1 year ago

You can take a look at the models here: https://github.com/AlexPoint/OpenNlp/tree/master/Resources/Models

Then maybe check out the repo, try to compile it, and take a look at the trainer code. Somewhere in the repo there should be code that creates *.nbin files.

sergey-tihon commented 1 year ago

"But the project is dependent on IKVM which we can't introduce into our codebase because of security reasons."

Do you know of any security vulnerabilities in IKVM? I just wonder what they are.

vineet-singh26 commented 1 year ago

Hi @sergey-tihon, thanks for the quick reply!!

We got this remark from our security team: "IKVM is a java virtual machine implemented in .NET. Outside of the license risk that seems like a pretty broad attack surface."

We are looking for an MIT-licensed solution. I admired your work with Stanford.NLP, but that too uses a GPL license with IKVM. That's the issue. Thanks!!! If you have any suggestions, they would be appreciated.

wasabii commented 1 year ago

The license risk concern is pretty unfounded, unless there's a general concern with all Java, as all JVMs are under the same license at this point. And it is silly if you're using, say, Linux, since the OpenJDK license is less restrictive than that of Linux itself.

It being a JVM, and a JVM being big, is a potential concern, I suppose. It's more a fear of "the unknown." It's weird, because a bunch of unused classes merely existing that you aren't accessing shouldn't be any more of a concern than the thousands of .NET runtime classes that exist that you don't use.