sergey-tihon / OpenNLP.NET

OpenNLP for .NET
Apache License 2.0
89 stars 18 forks source link

Added .NET Core Support (OpenNLP.NET 1.9.1.2-preview0001) #14

Closed NightOwl888 closed 2 years ago

NightOwl888 commented 2 years ago

This relies on a preview build of IKVM, the first release with .NET Core support (as well as .NET 5+). This IKVM build is definitely not production ready, but we could use some help with testing to get it ready, so it would help if this is released as a preview on NuGet when you get a chance.

I have made a build of OpenNLP.NET and run our 26 tests on macos-latest, ubuntu-latest, and windows-latest (the latter both in x64 and x86) on net48, netcoreapp3.1, net5.0, and net6.0 and they are all green. The 6 OpenNLP.NET tests are also passing on windows for both net461 and netcoreapp3.1, but I am having trouble getting Paket to download anything on macOS in GitHub Actions. I attempted to upgrade Paket, but that didn't change the results.

Do note that there is no official ikvmc version to build from macOS at this point, since the tools rely on native bits that haven't been ported yet to macOS. And it will probably be dumbed down in the future to emulate the way csc.dll works on .NET Core so the current command options may break in a future ikvmc release. I changed the strategy for GitHub Actions to build on Windows and then transfer the binaries to Windows, Linux and macOS to test on.

Also note that the assemblies don't have a TargetFrameworkAttribute so when viewing them in decompilers, the .NET Core assemblies show as .NET Framework 4.0. If you wish to add the attribute, you will need to create .java files that contain annotations using this syntax and then reference the file with the -assemblyattributes option in ikvmc.

There is a bug in Fake that is preventing from releasing a lower version than the last release during the release notes parsing (https://github.com/fsprojects/FAKE/issues/2680), so I added https://github.com/sergey-tihon/OpenNLP.NET/commit/3ba44b58b3e447cec41c53682cf7acbc3a5c6c7f which can be reverted after the release to put the 1.9.3 and 1.9.4 releases back in. I will leave it up to you whether you want to upgrade those now as previews, or wait until IKVM has a stable release.

NightOwl888 commented 2 years ago

Alright, I got all of the tests passing on Windows, macOS and Linux in GitHub Actions after adding a separate stage for testing.

There are still some things that could be optimized. But, since I have limited experience with Paket and Fake, I would prefer that you tackled them, if interested. The entire build/test takes only 15 minutes now, and the optimizations may save about 1/3 of that time.

  1. Dependencies can be pinned to specific frameworks (when I tried it, it failed but maybe there is some trick to it that is beyond my (limited) understanding of Paket)
  2. Model binaries could be embedded resources in the test DLL, so they wouldn't have to be downloaded on each test agent separately.
  3. I don't think that .NET Core 3.1 is required to build, but it is for testing.
sergey-tihon commented 2 years ago

Thank you @NightOwl888! IKVM for .NET Core is a really awesome news.

Do you have use case for targeting netcoreapp3.1? It is close to its end of life already. I personally would prefer to go with net6.0 simplify maintenance and life of future generations ;)

NightOwl888 commented 2 years ago

Do you have use case for targeting netcoreapp3.1? It is close to its end of life already. I personally would prefer to go with net6.0 simplify maintenance and life of future generations ;)

Currently, netcoreapp3.1 is the only target that works with ikvmc. But consuming applications can use any higher version, such as .NET 5, .NET 6, or .NET 7 preview (although the last case hasn't been tested).

image

The future plans are to keep ikvmc but dumb it down like csc.dll in .NET Core, then we will build a higher level tool such as ikvm build that allows to select a target. You would be able to select .net6.0 even if the compiler were targeting netcoreapp3.1, you would just need to point it to the reference assemblies for .net6.0. But the code is still too much of a mess to do that dynamically because of all of the hard-coded #ifdef statements.

Under the hood in .NET, there are actually only 4 different assembly versions and all of .NET Core runs on the same version. This is why I mentioned the TargetFrameworkAttribute above - it can be put into a template that can be added to the ikvmc command so .NET decompilers (such as dotPeek) will see the actual target framework instead of ".NET Framework 4.0", which is what it shows now.

image

Here is an example of an assembly compiled with dotnet build:

image

But the current thinking will be to discourage putting IKVM "ports" like this on NuGet.org and instead package .targets files with <MavenReference> elements in the tools directory of a .nupkg, then have the build tools compile a private build for each consumer. I am skeptical of this approach, but at the same time I can't seem to poke any holes in it. If a consumer references multiple packages that have the same <MavenReference> elements they will still end up with only a single copy of the binaries based on the Maven package version resolution, which will be compiled locally and either distributed with an executable or re-packaged as a .targets file only with no DLLs in a consuming .nupkg when using dotnet pack.

For the time being, since OpenNLP.NET is already in the wild, it makes a good test case until the IKVM tooling is completed.

NightOwl888 commented 2 years ago

@sergey-tihon

BTW - I am curious where you came up with this approach for writing the strong name key as a BLOB. We fixed the -keyfile option in https://github.com/ikvm-revived/ikvm/pull/46, but this appears to have broken the BLOB functionality. However, I couldn't find any documentation that says BLOB was ever a supported option.

sergey-tihon commented 2 years ago

BTW - I am curious where you came up with this approach for writing the strong name key as a BLOB.

I was looking a way to sign unsigned assemblies restored from NuGet, because we had to deploy in GAC. Back in a day I found that it is possible to do using Mono.Cecil by Jb Evans using StrongNaming capabilities

In 2019 during adoption to .NET Core 2.0 they changed API https://github.com/jbevain/cecil/pull/548 and introduced option to load key from BLOB https://github.com/jbevain/cecil/blob/79b43e8e72866f450dee8b59510cb5767b1491ec/Mono.Security.Cryptography/CryptoService.cs#L177-L178

sergey-tihon commented 2 years ago

but we could use some help with testing to get it ready, so it would help if this is released as a preview on NuGet when you get a chance.

Done, package available on NuGet - https://www.nuget.org/packages/OpenNLP.NET/1.9.1.2-preview0001

I am having trouble getting Paket to download anything on macOS in GitHub Actions. I attempted to upgrade Paket, but that didn't change the results.

This is looks resolved for me

Do note that there is no official ikvmc version to build from macOS at this point

This is a little bit unfortunate for me personally, because I work primarily on macOS these days and have hard access to Windows.

package .targets files with https://github.com/ikvm-revived/ikvm/issues/50 in the tools directory of a .nupkg, then have the build tools compile a private build for each consumer.

I am also skeptical, it sounds promising but I am not sure that it will work technically. How would you determine dependencies between jar files and build order?

In this repo, I've tried to use jdeps to infer dependencies between jar files, export them as a graph in *.dot format and then use that graph to recompile them with ikvmc.

But this approach failed on Stanford.NLP.NET) and I still manually maintain dependencies, versions and build order. IIRC, jdeps just failed on folder with Stanford jars

image
NightOwl888 commented 2 years ago

but we could use some help with testing to get it ready, so it would help if this is released as a preview on NuGet when you get a chance.

Done, package available on NuGet - https://www.nuget.org/packages/OpenNLP.NET/1.9.1.2-preview0001

Thanks. It looks like we ended up pushing an update to IKVM to fix some bugs that were reported with resolving class paths.

https://www.nuget.org/packages/IKVM/8.2.0-prerelease0809

I don't have any direct evidence that this patch directly affects OpenNLP APIs, but we have a very small sampling of the tests for OpenNLP, so there could be some parts of it that are not functional without the IKVM patch (and even with it).

I am having trouble getting Paket to download anything on macOS in GitHub Actions. I attempted to upgrade Paket, but that didn't change the results.

This is looks resolved for me

Yes, sorry that comment was already obsolete before I posted the PR.

Do note that there is no official ikvmc version to build from macOS at this point

This is a little bit unfortunate for me personally, because I work primarily on macOS these days and have hard access to Windows.

Actually, it will probably function with the Windows or Linux builds if you run dotnet ikvmc.dll from one of those distributions. The only thing that probably won't work is if the Java app depends on the native bits.

package .targets files with ikvm-revived/ikvm#50 in the tools directory of a .nupkg, then have the build tools compile a private build for each consumer.

I am also skeptical, it sounds promising but I am not sure that it will work technically. How would you determine dependencies between jar files and build order?

In this repo, I've tried to use jdeps to infer dependencies between jar files, export them as a graph in *.dot format and then use that graph to recompile them with ikvmc.

But this approach failed on Stanford.NLP.NET) and I still manually maintain dependencies, versions and build order. IIRC, jdeps just failed on folder with Stanford jars

image

Good info to have. We will definitely need to find an approach that works across Java build tools. Also dependency resolution should work if there are mixed <JavaReference> and <MavenReference> elements in a single project.

<MavenReference> is high-level functionality and there are several steps we need to complete before getting to that design, though.

I am curious what the limitation of jdeps was that prevented it from working on Standford.NLP.NET? Perhaps this is something that would be fixed in a newer version of Java?

Looks like an alternative is JarAnalyzer. Although, the original source URL is offline, it is still available on the Internet Archive.

sergey-tihon commented 2 years ago

I am curious what the limitation of jdeps was that prevented it from working on Standford.NLP.NET? Perhaps this is something that would be fixed in a newer version of Java?

I believe this one - https://issues.apache.org/jira/browse/MJDEPS-15

When I run it on all assemblies I get an error that there are multi-release packages

jaxb-api-2.4.0-b180830.0359.jar is a multi-release jar file but --multi-release option is not set

and with --multi-release it fails on non-multirelease packages

So caller should identify Multi-Release JAR Files manually (at least now) and process them separately. it is still an issue for me on

openjdk 11.0.12 2021-07-20
OpenJDK Runtime Environment Microsoft-25199 (build 11.0.12+7)
OpenJDK 64-Bit Server VM Microsoft-25199 (build 11.0.12+7, mixed mode)
NightOwl888 commented 2 years ago

You might want to check out the latest work on <MavenRefernece> support. We got it to use Maven to download all 37 .jar files for this project. And it is set up to allow a project-level pom.xml to configure private Maven repositories and credentials. Actually, there is just one .targets file that does all the work, which could be imported into a project manually. Although, this approach probably won't work if you want to stick with Paket and Fake.

sergey-tihon commented 2 years ago

@NightOwl888 Am I right that in order to build netcore assemblies from *.jar I need to tun ikvmc from KVM-8.2.0-prerelease.392-tools-netcoreapp3.1-win7-x64.zip?

I found strange code in this PR

With such build process it is not clear what the point to build for 2 targets net461 and netcoreapp3.1 ...

NightOwl888 commented 2 years ago

@NightOwl888 Am I right that in order to build netcore assemblies from *.jar I need to tun ikvmc from KVM-8.2.0-prerelease.392-tools-netcoreapp3.1-win7-x64.zip?

I found strange code in this PR

With such build process it is not clear what the point to build for 2 targets net461 and netcoreapp3.1 ...

You are right. These were not meant to be swapped around. Unfortunately, not having a combined .NET Core executable is a bit confusing.

sergey-tihon commented 2 years ago

I've tried to update to the next release of IKVM and run netcoreapp3.1 build using netcoreapp3.1 version of IKVMC.exe

let ikvmVersion = @"8.2.0-prerelease.809"
let ikvmRootFolder = root </> "paket-files" </> "github.com"

let ikvmcFolder_NetFramework = ikvmRootFolder </> "ikvm-" + ikvmVersion + "-tools-net461-win7-x64"
let ikvmcFolder_NetCore_Windows = ikvmRootFolder </> "ikvm-" + ikvmVersion + "-tools-netcoreapp3.1-win7-x64"

and it fails with

image

currently I am not sure what is the right way to use ikvmc and build assemblies for netcore

NightOwl888 commented 2 years ago

I've tried to update to the next release of IKVM and run netcoreapp3.1 build using netcoreapp3.1 version of IKVMC.exe

let ikvmVersion = @"8.2.0-prerelease.809"
let ikvmRootFolder = root </> "paket-files" </> "github.com"

let ikvmcFolder_NetFramework = ikvmRootFolder </> "ikvm-" + ikvmVersion + "-tools-net461-win7-x64"
let ikvmcFolder_NetCore_Windows = ikvmRootFolder </> "ikvm-" + ikvmVersion + "-tools-netcoreapp3.1-win7-x64"

and it fails with

image

currently I am not sure what is the right way to use ikvmc and build assemblies for netcore

You probably need to use -nostdlib -r:<ikvmtools>/refs/*.dll as mentioned here. Unfortunately, .NET Core is not finding its reference assemblies automatically.

sergey-tihon commented 2 years ago

Close this PR because https://github.com/sergey-tihon/OpenNLP.NET/pull/15 is merged.