update4j / update4j

Create your own auto-update framework
Apache License 2.0
791 stars 73 forks source link

Crash in javafx.web native code when loaded dynamically #83

Closed nemphys closed 3 years ago

nemphys commented 4 years ago

Maintainer's Note

If you bump into this issue, this will happen when javafx.web is dynamically loaded in the business application after config.launch(). This is a bug in the JavaFX web engine; you can track progress here.

Possible workarounds are:


Original Post

As the title says, whenever my business application tries to use a WebView from javafx.web, the application crashes producing a hs_err_pid_xxx.log thread dump file. The javafx-web module is loaded dynamically in the module path by update4j (along with the rest of the javafx modules, which have no issues whatsoever). I know update4j is to blame, since everything works fine if I launch the exact same business application directly. I am running OpenJDK 13.0.1+9 with JavaFX 15-ea+1 on a Mac. Any ideas?

mordechaim commented 4 years ago

I experienced similar issues as I briefly mentioned here and looks like an issue with the ModuleLayer code (part of java.base, outside my control). Does it also happen when everything is loaded on the classpath instead?

nemphys commented 4 years ago

I have no idea about the classpath. Please note that I changed my setup (compared to what it was in #81) and now the bootstrap app is not a javafx application, therefore javafx is only initialized by update4j when it launches the business application. Therefore, I don't think it has anything to do with your suggestion in #81 (empty modules, etc).

mordechaim commented 4 years ago

Simply flip the modulepath=true with classpath=true in the config and try again. Not that I suggest you not to use modules, just trying to find the culprit.

nemphys commented 4 years ago

If I do that, the application will not start (it is a fairly complex application with 5 of its own modules and ~50 external dependencies). I am relatively new in Java and started coding in the post 9 era, so I only know how to work with the module path. Have you seen a working example with javafx-web anywhere or am I the first to try it?

nemphys commented 4 years ago

The error is "No main class property found at key 'default.launcher.main.class'".

I think it will be truly difficult to test with the classpath, since my build configuration is very complicated (on top of everything, I am using the JLink Badass plugin in order to produce images). The slightest change means building the whole thing from scratch, since it is not possible to test update4j in the IDE (I am using gradle and the jar structure is very different from what is in the configuration during development).

As for the module path thing, since this is going to be a very big project (already working on it for a year and still a long way to go) and will be relevant for many years, I decided I should go with the latest and greatest.

The only thing I can think of is to create a tiny (= more manageable) new javafx program with a webview (something like the sample you provide) and see how it goes from there. EDIT: Is the source code of the demo app available anywhere?

mordechaim commented 4 years ago

The error is "No main class property found at key 'default.launcher.main.class'".

The reason for that is since we find the launcher by looking at the provides directive on modular jars, but META-INF/services for non-modular jars (check the wiki for more information). By doing this simple change you should be able to get it to work (again, just for testing purposes).

Is the source code of the demo app available anywhere?

Yes, demo-bootstrap and demo-business. Please note, currently the config generation is manual instead of automatically on each build. I'm still working to automate it and restructure them to a Maven multi-module project instead of 2 separate projects.

nemphys commented 4 years ago

OK, I just tried setting default.launcher.main.class in the configuration but then I start having crashes due to other issues (apparently some resources cannot be located due to the classpath change).

I still think this is going to be too hard to debug like this, and I prefer to rule out the possibility that it is a common javafx-web issue (and not somehow specifically related to my case) first.

mordechaim commented 4 years ago

Btw if you want to stay away of update4j's dynamic loading but still benefit of the update functionality, you might let the Launcher class call a (OS dependent) script that starts a new JVM.

See how RuneChanger did it.

nemphys commented 4 years ago

OK, just created 2 small apps, one bootstrap similar to the one I am using in my application and one dead-simple javafx application with a webview. Same crash, regardless of modulepath/classpath setting.

nemphys commented 4 years ago

As for the alternative you proposed, I could launch the app using a script in the Delegate (I wonder why one would chose to do that in the Launcher).

Since I wanted to be able to update javafx itself, I opted for the following configuration:

The Delegate is a very simple class with just the main method, which executes Update.finalizeUpdate and then reads the configuration from disk and launches the business application.

The business application (javafx), which is the Launcher, then fetches any new configuration during the init() phase (ui is just a javafx preloader at the time), saves it to disk and downloads any update it finds using updateTemp().

If an update is downloaded, the application restarts before it shows the main UI and the Delegate finalizes the update.

This way, I can only have the update4j and delegate modules in the bootstrap directory, and all other modules in the business directory (including javafx).

It works like a charm (except for the javafx.web part) and the UX is quite good, plus it has the added benefit of being able to check for updates in intervals while the application is running and perform them silently (with a subsequent restart request).

mordechaim commented 4 years ago

javafxports/openjdk-jfx#316 looks like the exact issue you're having, even people complaining that it works from command line but not when starting with custom classloader.

mordechaim commented 4 years ago

One drawback of using a script instead of dynamic launching is losing the dependency injection communication between bootstrap and business apps.

One more possible solution to that JavaFX web bug (which as I commented earlier, is a known bug) is using the DynamicClassLoader (in the not yet published version of update4j) as the custom system class loader. Refer to #75 for more info.

Your architecture sounds impressive, and goes in-line with my thesis of being totally un-opinionated and let the developer decide for themselves.

I wonder why one would chose to do that in the Launcher

Since they use the DefaultBootstrap and can't change anything, they opted to create a stub launcher that just starts the script and exits.

and downloads any update it finds using updateTemp().

Just to give you the heads up, I plan to deprecate config::updateTemp and the finalize methods in favor of an archive, as discussed in #76. I'm still working on it so will take a while before releasing.

nemphys commented 4 years ago

Just tested with the DynamicClassLoader and it seems to fix the problem in the test apps.

My business application still crashes, though, must investigate further.

I suppose we are in the right track, will keep you posted!

nemphys commented 4 years ago

As for the updateTemp(), I don't mind the archive part, but I just hope the logic does not change after the deprecation, otherwise I will have to refactor again.

nemphys commented 4 years ago

After a little further testing, the problem is fixed only when using the classpath (my main business app uses just the modulepath, as stated above).

When I switch the test apps to use the modulepath, it crashes with the same exception.

After reading through javafxports/openjdk-jfx#316, it seems that it is indeed the exact problem I am facing, but the information there does not provide any clear solution (--add-modules is supposed to be handled correctly by update4j).

I think https://github.com/javafxports/openjdk-jfx/issues/316#issuecomment-457912843 is the real culprit and I am wondering if there is any way to mitigate this in update4j.

EDIT: Please note that it does not seem to be caused by "a mismatch between the Java modules and the native libraries, which are getting pulled from the wrong location", as stated in the final comment of the above issue. I tried using the JavaFX JDK instead of the maven builds, which has the native libraries as separate files and they are clearly (using -Djavafx.verbose=true) loaded correctly.

mordechaim commented 4 years ago

DynamicClassLoader is rather a workaround than a solution.

It seems obvious to me that web view for some reason must be loaded on the system classloader (which might very likely be a bug). When loading on the JVM from command line they all get loaded on the system class path or modulepath correctly, but when we need dynamic loading it gets tricky.

The default configuration of update4j creates a new classloader for each call to launch. While DynamicClassLoader when used as the system loader augments to the system classloader instead (that class has other uses too, which will be documented in the wiki once released).

Since ModuleLayer internally creates its own loader, and we cannot extend the system modulepath, we get this result: default loading configuration will fail on both paths, while using dynamic loader as system loader will only solve classpath but not modulepath.

nemphys commented 4 years ago

Hmmm, this doesn't sound very promising.

So, does this mean we are out of solutions?

mordechaim commented 4 years ago

Perhaps this means we have to open a bug report with the openjfx team. That issue previously linked is from javafxports the which is not the same.

Of course, starting a new JVM will work for you, as you said earlier.

mordechaim commented 4 years ago

One more solution, load JavaFX in the bootstrap app and start it in the class path, from there your business app can be totally on the modulepath and you don't have any --add-exports issues and your real app is still fully modular.

The only drawback, the module calling the JavaFX code must be an automatic module, since it cannot express an explicit dependency on the class path.

nemphys commented 4 years ago

Perhaps this means we have to open a bug report with the openjfx team. That issue previously linked is from javafxports the which is not the same.

Indeed, but https://github.com/javafxports/openjdk-jfx/issues/316#issuecomment-457912843 concludes that the bug (which, after more research is the exact bug I am facing), is not a JavaFX bug (?)

Of course, starting a new JVM will work for you, as you said earlier.

I am currently not considering this approach, since I am trying to achieve the fastest startup time I can get and it is now already being slowed down by the delegate (which has no UI).

One more solution, load JavaFX in the bootstrap app and start it in the class path, from there your business app can be totally on the modulepath and you don't have any --add-exports issues and your real app is still fully modular.

...but I suppose this way lose the ability to update JavaFX

nemphys commented 4 years ago

Could this https://github.com/javafxports/openjdk-jfx/issues/220 be related? If so, there is already an open bug: https://bugs.openjdk.java.net/browse/JDK-8231869

Since I have already spent almost a week on integrating update4j in my project, it would be a great shame to ditch the whole thing because of this. If we find the root cause, I am even considering using a local build of javafx.web until this is (ever) fixed upstream.

mordechaim commented 4 years ago

but I suppose this way lose the ability to update JavaFX

So to solve this, start update4j itself in a script (perhaps invoked by the user) which is itself managed by update4j, so when you want to update JavaFX download the new versions next to the old version and update the script too to only load the new files. This approach also means you can drop the delegate approach and use your own main, and to update the bootstrap you just update the script to point to your new file.

This approach is mentioned in the wiki (look how we handled updating update4j itslef). It gives you the ultimate control over the files the JVM sees, and you no longer need the delegate indirection.

nemphys commented 4 years ago

This is getting much more complicated than I was hoping. Maybe for a small project it would be fine, but a big project with all this back-and-forth just to launch would become a nightmare to maintain (imagine having to debug class loading issues later on).

Can you please elaborate a little more on the following, so that maybe I can get a better grasp on the underlying cause?

DynamicClassLoader is rather a workaround than a solution.

It seems obvious to me that web view for some reason must be loaded on the system classloader (which might very likely be a bug). When loading on the JVM from command line they all get loaded on the system class path or modulepath correctly, but when we need dynamic loading it gets tricky.

The default configuration of update4j creates a new classloader for each call to launch. While DynamicClassLoader when used as the system loader augments to the system classloader instead (that class has other uses too, which will be documented in the wiki once released).

Since ModuleLayer internally creates its own loader, and we cannot extend the system modulepath, we get this result: default loading configuration will fail on both paths, while using dynamic loader as system loader will only solve classpath but not modulepath.

nemphys commented 4 years ago

I think I am getting it, and it seems to be indeed a JavaFX bug, see https://stackoverflow.com/questions/44953914/which-class-loader-is-used-by-jni-calls-in-native-code/45228100#45228100

mordechaim commented 4 years ago

Look at the e8defdd that added DynamicClassLoader.

So to add the jars dynamically, we first collect a list of all jars marked with classpath=true and create a new URLClassLoader and set the Thread.currentThread().getContextClassLoader() (which defaults to ClassLoader.getSystemClassLoader() if not manually set by Thread.currentThread().setContextClassLoader()) as its parent, this enables you to access bootstrap classes in the business app.

We then collect all jar marked with modulepath=true and create a new layer (which creates a new classloader internally) and pass the newly created url classloader as the parent to the layer. This enables us to access the boot and dynamic unnamed module in automatic modules in the layer.

The problem with this approach was that the developer could not control the classloader, as it is created inside the launch method and released before leaving that method. Additionally, reflection frameworks would not always know how to find the dynamic classes, and required workarounds which I won't tire you to explain.

The solution to this was to allow extend the system classloader instead of creating a child loader and attach it to the system. This way, new classes will automatically be visible to ClassLoader.getSystemClassLoader() and reflection frameworks. If it is set as the thread context loader instead of the system loader it will still augment to that loader instead of creating a new loader in launch. This gives you a reference to the loader outside launch().

While this works beautifully for the classpath, the modulepath has a different fate, since I can only add layers, which in turn means, the boot layer will still not see the children.

mordechaim commented 4 years ago

https://github.com/eclipse/rt.equinox.framework/blob/master/bundles/org.eclipse.osgi/container/src/org/eclipse/osgi/internal/hookregistry/ClassLoaderHook.java

Would try to look into this.

nemphys commented 4 years ago

I think we are out of luck. After a lot of reading, I think I finally got the full picture.

The problem is that, as stated here: https://bugs.openjdk.java.net/browse/JDK-8061972 (Class Loaders section), in the old days

The application class loader, an instance of java.net.URLClassLoader, loads classes from the class path and is installed as the system class loader unless an alternate system loader is specified via the system property java.system.class.loader.

whereas in the module system days

The application class loader is no longer an instance of URLClassLoader but, rather, of an internal class. It is the default loader for named modules that are neither Java SE nor JDK modules.

Combined with the fact that

FindClass (JNI) locates the class loader associated with the current native method; that is, the class loader of the class that declared the native method <<< This is where the crash occurs

it means that when the javafx.web native code tries to communicate with Java classes, it defaults to the (new, internal) system class loader, which has no clue of our module layer modules and therefore returns nothing.

This can seemingly only be fixed in the JavaFX native code, and apparently not very easily (found this http://stackoverflow.com/a/16302771/1046167)

As for posting a bug in the javafx bug tracker, good luck with that. The only thing that is not requested is a urine sample by mail :-)

mordechaim commented 4 years ago

Lol!!

I really think you should consider the business app should be launched by a script called in the bootstrap or launcher. It's very straightforward and easy to update (just update the script itself via update4j). The only drawbacks I could think of are losing access to the bootstrap classes in the business app, and dependency injection from bootstrap into business.

Really, it has been done already by other users of this library.

mordechaim commented 4 years ago

Just wondering, since update4j loads javafx.web shouldn't it mean the module layer is the class loader associated with the native method?

nemphys commented 4 years ago

Right, I just read this https://bugs.openjdk.java.net/browse/JDK-8188052. In the first comment it is stated that

  1. A JNI call from a declared native method - which uses the loader of the class that defines the method
  2. A JNI call "through the Invocation Interface" - In which case the system loader is used.

Since the crashing call is made using the Invocation Interface... you know the rest.

So, the only solution would be to change the javafx webkit code to somehow use 1) from above; don't think that will ever happen.

What I am considering is ditching the delegate altogether on normal application executions (so that the startup time is unaffected and even better), do the updates using updateTemp() in the main app as I am already doing and then performing a "special" restart of the application, which will be running some kind of script with what the delegate is now doing. Launching 2 jvms on every application launch would kill startup performance (for reference, I have spent quite some time examining all methods to increase normal startup performance in the past; I am a performance freak).

mordechaim commented 4 years ago

I'd suggest renaming the title to "when loaded dynamically", since it's happens when using the class path too (unless you use the DynamicClassLoader as system loader which satisfies 2) above).

mordechaim commented 4 years ago

I don't really understand how the "special script" works, isn't that effectively launching a second JVM?

mordechaim commented 4 years ago

I am a performance freak

Ooh! Can you suggest anything on my codebase?

nemphys commented 4 years ago

I don't really understand how the "special script" works, isn't that effectively launching a second JVM?

Yes, but this won't happen on every launch, just after a successful update download.

Ooh! Can you suggest anything on my codebase?

I don't think there is much to do, since it seems to be pretty damn fast (the speed difference on startup time was almost unnoticeable compared to a normal app load).

Still, I am banging my head with this. It was too much effort and it is always frustrating when a problem cannot be solved.

mordechaim commented 4 years ago

Yes, but this won't happen on every launch, just after a successful update download.

I can look at it like this: you app consists of just the business app and that is always launched by command line etc. and always checks if there are any temp updates. If a temp update is found it finalizes it and immediately restarts. Otherwise it continues with the business application launch.

Somewhere at a delay, we check for updates on some background thread on the very same business app, and downloads them.

This way we never call config.launch and only start 2 JVMs on the restart

mordechaim commented 4 years ago

Oh, it won't work. The JVM will load everything on launch, so temp update finalization has no benefit.

nemphys commented 4 years ago

Oh, it won't work. The JVM will load the everything on launch, so temp update finalization has no benefit.

Right :-)

mordechaim commented 4 years ago

We could still do something very similar, by checking in the script if the temp directory has files and launch a totally different application that just finalizes, and then the script proceeds to launch the business app

nemphys commented 4 years ago

Yes, that would work.

The only reason I am trying to avoid the startup script solution is that in my build process, after making platform-specific jlink images, I am building a macOS app with jpackage and a Windows executable with launch4j (just a launcher, not wrapping anything). Both of these do not support wrapping a custom startup script; I can only change their jvm runtime arguments.

I can always do the packaging in a different way, but first I want to examine every other possible solution.

nemphys commented 4 years ago

Right now I am building javafx.web from source with some quick modifications, just to see if the fix is easy or not :-)

mordechaim commented 4 years ago

You could do the finalize check after shutdown, in a script, instead.

mordechaim commented 4 years ago

You can then take it to the next level. Namely, you have 2 config files, one for the business app and the other for the finalizer. When the business app does it's update things it first does a regular update to the finalizer app followed by a temp update for the business app. At shutdown, if we had a successful temp update (Update.containsUpdate(dir) == true) we start finalizer with the ProcessBuilder class, we don't even need an external file for this. We now have both apps updating each other.

If the finalizer runs in its own jlink image we can even have the apps update each other's images since they all are totally independent.

nemphys commented 4 years ago

I already spent some time implementing something like this (running the finalize after shutdown) and then I saw your message :-)

I am currently testing it, using a shutdownHook on the main application. My only concern is if the main jvm will have finished shutting down when the secondary (update) jvm will be overwriting the main app's files. I might have to use a small artificial delay there.

Your second proposal rocks, though using a separate jre image for the two components would be an overkill and bloat the application size too much (it already is ~160MB)

mordechaim commented 4 years ago

https://github.com/update4j/update4j/blob/master/src/main/java/org/update4j/util/FileUtils.java#L237

Check out how I did a delayed process.

nemphys commented 4 years ago

Thanx, the ping usage is ingenious (is there really no other way to wait in Windows?)!

mordechaim commented 4 years ago

https://stackoverflow.com/questions/1672338/how-to-sleep-for-five-seconds-in-a-batch-file-cmd/1672349#1672349

Windows does have the timeout command but only works when run in command line, and not in non-interactive environments.

mordechaim commented 4 years ago

I'm already thinking of adding a archive.installOnShutdown() for #76 that does just that with an embedded script, you can then also update the image itself from one JVM.

nemphys commented 4 years ago

This sounds great! In my case, this would render the delegate useless and would save me the hassle of maintaining 2 separate executables.

Just hoping that you plan to implement this soon, since I am currently dealing with the last parts of the whole update functionality implementation and would like to move forward with the rest of the coding.

mordechaim commented 4 years ago

It still has one issue, I can't recover properly from failed moves, since the script runs outside the JVM, thus trading in app consistency (all or nothing to avoid breaking app on failures) for convenience.

I'll do my research to mitigate this issue. Perhaps I can port all safety checks to the script (like file locking, and stop the operation if locked) and urge the user to check on startup if the archive was successfully installed on last run.

nemphys commented 4 years ago

Any updates on this (the archive.installOnShutdown(), that is)?

I am facing some issues with my current setup and I am in the verge of forking update4j to implement something similar.

If you plan to release it soon, it would save me the effort