nwjs / nw.js

Call all Node.js modules directly from DOM/WebWorker and enable a new way of writing applications with all Web technologies.
https://nwjs.io
MIT License
40.3k stars 3.88k forks source link

Critical nw2 regression: Major (10x) slowdown post-0.42.3 (where nw2 is enabled by default--disabling it restores original performance) #7979

Open pd-l2ork opened 1 year ago

pd-l2ork commented 1 year ago

Issue Type

Before opening an issue, please search and see if it has already been raised.

Current/Missing Behavior

Most versions (tried everything between 0.42.4 up to 0.70.1, sometimes skipping a few versions) are significantly (10 or more times) slower than 0.42.3 and lower versions when drawing a busy SVG-based canvas using scripted events. Evidence suggests this is due to nw2 features being enabled by default starting with 0.42.4 onward. Disabling nw2 on newer versions makes performance comparable.

Expected/Proposed Behavior

Speed of running scripted events that populate the canvas should not be significantly different.

Additional Info

You can download it from: http://l2ork.music.vt.edu/main/make-your-own-l2ork/software/ (click on the "Complete (a.k.a. ‘Burrito Supreme’) Installer" section to expand it and download a version for the appropriate platform)

Replacing nw version: On Linux nw version is located in /usr/lib/pd-l2ork/bin/nw folder. On Windows in C:\Program Files (x86)\Pd-L2Ork\bin\nw. Mac is difficult since the files are all mixed, so I discourage testing on that one, even though results are the same. Just replace it with SDK (not runtime) of any other version by renaming that version's folder to nw and replacing this one. Then restart the app.

To open something graphically intense: start Pd-L2Ork (pd-l2ork on command line or double-click on the shortcut). Help (rightmost item in the menubar)->Help Browser->scroll to the bottom and click on L2Ork Tweeter and then again on L2Ork Tweeter (right below the CHANGELOG.txt).

Observe the time it takes to open and populate the window.

As an alternative, one could probably build a simple scripted example that populates the canvas with svg shapes (not sure if the slowdown is tied to strictly SVG, however) and run the same with the latest nw.js with nw2 enabled and disabled and observe the perfromance difference.

pd-l2ork commented 1 year ago

Linked are index.html and package.json that is essentially the snapshot of how the screen looks when it loads.

https://l2ork.music.vt.edu/data/pd-l2ork/test.tar.gz

You can start the same using 0.42.3 and compare it to newer versions.

pd-l2ork commented 1 year ago

The difference is less obvious since there is no background communication (there is a lot of network code going back and forth to draw all this, but it is all identical in both cases). Still you can see the difference. I also spotted a lot of ssl errors on 0.42.3 being printed out in the console, although they do not appear to be causing any problems program-wise as far as I can tell. Could it be that the 0.42.4 and onward use ssl layer for network proxy? If so, could this be the bottleneck?

ayushmanchhabra commented 1 year ago

I'm seeing a black screen for v0.42.3 and v0.42.4 on Ubuntu 20. Not sure how to proceed

pd-l2ork commented 1 year ago

I'm seeing a black screen for v0.42.3 and v0.42.4 on Ubuntu 20. Not sure how to proceed

@ayushmxn if you are running 0.42.4 or 0.42.5 (possibly also 0.42.6) on Linux. It indeed is broken and does not show anything. 0.42.3, however, should work just fine. You can also try 0.69.1.

ayushmanchhabra commented 1 year ago

Could you send a screenshot of what a working example looks like? I'll try across different versions.

ayushmanchhabra commented 1 year ago

v0.42.3 does not work for me. Could be a missing shared library which would explain why it works for you.

[123138:123138:1015/205713.710040:ERROR:sandbox_linux.cc(371)] InitializeSandbox() called with multiple threads in process gpu-process.
[123138:123138:1015/205713.969746:ERROR:buffer_manager.cc(488)] [.DisplayCompositor]GL ERROR :GL_INVALID_OPERATION : glBufferData: <- error from previous GL command

Also wasn't able to get it working for a bunch of versions from 0.36...0.69.1.

pd-l2ork commented 1 year ago

Can you try 0.42.2 or earlier? Just about any lower works fine.

pd-l2ork commented 1 year ago

Here's how the window should look. This is on Win10 with 0.42.3. Running nw . (from the folder where the two files provided in the aforesaid tarball are located). Fonts are off, but should not affect the outcome of the test.

tweeter

ayushmanchhabra commented 1 year ago

I was able to reproduce! There were quite a few changes made in between 0.42.3 and 0.42.4. I'll try to look into it.. Based on what I've seen, I may have a temp workaround. You could have 2 windows - one which shows a temporary splash screen / circular progress on start up until the main app is loaded.

Putting down the diff for my reference: nw diff: https://github.com/nwjs/nw.js/compare/b8c5658...5c1c6cf v8 diff: https://github.com/nwjs/v8/compare/00f437a...76eb2bb

pd-l2ork commented 1 year ago

@ayushmxn great! Please bear in mind that since 0.42.4-0.42.6 are effectively broken (do not display anything on my side), the slowdown may have been introduced post-0.42.6 (since there is no way to confirm that the problem is with 0.42.4, .5 or .6). 0.43.5 was the next one I was able to test successfully (although both 0.43.5 and 0.43.6 segfault on Linux when opening dev tools). 0.44.0 is the first "stable" iteration that does not have any major issues other than the slowdown.

pd-l2ork commented 1 year ago

I was able to reproduce! There were quite a few changes made in between 0.42.3 and 0.42.4. I'll try to look into it.. Based on what I've seen, I may have a temp workaround. You could have 2 windows - one which shows a temporary splash screen / circular progress on start up until the main app is loaded.

Putting down the diff for my reference: nw diff: b8c5658...5c1c6cf v8 diff: nwjs/v8@00f437a...76eb2bb

How would a start-up screen solve the slowness? Or, would this be simply a distraction from the slowness? If so, that will not cut it, because this is a visual programming language and users can create things of varying complexity. While the aforesaid example may be seen as a more complex example, even more complex ones are certainly possible.

ayushmanchhabra commented 1 year ago

How would a start-up screen solve the slowness? Or, would this be simply a distraction from the slowness? If so, that will not cut it, because this is a visual programming language and users can create things of varying complexity. While the aforesaid example may be seen as a more complex example, even more complex ones are certainly possible.

AHh gotcha. I realise I misunderstood the problem when I replied above. Just reread the issue description:

Most versions (tried everything between 0.42.4 up to 0.69.1, sometimes skipping a few versions) are significantly (10 or more times) slower than 0.42.3 and lower versions when drawing a busy SVG-based canvas.

ayushmanchhabra commented 1 year ago

Could you send over a complex example of a *.pd file so I can observe the slowdown? I currently have a NW build running on my local machine which will probably take the whole of today to complete. I should be able to debug this during the coming weekend.

pd-l2ork commented 1 year ago

If you install pd-l2ork as per my first post (it has a deb installer) and follow instructions provided there (also copied below), it will open the very thing that I reduced to that html file above. And there the difference is dramatic:

You can download it from: http://l2ork.music.vt.edu/main/make-your-own-l2ork/software/ (click on the "Complete (a.k.a. ‘Burrito Supreme’) Installer" section to expand it and download a version for the appropriate platform).

Replacing nw version: On Linux nw version is located in /usr/lib/pd-l2ork/bin/nw folder. On Windows in C:\Program Files (x86)\Pd-L2Ork\bin\nw. Mac is difficult since the files are all mixed, so I discourage testing on that one, even though results are the same. Just replace it with SDK (not runtime) of any other version by renaming that version's folder to nw and replacing this one. Then restart the app.

To open something graphically intense: start Pd-L2Ork (pd-l2ork on command line or double-click on the shortcut). Go to Help (rightmost item in the menubar)->Help Browser->scroll to the bottom and click on L2Ork Tweeter and then again on L2Ork Tweeter (right below the CHANGELOG.txt).

pd-l2ork commented 1 year ago

@tharatau I was able to change the way my app works to be able to record the profile inside dev tools before the patch opens. Below are 2 profile files that I was able to save, one for 0.42.3 and one for 0.70.1:

https://l2ork.music.vt.edu/data/pd-l2ork/Profile-20221106T222201-0.42.3.json https://l2ork.music.vt.edu/data/pd-l2ork/Profile-20221106T221257-0.70.1.json

Both run exactly the same code on my end. The only difference is the use of a different nw.js version (SDK in both cases).

I would really appreciate your assistance at dissecting what is causing these major slowdowns as I have zero experience with the profile option and this is currently a major issue preventing me from fixing crashes on OSX on 0.42.3 and earlier by upgrading to 0.60.1 or newer that fix this issue because anything post 0.42.3 that does run without crashing is ~10 times slower than older versions. Thank you.

P.S. Forgot to add, happy to run other profiling options, if needed. Also, as you can see just from the file size, while 0.42.3 has generated 11K lines in the profile file, the 0.70.1 has over 152K (!). No wonder the second is so much slower... Now, onto the questions why and, once the source is located, is there a way to make 0.70.1. run as efficiently as 0.42.3?

pd-l2ork commented 1 year ago

@tharatau -- another update, having looked at the profile tab in the dev tools, it appears the major difference stems from the "scripting" aspect. This is the one that makes this huge difference in load times between the two. Whereas 0.42.3 takes barely a fraction of a second to process all the scripting, on 0.70.1 the same takes up 9 seconds (!). Yet, both versions use the same exact scripts and therefore the number of calls, at least from pd-l2ork's perspective. Not sure what else nw.js does on its end that has made such a profound difference...

pd-l2ork commented 1 year ago

@tharatau -- yet another update. 0.43.0-beta1 is still fast. 0.43.0-0.43.4 I cannot run because those versions have a bug with running eval functions. 0.43.5 runs (although opening dev tools crash nwjs) and is slow. So, whatever slowed things down, has happened between 0.43.0-beta1 and 0.43.x (where x is 5 or less). Unfortunately both 0.43.5 and 0.43.6 segfault when opening dev tools (at least on Linux), so the earliest profile comparison may have to be between 0.43.0-beta1 and possibly 0.44.0 which is the closest next version that can be used to compare things using dev tools and does not have the eval issue that prevents the patch from loading.

pd-l2ork commented 1 year ago

@tharatau Found it! While before doing this would generate an error, now it finally works. If I disable nw2 by passing the argument --disable-features=nw2 using 0.70.1, the slowdown is gone. So, this is clearly a part of the nw2 introduction. Now onto figuring out what is causing such colossal slowdown (and this is also where I need to step back because I have no knowledge of the nw1 vs nw2)...

ayushmanchhabra commented 1 year ago

@pd-l2ork Apologies for the late response. That's awesome. I'm not too sure how nw1 differs from nw2 also.

cc: #7230 @rogerwang

pd-l2ork commented 1 year ago

@tharatau @rogerwang May I suggest that you do not obsolete nw1 until this major slowdown is resolved because it is significant (10x slower in nw2). Thank you for your consideration.

pd-l2ork commented 1 year ago

@rogerwang to simplify troubleshooting, you can also simply use nw.js 0.67.1 (newer versions have broken window close using nw1, see #7981 ) and change the flag in the package.json that enables or disables nw2 to observe the dramatic change in performance when loading the aforesaid L2Ork Tweeter patch. Please let me know what OS you run and I will make you a build of pd-l2ork for your platform to easily troubleshoot this.

Folder locations for different OS installs: Linux: /usr/lib/pd-l2ork/bin/package.json OSX: /Applications/Pd-L2Ork.app/Contents/Resources/app.nw/package.json Windows: C:/Program Files (x86)/Pd-L2Ork/bin/package.json

pd-l2ork commented 1 year ago

@tharatau please add the nw2 label to this issue. Thanks.

rogerwang commented 1 year ago

I tried your sample under nw1 and nw2 with the latest version, but I don't notice any difference...

pd-l2ork commented 1 year ago

I tried your sample under nw1 and nw2 with the latest version, but I don't notice any difference...

@rogerwang that is because it is a scripting performance issue (not a drawing performance issue) as noted in my comment above. Inside the app all these are procedurally created using scripting whereas the min repro that was requested failed to do this because it is all predrawn (as in a static webpage). You should try the pd-l2ork app instead. See the two profile files I uploaded for comparison.

pd-l2ork commented 1 year ago

@rogerwang what OS are you using? I can prepare a version of the app that is easy to test and show the slowdown, including the ability to profile...

rogerwang commented 1 year ago

Please provide a minimal case to reproduce, not a whole application...

pd-l2ork commented 1 year ago

@rogerwang this is a complex application that does pretty much everything nwjs can, from networking, to scripting, drawing, creating windows, and interacting. It clearly exhibits a huge slowdown when everything remains the same except for toggling the nw2 feature. How would I go about narrowing down the source of the problem? Is there a way to use profiling tool to be able to do this?

rogerwang commented 1 year ago

You can try using the devtools provided by Chromium, and insert code in various steps in your application to monitor the performance e.g. printing timestamps.

pd-l2ork commented 1 year ago

@rogerwang so, I added bunch of timestamps in hope of identifying one major slowdown. Instead, I got everything inside js files executing slower in nw2 mode than nw1. When there are a few calls, this is not noticeable, but as the number of calls increases, the overall execution time deviates. As an example, below are two files executing same select functions with the only difference being one disabling nw2 in the package.json file. I have not timestamped all functions (that would take quite an effort given the code is thousands of lines long), but this gives a nice cross-section. The log files insert =======\n at any point where there is a jump larger than 10ms. Every start of the function is prepended by a ">". Note that some functions are nested in others. After executing little over 450 functions, nw2 is at 3848ms whereas nw1 is at 1484ms. All times shown are in ms. Both are using 0.67.1 due to #7981 bug in nw1 mode in newer versions which makes the comparison difficult. That said, the behavior is exactly the same on any version that has nw2 enabled by default (IIRC 0.42.4 onward).

Is there a chance that there is some debugging or logging stuff running somewhere in the background in the nw2 mode that is causing this trickling slowdown across the board?

nw2-disabled-log nw2-enabled-log

agraef commented 1 month ago

With #7230 looming large, I thought I'd chime in here with a few remarks, because the issue affects me, too. nw.js is an awesome project, but if the JS performance issue can't be fixed, and NW1 mode goes away, then some projects will likely be stuck with nw.js 0.67.1 forever, as that's the last version not suffering from #7981. That's a real bummer. :(

I'm having the same issue as @pd-l2ork reported with Jonathan Wilke's Purr Data. (Which is pd-l2ork's sister project, so this isn't surprising.) There's a lot of JavaScript execution going on in that app, you really can't demonstrate the issue with a minimal example. I have a little patch (basically a Pd object written in Lua which draws a graphic animation inside an svg element in an nw.js window) which demonstrates the issue.

Here's how the patch looks like in action (the gif looks a bit sluggish, but with NW2 disabled the animation really looks a lot smoother in real life):

osci3d-example

The Lua script does a lot of calls into JavaScript via C code in the Pd engine to modify the svg element in the patch window in real-time (basically using eval to evaluate simple Java function calls). So the JS interpreter has its work cut out. It still works fine with NW2 disabled, but with NW2 the entire application becomes sluggisch because it spends so much time in the JS interpreter.

Here's what the dev tool performance monitor gives with NW2 disabled: nw1-js-performance

And here's the graph with NW2 enabled: nw2-js-performance

As you can see, the time spent in JS execution went up by almost an order of magnitude, like @pd-l2ork observed. The graphics rendering itself is fine (I originally thought that was the issue). Maybe eval being used for dynamic JS code execution is the issue here? Whatever the reason is, NW is much slower in this use case.

Now I have to admit that this is very far from your run-of-the-mill use case for a browser engine. :) But with NW1 it certainly works extremely well, and a lot of work went into doing that GUI port (upstream Pd is still using Tcl/Tk for its GUI, which is much less advanced). So I really hope that the JS execution issues can be sorted out in NW2, and that NW1 mode is kept around until that time, so that we can continue using newer nw.js versions than what is now possible.

I'm afraid that it will be hard to replicate the issue in an example with just a basic nw.js app, but I can post download links for Purr Data and the patch example, if someone is still willing to look into this issue.

agraef commented 4 weeks ago

@rogerwang On second thought, it ought to be possible to demonstrate this kind of major slowdown in the JS performance from NW1 to NW2 with a simpler test app which dynamically modifies a document with some svg graphics, using a relatively high framerate, and does lots of dynamic JS code execution via eval to do it. That's basically the gist of what our examples here do.

AFAICT you don't do this kind of performance measurement in your existing test framework, but the effect should be readily noticable by just running the same app with NW1 and NW2, respectively.

I'll try to cobble something together as soon as I find the time. If there is such a major degradation in JS performance, you surely want to fix it before phasing out NW1, no? ;-) As you know the sources better than anyone else, maybe you can also think of a reason why the switch from NW1 to NW2 could have such an impact on the JS interpreter?

rogerwang commented 4 weeks ago

Unfortunately the NW1 mode was deprecated long time ago. It's unmaintained and could break at any time...

agraef commented 4 weeks ago

@rogerwang, No-one wants to drag along legacy code forever, I understand that. :)

But that means that we're stuck between a rock and a hard place, as long as the JS performance issue we observe remains unfixed. Are you willing to look into it, if I provide a smaller example demonstrating it?

rogerwang commented 4 weeks ago

Are you willing to look into it, if I provide a smaller example demonstrating it?

I asked for that 2 years ago ... https://github.com/nwjs/nw.js/issues/7979#issuecomment-1309142267

agraef commented 4 weeks ago

Yes, I understand, but I didn't actually notice the issue (or this bug report) myself until very recently. Ok, I'll give it a go then. Thanks for your reply.