kiwix / kiwix-js

Fully portable & lightweight ZIM reader in Javascript
https://www.kiwix.org/
GNU General Public License v3.0

Add a performance test #240

Closed: sharun-s closed this 1 year ago

sharun-s commented 7 years ago

There seem to be many different routes to optimize the current code base for increased read speed and render speed. A benchmarking test that produces some clear numbers will help to evaluate proposed changes.

This may be complicated to produce, but ideally such a test would return two values:

  1. Total time to read all assets on an article request. This includes looking up the content in the ZIM file and decompression time.
  2. Total time to render the page with all its assets, i.e. the time it takes to inject all assets into the UI.

Total load time would be the sum of these two.

I have been trying to do this manually with the Chrome debugger, by running a request on the English Wikipedia's Paris page, which produces some 200-odd asset reads and takes 30-40s to complete with the jQuery method. It seems like an ideal target to optimize. But the numbers vary a lot between runs, so some kind of automation might be required.

The basic idea being: Gain/Loss from a change = Test(old code) - Test(new code), plus some way to keep track of this number from build to build.
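As a rough illustration, the two phases could be wrapped in a tiny timing harness. This is only a sketch: `readAssets` and `renderPage` are hypothetical stand-ins for the actual kiwix-js ZIM lookup/decompression and DOM-injection code, not real functions in the project.

```javascript
// Minimal sketch of the proposed benchmark. readAssets() and renderPage()
// are hypothetical placeholders for the real ZIM read and UI-injection steps.
function timePhase(fn) {
    const start = Date.now();
    fn();
    return Date.now() - start;
}

function benchmark(readAssets, renderPage) {
    const readMs = timePhase(readAssets);   // phase 1: read all assets
    const renderMs = timePhase(renderPage); // phase 2: inject into the UI
    return { readMs, renderMs, totalMs: readMs + renderMs };
}

// Gain/Loss from a change = Test(old code) - Test(new code)
function gain(oldResult, newResult) {
    return oldResult.totalMs - newResult.totalMs;
}
```

Recording `totalMs` for each build would give the build-to-build trend described above.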

sharun-s commented 7 years ago

This could also use two other metrics as the bar to reach: the total time to load a page (e.g. Paris) from the internet, and the total time to load the same page from Kiwix (the desktop app).

I am not fluent enough in this stuff, so if there are any good JavaScript projects out there with such perf tests to refer to, please leave a link in the thread.

mossroy commented 7 years ago

Using Nightwatch is an option: it is already set up, and runs on each commit by Travis on Saucelabs to do minimal UI testing (see nightwatch.js, nightwatch_runner.js and .travis.yml). It can run a real browser (Firefox is currently used), interact with it, and wait for elements to appear. For each test, it reports how much time it took. So, if we manually record that, we can compare before/after an optimization. In this case, I think we have to run it locally (because I suppose the Saucelabs performance can vary). It would be a cheap way to track performance.

There are many technical possibilities to automate performance monitoring, and to get more detail on where the time is spent (@julianharty is our expert), but it can be quite time-consuming. I'm not sure it's worth it for now.

Regarding performance of the backend, #116 might help, but it's still a work in progress.

sharun-s commented 7 years ago

OK, I'll check out Nightwatch.

And the zimlib mentioned in #116: what's the difference between that and the current xzdec?

mossroy commented 7 years ago

xzdec is a C library that handles XZ compression/decompression of a string. libzim is a C library that handles the whole ZIM structure: https://github.com/openzim/libzim. If we manage to compile and use it with emscripten, it would replace all the low-level JavaScript "backend" code, with more features and better compatibility. It might also be faster (but that's not certain).

sharun-s commented 7 years ago

Ah, that's interesting. It would be great if ZIM used browser-supported decompression: this is, after all, web content being compressed, and I'm guessing the browser vendors have been fine-tuning this stuff for a long time. That would simplify the whole story a lot in terms of performance, with all decompression being done by the browser. Maybe one day, somewhere in the distant future, I'll try building a ZIM with gzip or the newer Brotli just to see what happens.

mossroy commented 7 years ago

Changing the compression algorithm should be proposed in https://github.com/openzim/libzim/issues. Keep in mind that, in any case, we have to keep compatibility with existing ZIM files, and that kiwix-html5 is not the only ZIM file reader: the other ones do not run decompression inside a browser, and can have other concerns.

sharun-s commented 7 years ago

Yup, good points. I was just wondering about it after playing around with XHR Range requests. Plus, I have a long way to go before I fully understand what is possible.

sharun-s commented 7 years ago

Do you guys have access to WebPageTest, or to the people who work on it?

It seems like a very nice setup for testing across different platforms/browsers/pages, etc. Plus there is saving/graphing of changes over time. It mentions that a time to first paint is logged, which probably (I haven't looked into this stuff) is related to Nightwatch's "wait for element to appear" mentioned by @mossroy above.

eustas commented 6 years ago

FYI, we are working on standardization of the new content encoding "sbr" = "shared brotli"; basically, this allows the use of language-specific dictionaries and LZ77 prefix dictionaries. Our early experiments with Wikipedia pages have shown that, with a topic-specific dictionary, the compression ratio is ~2.48x better than gzip -9.

mossroy commented 6 years ago

@eustas : thanks a lot for the info. This is about the compression algorithm used inside ZIM files, so it should be suggested for the OpenZIM format, not for kiwix-js (which is just one client able to read ZIM files). Could you create a new issue in https://github.com/openzim/libzim/issues ?

Currently, LZMA2 is the only supported compression format; see http://www.openzim.org/wiki/ZIM_file_format#Clusters and http://www.openzim.org/wiki/LZMA2_compression. But it might be interesting to compare, and to see whether SBR could be supported in the future.

kelson42 commented 1 year ago

Challenging this ticket to see if it has aged well. Nothing against measuring performance, but the key part of the performance is the ZIM reading/searching. Therefore, maybe this ticket would have a better place, or a new one should be created, in openzim/javascript-libzim.

Jaifroid commented 1 year ago

Agreed.