kiwix / kiwix-js

Fully portable & lightweight ZIM reader in Javascript
https://www.kiwix.org/
GNU General Public License v3.0

On serializing the archive object #275

Open · sharun-s opened this issue 7 years ago

sharun-s commented 7 years ago

I have added some code, written while thinking about #240, that loads the archive from a URL. It is on the perftest branch.

For example, the URL file:///c:/kiwix/www/index.html?archive=wikidump.zim&title=Paris.html&mode=xhr loads an article directly, without going through the config page, search, button presses, etc.
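A minimal sketch of reading those launch parameters, assuming only the parameter names visible in the example URL (archive, title, mode). The function name and the choice to reject unknown modes are illustrative, not the perftest branch code.

```javascript
// Hypothetical helper: parse ?archive=...&title=...&mode=file|xhr
function parseLaunchParams(search) {
  const params = new URLSearchParams(search); // a leading "?" is ignored
  const archive = params.get("archive");
  if (!archive) {
    return null; // no archive given: fall back to the normal config page flow
  }
  const mode = params.get("mode");
  if (mode !== "file" && mode !== "xhr") {
    throw new Error(`unsupported mode: ${mode}`); // assumption: no default mode
  }
  return { archive, title: params.get("title"), mode };
}

// In a browser this would be called as parseLaunchParams(window.location.search)
const launch = parseLaunchParams("?archive=wikidump.zim&title=Paris.html&mode=xhr");
console.log(launch);
```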

This removes the need to use the file selector to create the archive object every time. setRemoteArchive would be a good option, but it downloads the whole ZIM as a blob, which is an issue when testing large ZIMs.

All that is needed to create the archive object are the pointer values in the ZIM header. With those numbers, the object can easily be "stringified" by replacing the File object with {name: name of the ZIM, size: size of the ZIM}. I have added a stringifyArchive() function to app.js, plus some constructor code in zimarchive, zimfile and zimarchiveloader that recreates the object from that string.
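A sketch of the serialize/restore round trip, assuming a hypothetical archive shape of { file, header }. This is not the real kiwix-js zimArchive layout; it only shows the idea of swapping the File for a plain {name, size} descriptor while keeping the header pointers.

```javascript
// Hypothetical archive shape: { file: File, header: { urlPtrPos, titlePtrPos, ... } }

function stringifyArchive(archive) {
  const { file, header } = archive;
  // A File object cannot be serialized, so keep only what an XHR reader needs
  return JSON.stringify({ file: { name: file.name, size: file.size }, header });
}

function restoreArchive(json) {
  const { file, header } = JSON.parse(json);
  if (typeof file.name !== "string" || !Number.isFinite(file.size)) {
    throw new Error("archive string is missing the file name or size");
  }
  // `file` is now a plain {name, size} descriptor, so reads must go through XHR
  return { file, header };
}

// Stand-in for a File picked in the file selector (illustrative values only)
const picked = {
  file: { name: "wiki.zim", size: 1000000, slice() {} },
  header: { articleCount: 10, mimeListPos: 80, urlPtrPos: 120, titlePtrPos: 200, clusterPtrPos: 240, mainPage: 0 },
};
const restored = restoreArchive(stringifyArchive(picked));
```

Keeping the descriptor as plain JSON means the same string could be stored, pasted into app.js, or put in a URL without any custom encoding.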

Since there is no File object, reads use XHR Range requests instead, and the name and size are all the XHR-based read needs. require.config provides a way to load the util module exposing only the file-based or the XHR-based readSlice, depending on the URL parameter mode=file|xhr.
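A minimal sketch of the two interchangeable read paths, assuming a readSlice(file, offset, size) signature that resolves to a Uint8Array. The names are hypothetical, and the selection is done with a plain function here rather than through require.config.

```javascript
// HTTP Range header for `size` bytes starting at `offset` (the end index is inclusive)
function rangeHeader(offset, size) {
  return `bytes=${offset}-${offset + size - 1}`;
}

// mode=xhr: only file.name is used; it is resolved relative to the page URL
function readSliceXhr(file, offset, size) {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open("GET", file.name);
    xhr.responseType = "arraybuffer";
    xhr.setRequestHeader("Range", rangeHeader(offset, size));
    xhr.onload = () => {
      let bytes = new Uint8Array(xhr.response);
      // If the Range header was ignored, the whole file came back: cut the slice out
      if (bytes.length > size) bytes = bytes.subarray(offset, offset + size);
      resolve(bytes);
    };
    xhr.onerror = () => reject(new Error(`XHR failed for ${file.name}`));
    xhr.send();
  });
}

// mode=file: read from a real File/Blob chosen in the file selector
function readSliceFile(file, offset, size) {
  return file.slice(offset, offset + size).arrayBuffer().then((buf) => new Uint8Array(buf));
}

// Pick the reader once, based on the mode URL parameter
function selectReadSlice(mode) {
  if (mode === "file") return readSliceFile;
  if (mode === "xhr") return readSliceXhr;
  throw new Error(`unknown mode: ${mode}`);
}
```

The fallback in readSliceXhr also illustrates the cost discussed later in the thread: if the browser ignores the Range header, the full file is transferred before the slice is taken.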

The limitation is that the ZIM file must be placed in the www directory, and it only works on Chrome with the local file access flag set (--allow-file-access-from-files).

mossroy commented 7 years ago

Interesting.

Did you try running that inside a web extension (for Firefox or Chrome)? If it works, it might be useful for deploying some "custom apps" like those on Android (for example https://play.google.com/store/apps/details?id=org.kiwix.kiwixcustomwikimed). It's basically "Kiwix + a ZIM file" packaged together and ready to use. These custom apps have been very popular on Google Play, but this can only be done with relatively small ZIM files.

sharun-s commented 7 years ago

I haven't tried extensions. If this worked on Firefox it would be perfect, since Firefox allows XHR access to file:// URLs without requiring a flag, unlike Chrome. The problem is the size of the ZIM: with a large ZIM, Firefox tries to read the entire file into memory, even for an XHR Range request of just N bytes. I filed a bug; if it ever gets fixed, it would remove the dependency on the file selector and File object creation. For now I mostly use this for quickly testing single-page loads on Chrome.

mossroy commented 7 years ago

Inside a Firefox extension, the URI starts with moz-extension://; for Chrome, it's chrome-extension://. In both cases, the security rules are different from those that apply to file://.
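Since the applicable security rules depend on the scheme, an app could branch on it at startup. A small sketch, assuming only the scheme strings mentioned above; the returned labels and the function name are hypothetical.

```javascript
// Hypothetical helper: classify the runtime context from the page URL's scheme
function detectContext(href) {
  const protocol = new URL(href).protocol; // e.g. "moz-extension:"
  switch (protocol) {
    case "moz-extension:":
      return "firefox-extension";
    case "chrome-extension:":
      return "chrome-extension";
    case "file:":
      return "local-file";
    default:
      return "web";
  }
}

// In a browser: detectContext(window.location.href)
```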

From what I tested today:

Since you filed the issue on Bugzilla, I suppose you managed to make it work on Firefox at some point?

sharun-s commented 7 years ago

Interesting, and good to know what's happening with extensions.

@mossroy, try it with a different ZIM file. I have seen that data.subarray error on Chrome when the ZIM file (wikipedia_en_all_2016-12.zim) has no default main page set, i.e. the main-page dirent returned is erroneous.

I usually run Chrome with the debugger option "Pause on uncaught exceptions" checked, using wikipedia_en_all_2016-12.zim, and it always stops at that line.
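If the missing main page is indeed the cause, a guard before the dirent lookup would avoid reading garbage. A sketch, assuming a parsed header object with mainPage and articleCount fields; per the ZIM format, mainPage is 0xFFFFFFFF when no main page is set. The function name is hypothetical.

```javascript
// Value the ZIM header uses in mainPage when no main page is defined
const NO_MAIN_PAGE = 0xffffffff;

// Hypothetical guard: return a usable main-page index, or null if there is none
function mainPageIndex(header) {
  if (header.mainPage === NO_MAIN_PAGE || header.mainPage >= header.articleCount) {
    return null; // caller should show the search page instead of reading a dirent
  }
  return header.mainPage;
}
```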

It's interesting that the Firefox extension throws an error while Firefox itself ignores it.

There is a subtle bug. For small ZIMs, Firefox returns quickly and you won't notice anything. But for large ZIMs, Firefox takes forever to return. If you use Process Explorer or top, you can see that the Firefox process stays busy until the request returns, probably because it is seeking to the end of the file. A month or so ago it was reading the whole file into memory, and my whole machine would freeze.

mossroy commented 7 years ago

The wikipedia_en_all_2016-12.zim is much too big for this test. I had used wikem_en_all_2017-06.zim for the tests above, which does have a working main page.

sharun-s commented 7 years ago

Mozilla provided a workaround, and this now works quite nicely on Firefox. Tested with the workaround here

The following kinds of standalone URLs work without going through the file selector:

file:///c:/kiwix/www/index.html?archive=wiki.zim&title=Paris.html&mode=xhr
file:///c:/kiwix/www/index.html?archive=wiki.zim&titleSearch=Paris&mode=xhr
file:///c:/kiwix/www/index.html?archive=wiki.zim&imageSearch=Paris&mode=xhr
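One way to dispatch these three URL kinds to a single action, assuming only the parameter names shown in the URLs (title, titleSearch, imageSearch). The action object and the precedence order are illustrative choices.

```javascript
// Hypothetical dispatcher: figure out what a standalone URL asks for
function resolveAction(search) {
  const params = new URLSearchParams(search);
  // Checked in this order; the first parameter present wins (an assumption)
  for (const kind of ["title", "titleSearch", "imageSearch"]) {
    const value = params.get(kind);
    if (value !== null) {
      return { kind, value, archive: params.get("archive") };
    }
  }
  return null; // nothing to do: show the normal UI
}

// In a browser: resolveAction(window.location.search)
```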

In the linked commit, the stringified archive object in app.js is hardcoded to wikipedia_en_all_2016-12.zim. To load any other ZIM, load it once via the file selector, print the output of stringifyArchive, and replace that string.

This opens up several possibilities: saving and reloading the last-opened ZIM from a cookie, session storage or elsewhere; bookmarking direct links to articles, searches, etc.; and supporting "open link in new tab".
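A sketch of the first two ideas, assuming the stringified archive from stringifyArchive is a plain string. `storage` is anything with getItem/setItem (such as window.sessionStorage or window.localStorage); the key name and helper names are hypothetical.

```javascript
// Hypothetical storage key for the last-opened archive string
const LAST_ARCHIVE_KEY = "kiwixjs-lastArchive";

function saveLastArchive(storage, archiveString) {
  storage.setItem(LAST_ARCHIVE_KEY, archiveString);
}

function loadLastArchive(storage) {
  return storage.getItem(LAST_ARCHIVE_KEY); // null if nothing was saved yet
}

// Build a bookmarkable direct link using the URL parameters from this thread
function articleLink(base, archive, title, mode = "xhr") {
  const url = new URL(base);
  url.search = new URLSearchParams({ archive, title, mode }).toString();
  return url.toString();
}
```

Passing the storage object in, rather than referencing sessionStorage directly, keeps the choice between cookie, session storage or local storage open.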

Another interesting door it opens is letting the app auto-detect "known" ZIMs present in the local directory. The library.xml from the Kiwix desktop app has a list that provides the names of all possible ZIMs, and this list can be presented to the user. When one is selected, the app would make a quick XHR request for the header of that file. If the request succeeds, the file is present and can be loaded. Switching ZIMs would also be simple and straightforward. If the file is not present, the app could open the browser on the download link on the Kiwix site, OR start a BitTorrent client with the magnet link. I think these links are available in the library.xml.
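A sketch of the "quick XHR request for the header" step. The header layout (80 bytes, little-endian, magic number 72173914) follows the openZIM format; probeZim and parseZimHeader are hypothetical names, and the assumption is that a missing or blocked file makes the XHR fail or return something without the ZIM magic.

```javascript
const ZIM_MAGIC = 72173914; // 0x044D495A, i.e. the bytes "ZIM\x04" read little-endian
const HEADER_SIZE = 80;

// Parse the fixed-size ZIM header; returns null if the bytes are not a ZIM header
function parseZimHeader(buffer) {
  const view = new DataView(buffer);
  if (view.byteLength < HEADER_SIZE || view.getUint32(0, true) !== ZIM_MAGIC) {
    return null;
  }
  // 64-bit offsets; Number() is exact for files below 2^53 bytes
  const u64 = (off) => Number(view.getBigUint64(off, true));
  return {
    majorVersion: view.getUint16(4, true),
    minorVersion: view.getUint16(6, true),
    articleCount: view.getUint32(24, true),
    clusterCount: view.getUint32(28, true),
    urlPtrPos: u64(32),
    titlePtrPos: u64(40),
    clusterPtrPos: u64(48),
    mimeListPos: u64(56),
    mainPage: view.getUint32(64, true),
    layoutPage: view.getUint32(68, true),
    checksumPos: u64(72),
  };
}

// Resolves to the parsed header if `fileName` is reachable and is a ZIM, else null
function probeZim(fileName) {
  return new Promise((resolve) => {
    const xhr = new XMLHttpRequest();
    xhr.open("GET", fileName);
    xhr.responseType = "arraybuffer";
    xhr.setRequestHeader("Range", `bytes=0-${HEADER_SIZE - 1}`);
    xhr.onload = () =>
      resolve(xhr.response ? parseZimHeader(xhr.response.slice(0, HEADER_SIZE)) : null);
    xhr.onerror = () => resolve(null); // file missing or access blocked
    xhr.send();
  });
}
```

Checking the magic number, rather than only the HTTP status, also rejects error pages served in place of a missing file. The parsed pointers are exactly the values the stringified archive needs.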

The local directory can be thought of as the default ZIM storage directory. Currently it would be wherever app.js is, but this could be changed by moving some of the JS code into the index file.

To load anything outside the local directory, the app would have to fall back to the file-selector approach. If anyone has the free time to implement this, go for it.

mossroy commented 7 years ago

I opened #292 for the idea of "custom apps", that your work makes possible. I've set it to a later milestone.