kiwix / kiwix-js

Fully portable & lightweight ZIM reader in Javascript
https://www.kiwix.org/
GNU General Public License v3.0

Prompt user to consider whether they trust the source of a ZIM before allowing the user to open it #974

Jaifroid closed this issue 7 months ago

Jaifroid commented 1 year ago

This is a follow-on from #753. Since we cannot fully mitigate malicious ZIM content, we could instead ask users whether they trust the source of the ZIM before loading it, and warn them that a ZIM containing malicious scripts could cause harm.

danielzgtg commented 1 year ago

As per https://github.com/kiwix/kiwix-js/issues/753#issuecomment-1455173509, we would need to either add signing, or at least make sure the warning does not scare the average user.

Jaifroid commented 1 year ago

A simple message prompt: "This is the first time you are opening this archive. Do you trust the source you obtained it from?" If the user answers "No", we could show more info; if the answer is "Yes", we allow it to open. Because a record is kept of each ZIM in the Cache API or IndexedDB database, we can allow the ZIM to be re-opened without a prompt if it has already been opened once.
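
To make the proposed flow concrete, here is a minimal sketch of the first-open check. It is not existing Kiwix JS code: `mayOpenArchive`, `knownArchives` and `askUser` are hypothetical names, and a plain `Set` stands in for the record kept in the Cache API or IndexedDB (both of which are asynchronous in practice).

```javascript
// Hypothetical helper: decide whether an archive may be opened.
// `knownArchives` is any Set-like store of archive names that were opened before
// (a stand-in for the app's Cache API / IndexedDB record).
// `askUser` shows the prompt and returns true ("Yes") or false ("No").
function mayOpenArchive(archiveName, knownArchives, askUser) {
    // Archives that were opened before are not prompted for again.
    if (knownArchives.has(archiveName)) return true;
    const trusted = askUser(
        'This is the first time you are opening this archive. ' +
        'Do you trust the source you obtained it from?'
    );
    // Remember only positive answers, so a "No" leads to a new prompt next time.
    if (trusted) knownArchives.add(archiveName);
    return trusted;
}
```

In a browser this could be called as `mayOpenArchive(file.name, known, (msg) => window.confirm(msg))`. Note that the check is keyed on the archive name only, not on its content.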

Maybe @kelson42 has a view.

danielzgtg commented 1 year ago

So this is trusting zims based on name (and not content / hash)? Ok, I suppose that would be sufficient for now.

Jaifroid commented 1 year ago

We can get the SHA256 from download.kiwix.org, e.g. https://download.kiwix.org/zim/zimit/developer.mozilla.org_en_all_2023-02.zim.sha256. I guess we could prompt the user to check the SHA256 and give instructions on how to do so. I'm not sure it's feasible for me to automate that (it would be a pretty major feature), and it could only be done with online access AFAIK (unless we periodically include a list of ZIMs and known SHAs in the app, but that would be a maintenance headache).
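
For illustration only, the comparison step of such a check might look like the sketch below. All function names are hypothetical, and the layout of the `.sha256` file is an assumption (a 64-character hex digest, optionally followed by whitespace and a file name, as written by `sha256sum`). Fetching the file and hashing the local ZIM are left out.

```javascript
// Hypothetical helpers; none of these names exist in Kiwix JS.

// Derive the checksum URL by appending ".sha256" to the ZIM URL,
// following the pattern of the example link above.
function checksumUrlFor(zimUrl) {
    return zimUrl + '.sha256';
}

// Extract the hex digest from the text of a checksum file.
// Assumption: the text starts with a 64-character hex digest, optionally
// followed by whitespace and a file name.
function parseSha256File(text) {
    const match = /^([0-9a-fA-F]{64})(\s|$)/.exec(text.trim());
    return match ? match[1].toLowerCase() : null;
}

// Compare the published digest with one computed locally (both as hex strings).
function digestMatches(publishedText, localHexDigest) {
    const published = parseSha256File(publishedText);
    return published !== null && published === localHexDigest.trim().toLowerCase();
}
```

The expensive part is not shown here: producing `localHexDigest` still means reading the whole archive once.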

danielzgtg commented 1 year ago

I am very concerned about the performance of that "2023-02.zim.sha256". Let's ignore certificates for now. So I did a benchmark on my fast computer:

$ hyperfine 'sha256sum wiktionary_en_all_maxi_2023-02.zim'
Benchmark 1: sha256sum wiktionary_en_all_maxi_2023-02.zim
  Time (mean ± σ):     19.145 s ±  0.122 s    [User: 18.647 s, System: 0.498 s]
  Range (min … max):   18.904 s … 19.247 s    10 runs

Now imagine how long it will take on a slow computer, and doing that with Wikipedia instead of Wiktionary. I also loaded the entire file into RAM before running it, while Kiwix seems to target low-end devices such as those with 3GB RAM. A Merkle tree is how Android solves this, but it would require additional maintenance infrastructure and a change to the zim standard. We might as well use names for now and only consider hashes if the problem comes up again. The type of user that would need this feature isn't going to wait more than a few seconds. The type of user who would wait will also know how to check the hash themselves.
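
As a rough illustration of the Merkle tree idea (not Android's actual scheme, and not something the ZIM format defines), the sketch below hashes fixed-size blocks and then hashes pairs of hashes up to a single root. The block size, the handling of an odd node and the function names are all assumptions; it runs under Node.js and uses its built-in `crypto` module.

```javascript
const crypto = require('crypto');

// SHA-256 of a Buffer, returned as a Buffer.
function sha256(buffer) {
    return crypto.createHash('sha256').update(buffer).digest();
}

// Illustrative Merkle root over fixed-size blocks of `data` (a Buffer).
function merkleRoot(data, blockSize) {
    // Leaves: one hash per block.
    let level = [];
    for (let offset = 0; offset < data.length; offset += blockSize) {
        level.push(sha256(data.subarray(offset, offset + blockSize)));
    }
    // Empty input still gets a single leaf.
    if (level.length === 0) level.push(sha256(Buffer.alloc(0)));
    // Hash pairs of nodes until a single root remains.
    while (level.length > 1) {
        const next = [];
        for (let i = 0; i < level.length; i += 2) {
            // An unpaired last node is paired with itself (one common convention).
            const right = level[i + 1] || level[i];
            next.push(sha256(Buffer.concat([level[i], right])));
        }
        level = next;
    }
    return level[0].toString('hex');
}
```

The appeal of this structure is that, given a trusted root and the stored intermediate hashes, a reader can verify each block as it is read rather than hashing the whole file before opening it.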

Jaifroid commented 1 year ago

I completely agree. Calculating hashes for multi-gigabyte files can take quite a few seconds on a fast machine with lots of RAM, and would take an age on Android. So I think what is needed is a simple prompt asking the user if they trust the file they are about to open (and an option to turn off this prompt, which could get annoying for experienced users). It's how Visual Studio Code does it when you're opening a new repository. It asks once (and once only) if you trust it enough to run scripts from it. If you don't trust it, it gives you basic access. The equivalent of basic access in Kiwix JS might be to open a ZIM in jQuery mode (which disables any JS in the ZIM).
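
A minimal sketch of that decision, assuming a hypothetical `promptForTrust` setting and using the strings `'serviceworker'` and `'jquery'` purely as illustrative labels for the full and restricted modes:

```javascript
// Hypothetical sketch; the settings shape and mode labels are assumptions.
// `userTrustsArchive` shows the prompt and returns true or false.
function chooseContentMode(settings, userTrustsArchive) {
    // Users who turned the prompt off get full access without being asked.
    if (!settings.promptForTrust) return 'serviceworker';
    // Untrusted archives fall back to the restricted mode, where the ZIM's JS is disabled.
    return userTrustsArchive() ? 'serviceworker' : 'jquery';
}
```

As with the Visual Studio Code analogy, a "No" does not block the archive; it only limits what the archive is allowed to do.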

kelson42 commented 1 year ago

Two tickets:

As you can see, pretty old feature requests ;)

Jaifroid commented 1 year ago

Reviewing this, while adding a prompt would be pretty easy, I'm also wary about adding yet more clicks to open a ZIM in these extensions, given that we already need to (re-)select the archive each time the app is launched. It already feels clunky, and I don't want to make it even more clunky.

Any other solutions? Can we rely on "security through obscurity"?