openzim / warc2zim

Command line tool to convert a file in the WARC format to a file in the ZIM format
https://pypi.org/project/warc2zim/
GNU General Public License v3.0

External links of noted.lol do not open properly #413

Open benoit74 opened 6 days ago

benoit74 commented 6 days ago

See e.g. https://library.kiwix.org/content/noted.lol_en_all_2024-10/noted.lol/convert-any-website-into-a-zim-file-zimit/

At the bottom of the article, there is a link to the Zimit GitHub repository.

This link does not open properly on kiwix-serve. The same behavior is observed on Apple and PWA, so this really looks like an issue with wombat intercepting the click event and doing nasty things.

rgaudin commented 6 days ago

urlRewriten:
    - current_url: https://library.kiwix.org/content/noted.lol_en_all_2024-10/noted.lol/convert-any-website-into-a-zim-file-zimit/
    - orig_host: noted.lol
    - orig_scheme: https
    - orig_url: https://noted.lol/convert-any-website-into-a-zim-file-zimit/
    - prefix: https://library.kiwix.org/content/noted.lol_en_all_2024-10/
    - url: https://github.com/openzim/zimit?ref=noted.lol
    - useRel: false
    - mod: undefined
    - doc: undefined
    - finalUrl: https://library.kiwix.org/content/noted.lol_en_all_2024-10/github.com/openzim/zimit%3Fref%3Dnoted.lol
    [wombatSetup.js:2:18356](https://library.kiwix.org/content/noted.lol_en_all_2024-10/_zim_static/wombatSetup.js)
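
To make the log above easier to read, here is a minimal sketch that reproduces the logged `finalUrl` from the logged `prefix` and `url`. The function name `sketchRewrite` is hypothetical and this is not the code in wombatSetup.js; it only assumes, based on the values in the log, that the result is the prefix followed by host and path, with the query string percent-encoded into the path.

```javascript
// Hypothetical helper: rebuilds the logged finalUrl from the logged inputs.
// Assumption: finalUrl = prefix + host + path + percent-encoded query string.
function sketchRewrite(prefix, url) {
  const u = new URL(url);
  // u.search includes the leading "?", so "?ref=noted.lol" is encoded
  // as "%3Fref%3Dnoted.lol" and becomes part of the path.
  return prefix + u.host + u.pathname + encodeURIComponent(u.search);
}

const prefix = "https://library.kiwix.org/content/noted.lol_en_all_2024-10/";
const url = "https://github.com/openzim/zimit?ref=noted.lol";
const finalUrl = sketchRewrite(prefix, url);
console.log(finalUrl);
```

With the values from the log, this yields the same `finalUrl` as logged, i.e. the external GitHub link ends up pointing inside the ZIM.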

kelson42 commented 6 days ago

lol ;)

benoit74 commented 6 days ago

Indeed!

Original website:

<script type="text/javascript">
    var links = document.querySelectorAll('a');
    links.forEach((link) => {
        var a = new RegExp('/' + window.location.host + '/');
        if(!a.test(link.href)) {
            link.addEventListener('click', (event) => {
                event.preventDefault();
                event.stopPropagation();
                window.open(link.href, '_blank');
            });
        }
    });
</script>

So external links are handled through JavaScript and hence dynamically rewritten, even though we decided not to rewrite them during static rewriting.
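
As a rough illustration of which links the script above routes through `window.open`, here is a DOM-free sketch of its host test. The function name `isOpenedThroughScript` is hypothetical; the regular expression is built the same way as in the original script, with the host passed in instead of being read from `window.location.host`.

```javascript
// Hypothetical helper mirroring the site's check: a link gets the click
// listener (and is opened with window.open) when its href does NOT
// contain "/<host>/".
function isOpenedThroughScript(href, host) {
  const a = new RegExp('/' + host + '/');
  return !a.test(href);
}

// On the original website, window.location.host is "noted.lol".
const externalLink = isOpenedThroughScript(
  "https://github.com/openzim/zimit?ref=noted.lol", "noted.lol");
const internalLink = isOpenedThroughScript(
  "https://noted.lol/convert-any-website-into-a-zim-file-zimit/", "noted.lol");
console.log(externalLink, internalLink);
```

The GitHub link fails the host test, so its click is intercepted and the URL goes through a JavaScript call rather than through the plain `href`.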

Not sure how we can handle this. After all, we've said that all calls made from JavaScript must be rewritten for proper operation.

Should we add a tweak in the static rewriting that can then be seen and used in dynamic rewriting, so that we know for sure the link has already been processed and that we decided not to rewrite it?
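
One possible shape for such a tweak, purely as a sketch: static rewriting records the URLs it deliberately left untouched, and dynamic rewriting consults that record before rewriting. Everything here is an assumption for discussion (`makeDynamicRewriter`, the `staticallySkippedUrls` list, and the simplified rewrite rule); none of it is existing warc2zim or wombat API.

```javascript
// Hypothetical: build a dynamic rewriter that honours decisions made
// during static rewriting. `staticallySkippedUrls` is an assumed list of
// URLs the static rewriter chose NOT to rewrite.
function makeDynamicRewriter(prefix, staticallySkippedUrls) {
  const skipped = new Set(staticallySkippedUrls);
  return function rewrite(url) {
    // The static pass already decided this URL must stay external.
    if (skipped.has(url)) {
      return url;
    }
    // Simplified stand-in for the real dynamic rewriting.
    const u = new URL(url);
    return prefix + u.host + u.pathname + encodeURIComponent(u.search);
  };
}

const rewrite = makeDynamicRewriter(
  "https://library.kiwix.org/content/noted.lol_en_all_2024-10/",
  ["https://github.com/openzim/zimit?ref=noted.lol"]
);
const keptExternal = rewrite("https://github.com/openzim/zimit?ref=noted.lol");
const rewrittenInternal = rewrite("https://noted.lol/about/");
console.log(keptExternal, rewrittenInternal);
```

A lookup by URL is used here because the site's script passes only `link.href` to `window.open`, so a marker placed on the `<a>` element itself would not be visible at that point.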

Should we add yet another configuration switch to warc2zim so that we can configure when we do not want to inject wombat into a script like this? (It is hard to specify which script we want to ignore, though, since it has no ID.)