kensanata / mastodon-archive

Archive your statuses, favorites and media using the Mastodon API (i.e. login required)
https://alexschroeder.ch/software/Mastodon_Archive
GNU General Public License v3.0

HTML Export pages should lazy load videos and images to be usable as static sites #103

Closed lightnin closed 1 year ago

lightnin commented 1 year ago

HTML Export pages do not appear to lazy load videos and images by default, as tested with Chrome and Librewolf.

Here is a test page where you can see what the network is doing with an archive page: https://www.playingwiththesun.org/timeline/

I suggest the following two changes:

  1. video tags should use the preload attribute, set to "metadata". This should make the browser load only the video's metadata to set up the player, but not the video data itself. (Some browsers apparently try to get the whole video by default, which will result in a lot of wasted bandwidth.) https://www.w3schools.com/tags/att_video_preload.asp

(I'm hoping that will still show a poster frame consisting of the first frame of the video, which is much more useful - and from what I read, it should.)

  2. For images, we can also set the attribute loading="lazy" in the template, which should do the trick for most modern browsers. https://web.dev/browser-level-image-lazy-loading/
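To make the two suggestions concrete, here is a minimal Python sketch of what the generated tags could look like. The helper names `video_tag` and `image_tag` and the file paths are made up for illustration; this is not the project's actual template code.

```python
from html import escape


def video_tag(src: str) -> str:
    """Render a <video> that hints the browser to fetch only metadata up front."""
    return f'<video controls preload="metadata" src="{escape(src, quote=True)}"></video>'


def image_tag(src: str, alt: str = "") -> str:
    """Render an <img> that the browser may defer until it is near the viewport."""
    return (
        f'<img loading="lazy" src="{escape(src, quote=True)}" '
        f'alt="{escape(alt, quote=True)}">'
    )


# Hypothetical media paths, only for illustration.
print(video_tag("media/clip.mp4"))
print(image_tag("media/photo.jpg", "a photo"))
```

Both attributes are hints, so how much data a given browser actually fetches can still vary.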

In case it is helpful to have an archive with media to test with, you can download mine here: https://cloud.amosamos.net/s/4w6is5kacxBPykg

kensanata commented 1 year ago

I wonder what loading metadata entails. How much traffic do you see? It seems to me that the client would have to download at least a block of data for every video? Check out ba2be82 and see whether that helps.

lightnin commented 1 year ago

I tested with the same settings as your commit, and things got rather worse. I suspect that the metadata preload for all those videos tries to happen first, and the lazy loading of images gets put behind in the queue. At least on the train wifi, it was quite bad, with most images not even loading after quite a while.

Here's a test just with the video metadata preload: (no image lazy load setting) https://www.playingwiththesun.org/timeline/index-vidmetadata.html

This exposes a few issues: 1) My images are probably all too big, causing a ton of bandwidth usage to download them all even if they are resized by the browser and only shown at the small, correct size. I guess I can manually fix that by shrinking them all down in the archive. But it raises an interesting question for the HTML archive pages process. Should mastodon-archive offer to shrink images as part of the process of making the html archive, or do it automatically, or not at all? I guess it depends on if you feel the html archive should be usable as a static webpage or not.

2) Even if my images aren't too big, it's probably too much media to try to load all on one page - some 200+ images and videos. The lazy loading for video perhaps isn't lazy enough. If I can't figure out how to get things to be truly lazy, so that media only loads a little bit in advance as the page gets scrolled downward (which I think would take some scripting), then the only alternative is probably pagination of some kind in the HTML archive generator. Perhaps max 25 toots / page?

lightnin commented 1 year ago

Ah - just realized you have a pagination flag built right in. Seems like a promising solution...
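For reference, the idea of capping each page at roughly 25 toots can be sketched in a few lines of Python. This is only an illustration of the chunking, not how the built-in pagination flag is implemented; `paginate` and the sample data are hypothetical.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], per_page: int = 25) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most per_page items."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    for start in range(0, len(items), per_page):
        yield items[start:start + per_page]


# Hypothetical archive of 60 toots.
toots = [f"toot {n}" for n in range(1, 61)]
pages = list(paginate(toots))
print([len(page) for page in pages])  # [25, 25, 10]
```

With fewer media items per page, the browser has far fewer requests competing with each other, whether or not lazy loading kicks in.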

kensanata commented 1 year ago

Maybe we need no preloading… and the preview image from the media download. Hm.

lightnin commented 1 year ago

Oh - is there an image of the preview of the video already stored somewhere? If so we can use the poster attribute - that would probably make things faster I guess?

https://www.w3schools.com/tags/att_video_poster.asp

kensanata commented 1 year ago

If you search for "video" in the JSON file, you'll find things like these:

        "media_attachments": [
          {
            "id": 939926,
            "type": "video",
            "url": "https://assets.octodon.social/media_attachments/files/000/939/926/original/fe5c391fe53cf507.mp4",
            "preview_url": "https://assets.octodon.social/media_attachments/files/000/939/926/small/fe5c391fe53cf507.png",
            "remote_url": "https://curate.mastodon.art/gallery/media_attachments/files/000/299/910/original/a9bc2a765634a2bc.mp4",

So url would be the video and preview_url the poster. Assuming that it got downloaded.

kensanata commented 1 year ago

As for images. Regarding lazy loading for images, MDN says: "Loading is only deferred when JavaScript is enabled. This is an anti-tracking measure, because if a user agent supported lazy loading when scripting is disabled, it would still be possible for a site to track a user's approximate scroll position throughout a session, by strategically placing images in a page's markup such that a server can track how many images are requested and when." – https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img

What I don't understand: Images should use the preview_url for display and the url for click-through, as far as I understand the image_template. So huge image files should work just fine?
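As a rough sketch of that intent (small preview shown inline, full file behind the link), assuming an attachment dict shaped like the JSON above. The function name `image_link` and the example URLs are hypothetical, and this is not the actual image_template.

```python
from html import escape


def image_link(attachment: dict) -> str:
    """Show the preview image inline and link it to the full-size file."""
    full = attachment["url"]
    # Fall back to the full file when no preview is recorded.
    preview = attachment.get("preview_url") or full
    alt = attachment.get("description") or ""
    return (
        f'<a href="{escape(full, quote=True)}">'
        f'<img src="{escape(preview, quote=True)}" alt="{escape(alt, quote=True)}">'
        f"</a>"
    )


# Hypothetical attachment, only for illustration.
example = {
    "type": "image",
    "url": "https://example.org/media/original/photo.jpg",
    "preview_url": "https://example.org/media/small/photo.jpg",
    "description": "",
}
print(image_link(example))
```

This only saves bandwidth when preview_url really points at a smaller file than url.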

kensanata commented 1 year ago

As for videos. e29e793 is a new attempt. This time with no preloading and using the preview_url for the poster attribute. Let me know how it works.

lightnin commented 1 year ago

It looks like my archive just has the video file in the preview field. I wonder if this is a difference between pleroma and mastodon archives? (The archive / json in this case was made with mastodon-archive, but against a pleroma server.)

    "media_attachments": [
        {
          "blurhash": null,
          "description": "",
          "id": 800879,
          "pleroma": {
            "mime_type": "video/mp4"
          },
          "preview_url": "https://masto.amosamos.net/media/eb50da0120f4e95ef9567e5387ec72769be2577a48043d773c81227cd5c06ba0.mp4",
          "remote_url": "https://masto.amosamos.net/media/eb50da0120f4e95ef9567e5387ec72769be2577a48043d773c81227cd5c06ba0.mp4",
          "text_url": "https://masto.amosamos.net/media/eb50da0120f4e95ef9567e5387ec72769be2577a48043d773c81227cd5c06ba0.mp4",
          "type": "video",
          "url": "https://masto.amosamos.net/media/eb50da0120f4e95ef9567e5387ec72769be2577a48043d773c81227cd5c06ba0.mp4"
        }
      ],

It looks like Pleroma archives use the same image for the preview field as well, which explains why it takes so long to load:


 "media_attachments": [
        {
          "blurhash": null,
          "description": "",
          "id": 800869,
          "pleroma": {
            "mime_type": "image/jpeg"
          },
          "preview_url": "https://masto.amosamos.net/media/439299e726f2639681ffe54ffe2cffed87338c19e18f255be33d2eac827abebd.jpg",
          "remote_url": "https://masto.amosamos.net/media/439299e726f2639681ffe54ffe2cffed87338c19e18f255be33d2eac827abebd.jpg",
          "text_url": "https://masto.amosamos.net/media/439299e726f2639681ffe54ffe2cffed87338c19e18f255be33d2eac827abebd.jpg",
          "type": "image",
          "url": "https://masto.amosamos.net/media/439299e726f2639681ffe54ffe2cffed87338c19e18f255be33d2eac827abebd.jpg"
        },

lightnin commented 1 year ago

#104 reverts the use of poster. But now that we've identified a significant difference between a Pleroma and a Mastodon archive, maybe it should stay in, as it might be nice for Mastodon? It depends, I guess, on whether you want to officially support compatibility with Pleroma (which I guess would require a little debugging to figure out why the preview / poster image URLs are all the same as the full-sized media item).

Either way, I've got a working well-enough solution now, though the CSS fix for images in #100 would be great too! Where should I pay my bounty?

kensanata commented 1 year ago

Yikes. No usable preview is tough. Maybe it should stay in but only if the two URLs are different. I'll try and work something up. The alternative would be tricky: creating our own preview images using an external process that invokes ffmpeg or something like that.
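The "only if the two URLs are different" check could look roughly like the following Python sketch. It assumes attachment dicts shaped like the JSON snippets above; the helper names `usable_poster` and `video_tag` are made up here and this is not the code from any commit in this thread.

```python
from html import escape
from typing import Optional


def usable_poster(attachment: dict) -> Optional[str]:
    """Return preview_url only when it exists and differs from the media url."""
    preview = attachment.get("preview_url")
    if not preview or preview == attachment.get("url"):
        return None
    return preview


def video_tag(attachment: dict) -> str:
    """Render a <video> with no preloading, adding a poster when one is usable."""
    attrs = ["controls", 'preload="none"']
    attrs.append(f'src="{escape(attachment["url"], quote=True)}"')
    poster = usable_poster(attachment)
    if poster:
        attrs.append(f'poster="{escape(poster, quote=True)}"')
    return f'<video {" ".join(attrs)}></video>'


# Hypothetical attachments: one with a distinct preview, one where both URLs match.
with_preview = {"url": "https://example.org/v.mp4", "preview_url": "https://example.org/v.png"}
same_urls = {"url": "https://example.org/v.mp4", "preview_url": "https://example.org/v.mp4"}
print(video_tag(with_preview))
print(video_tag(same_urls))
```

When the two URLs match, the tag simply has no poster, so the browser is not pointed at the video file as if it were an image.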

As for the bounty: How about Médecins Sans Frontières, since your profile says you're in Denmark.

kensanata commented 1 year ago

I just checked #100 and it's merged. Is there something that's missing?

kensanata commented 1 year ago

I hope 02626fa works for both Mastodon and Pleroma.

lightnin commented 1 year ago

I'm good to go for my thesis now thanks to you. Thanks so much for your help!!
