zooniverse / Talk-archiver

A static site generator for old Talk forums, based on elevenpack.
Apache License 2.0

Disk Detective collections page missing subject ids on listed subjects #76

Closed: srallen closed this issue 4 years ago

srallen commented 4 years ago

This is likely a non-blocking issue, but on the Disk Detective collections page the subjects displayed look nearly identical, which makes browsing the list almost useless on the new version of the page. The original Ouroboros Talk page displayed the subject id, which helps distinguish each subject:

[Screenshot, 2020-08-13 at 10:37:52 AM: the original Ouroboros Talk collections page, showing a subject id for each subject]

However, on the archive version that is missing:

[Screenshot, 2020-08-13 at 10:37:59 AM: the archived collections page, without subject ids]

The links work as expected to each subject.
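A minimal sketch of how a collection list item could show the subject id as a visible caption under each thumbnail. The field names zooniverse_id and location.standard, and the /subjects/{id}/ link path, are assumptions about the archived Ouroboros data and URL layout; renderSubjectItem is a hypothetical helper, not a function from the Talk-archiver source.

```javascript
// Escape text before interpolating it into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: renders one subject in a collection list.
// ASSUMPTION: each subject looks like { zooniverse_id, location: { standard } }.
function renderSubjectItem(subject) {
  const id = escapeHtml(subject.zooniverse_id);
  const src = escapeHtml(subject.location.standard);
  // The visible <figcaption> is what tells otherwise similar images apart.
  return [
    '<li>',
    `  <a href="/subjects/${id}/">`,
    '    <figure>',
    `      <img src="${src}" alt="Subject ${id}" loading="lazy">`,
    `      <figcaption>${id}</figcaption>`,
    '    </figure>',
    '  </a>',
    '</li>'
  ].join('\n');
}

// Placeholder subject, not real archive data.
const html = renderSubjectItem({
  zooniverse_id: 'ADD0000001',
  location: { standard: 'https://example.org/ADD0000001.png' }
});
console.log(html);
```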

srallen commented 4 years ago

Another non-blocking suggestion: lazy loading for images. This page was pretty slow to load: https://talk.diskdetective.org/tags/multiple/subjects.html

camallen commented 4 years ago

Lazy loading and UI improvements would help, but I think we're too far down the track for them now. Our focus is on finalizing this conversion and decommissioning the associated API. Our aim was to archive our user-generated content for posterity, which we've done.

Jim, if you feel like this is something worthwhile, then go for it.

eatyourgreens commented 4 years ago

Lazy loading is implemented already! https://github.com/zooniverse/Talk-archiver/blob/b5eb813082970ea249111ec44a3f28ed95160e79/src/components/subjectImage.js#L3-L5
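For context, native lazy loading only needs the loading attribute on each img element. The sketch below shows the general shape of such markup; it is an assumption, not the contents of subjectImage.js, and the default 200 by 200 size is a placeholder. It also sets explicit width and height so the browser can reserve space for images that haven't loaded yet.

```javascript
// Hypothetical subject image component: returns an <img> tag string.
// Assumes src and alt are already HTML-safe build-time values.
// width/height defaults are placeholders; explicit dimensions let the
// browser lay out each image before its bytes arrive.
function subjectImage({ src, alt, width = 200, height = 200 }) {
  return `<img src="${src}" alt="${alt}" width="${width}" height="${height}" loading="lazy">`;
}

console.log(subjectImage({ src: 'https://example.org/subject.png', alt: 'Subject image' }));
// <img src="https://example.org/subject.png" alt="Subject image" width="200" height="200" loading="lazy">
```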

srallen commented 4 years ago

I'm not sure that's functioning as expected. On the page I linked to, I disabled the cache to simulate a first load, and it's requesting far more images than are visible on the page:

[Screenshot, 2020-08-13 at 12:04:34 PM]

[Screenshot, 2020-08-13 at 12:00:47 PM]

It's slow and causing my CPU fan to spin, but as I acknowledged, non-blocking.

eatyourgreens commented 4 years ago

Interesting. I’ve been seeing loading prioritised for images that are in the viewport but haven’t checked how many total requests are made.

We’re likely to see more of this on these larger projects, where lists of subjects can grow into the tens of thousands.
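If the archive were ever rebuilt, one way to keep such pages manageable would be to split long subject lists across several static pages. A minimal sketch: paginate and the page size of 500 are hypothetical choices, not part of Talk-archiver.

```javascript
// Hypothetical helper: split a long list of subjects into fixed-size pages,
// so each generated HTML page holds at most `size` images.
function paginate(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const pages = [];
  for (let start = 0; start < items.length; start += size) {
    pages.push(items.slice(start, start + size));
  }
  return pages;
}

// 30,951 placeholder subjects (the size of the Disk Detective "multiple" tag page).
const subjects = Array.from({ length: 30951 }, (_, i) => `subject-${i}`);
const pages = paginate(subjects, 500);
console.log(pages.length); // 62
console.log(pages[pages.length - 1].length); // 451
```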

eatyourgreens commented 4 years ago

The loading attribute isn't supported by Safari (or IE), but I could add a polyfill to these larger sites. https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img#attr-loading

Changes to the JS would mean rebuilding every page, though.
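A lazy-loading polyfill typically feature-detects the loading attribute and falls back to IntersectionObserver. A minimal sketch, assuming pages emit a data-src attribute instead of src when native support is missing; the data-src convention, the function names, and the 200px rootMargin are hypothetical. The observer constructor is passed in so the logic can also run outside a browser.

```javascript
// Feature detection: browsers with native lazy loading expose
// a `loading` property on HTMLImageElement.prototype.
function supportsNativeLazyLoading(imgPrototype) {
  return imgPrototype != null && 'loading' in imgPrototype;
}

// Fallback: copy data-src into src once an image approaches the viewport.
// `Observer` is IntersectionObserver in a browser; `rootMargin` controls how
// far ahead of the viewport loading starts (hypothetical default).
function lazyLoad(images, Observer, rootMargin = '200px') {
  const observer = new Observer((entries, obs) => {
    for (const entry of entries) {
      if (!entry.isIntersecting) continue;
      const img = entry.target;
      img.src = img.dataset.src;
      obs.unobserve(img); // each image only needs loading once
    }
  }, { rootMargin });
  for (const img of images) observer.observe(img);
  return observer;
}

// In a browser, something like:
// if (!supportsNativeLazyLoading(HTMLImageElement.prototype)) {
//   lazyLoad(document.querySelectorAll('img[data-src]'), IntersectionObserver);
// }
```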

eatyourgreens commented 4 years ago

On this page, in Firefox 79, network requests aren't made for images until you scroll down and bring them into the viewport. I've been assuming that Chrome etc. implement the same spec for lazy loading but haven't checked. https://talk.galaxyzoo.org/collections/CGZS0002cc/

UPDATE: Chrome 84 makes 384 requests on initial load, then defers the rest of the images until you scroll down and bring them into view.
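The request counts are easier to reason about with a toy model of a distance threshold: a browser fetches a lazy image on first load once its top edge is within some distance of the viewport. All numbers below (a 5-column grid of 200px tiles, a 900px viewport, thresholds of 0 and 1250px) are hypothetical, not measured values for Chrome 84 or Firefox 79. The last call shows why explicit image dimensions matter in this model: if unloaded images collapse to zero height, every image sits at the top of the page and falls inside the threshold.

```javascript
// Toy model of distance-threshold lazy loading (all numbers hypothetical).
// Images sit in a grid; an image is fetched on first load if its top edge
// is less than `threshold` pixels below the bottom of the viewport.
function initialRequests({ count, columns, tileHeight, viewportHeight, threshold }) {
  let requested = 0;
  for (let i = 0; i < count; i++) {
    const row = Math.floor(i / columns);
    const top = row * tileHeight;
    if (top < viewportHeight + threshold) requested++;
  }
  return requested;
}

const page = { count: 30951, columns: 5, viewportHeight: 900 };

// With 200px tiles, a larger threshold fetches more rows up front.
console.log(initialRequests({ ...page, tileHeight: 200, threshold: 0 }));    // 25
console.log(initialRequests({ ...page, tileHeight: 200, threshold: 1250 })); // 55

// If unloaded images collapse to 0px tall, every image has top = 0.
console.log(initialRequests({ ...page, tileHeight: 0, threshold: 1250 }));   // 30951
```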

eatyourgreens commented 4 years ago

Re. the original issue, we could make the collection JSON more visible, like we did for subject pages.

I think the trade-off is the usability of collections vs. keeping Ouroboros running longer while we make the changes.

srallen commented 4 years ago

That Galaxy Zoo page doesn't make enough requests to hit the performance problem I'm seeing. Try the page I originally reported, which makes thousands of requests on initial load in Chrome and takes nearly 5 minutes to load: https://talk.diskdetective.org/tags/multiple/subjects.html

Firefox handled it as I would expect, initiating only ~300 requests at the start and taking only 8 seconds. Could this be a difference in how the browsers implement the spec?

srallen commented 4 years ago

As I have said a few times, though, resolving either of these is non-blocking, so if the goal here is purely archived static pages, that goal has been accomplished. An initial 5-minute load time doesn't actually stop anyone from accessing the page if they're willing to wait. Feel free to close.

eatyourgreens commented 4 years ago

One of the problems is that Disk Detective uses PNG images, so it can't use our thumbnail service. I did some comparisons in Chrome and Firefox, looking at Disk Detective (PNG subjects) and Snapshot Serengeti (JPG subjects). Chrome makes more requests and is slower (as expected) for Disk Detective. Both browsers request images that are outside the initial viewport on the larger page.
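The PNG/JPG split could be expressed as a small URL helper that routes JPEG subjects through a thumbnail service and leaves PNG subjects at full size. A sketch only: the thumbnails.zooniverse.org host and its /{width}x{height}/{host}/{path} URL layout are assumptions here, as is the rule that only JPEGs are eligible, which is taken from the comment above. thumbnailUrl is hypothetical.

```javascript
// ASSUMPTION: a thumbnail service at THUMBNAIL_HOST accepts
// /{width}x{height}/{original host}/{original path} and only handles JPEGs.
const THUMBNAIL_HOST = 'https://thumbnails.zooniverse.org';

// Hypothetical helper: pick the URL to use for a subject thumbnail.
function thumbnailUrl(src, width, height) {
  const url = new URL(src);
  const isJpeg = /\.jpe?g$/i.test(url.pathname);
  if (!isJpeg) return src; // PNG subjects fall back to the full-size original
  return `${THUMBNAIL_HOST}/${width}x${height}/${url.host}${url.pathname}`;
}

console.log(thumbnailUrl('https://example.org/subjects/a.jpg', 100, 100));
// https://thumbnails.zooniverse.org/100x100/example.org/subjects/a.jpg
console.log(thumbnailUrl('https://example.org/subjects/b.png', 100, 100));
// https://example.org/subjects/b.png
```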

https://talk.diskdetective.org/tags/multiple/subjects.html (30,951 images)

Firefox: 62 requests, 1.67 MB downloaded, 5 s load time.
Chrome: 711 requests, 9.7 MB downloaded, 2.6 min load time.

[Screenshot: network performance of a Disk Detective page with 30,951 PNG images in Firefox.]

[Screenshot: network performance of a Disk Detective page with 30,951 PNG images in Chrome.]

https://talk.snapshotserengeti.org/users/maricksu/comments.html (12,569 images)

Firefox: 9 requests, 877 kB downloaded, 5 s load time.
Chrome: 46 requests, 1 MB downloaded, 37 s load time.

[Screenshot: network performance of a Snapshot Serengeti page with 12,569 JPG images in Firefox.]

[Screenshot: network performance of a Snapshot Serengeti page with 12,569 JPG images in Chrome.]
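Normalising these figures makes the gap between the two pages easier to compare. A quick sketch of the arithmetic, using only the numbers reported in this comment (2.6 min taken as 156 s). The request counts presumably also include the page's own HTML and scripts, so the "share of images requested" values are approximate.

```javascript
// Figures as reported above (2.6 min = 156 s).
const runs = [
  { page: 'Disk Detective tag', images: 30951,
    firefox: { requests: 62, seconds: 5 }, chrome: { requests: 711, seconds: 156 } },
  { page: 'Snapshot Serengeti comments', images: 12569,
    firefox: { requests: 9, seconds: 5 }, chrome: { requests: 46, seconds: 37 } }
];

function compare(run) {
  // Requests on first load as a percentage of images on the page.
  const pct = (n) => (100 * n / run.images).toFixed(2) + '%';
  return {
    page: run.page,
    firefoxShare: pct(run.firefox.requests),
    chromeShare: pct(run.chrome.requests),
    requestRatio: +(run.chrome.requests / run.firefox.requests).toFixed(1), // Chrome / Firefox
    timeRatio: +(run.chrome.seconds / run.firefox.seconds).toFixed(1)      // Chrome / Firefox
  };
}

console.table(runs.map(compare));
```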

eatyourgreens commented 4 years ago

Worth noting that the tag, like all resources, is also available as JSON. https://talk.diskdetective.org/api/tags/multiple.json
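Until the collection pages show subject ids, the JSON is one way to get them. A minimal sketch for Node 18 or later (which has a global fetch). The response shape, an object with a subjects array whose items carry zooniverse_id, is an assumption, so inspect the real file before relying on it; subjectIdsFromTag is hypothetical.

```javascript
// Hypothetical: pull subject ids out of an archived tag/collection JSON file.
// ASSUMPTION: the payload looks like { subjects: [{ zooniverse_id: '...' }, ...] }.
function subjectIdsFromTag(tagJson) {
  const subjects = Array.isArray(tagJson.subjects) ? tagJson.subjects : [];
  return subjects
    .map((subject) => subject.zooniverse_id)
    .filter((id) => typeof id === 'string');
}

async function main() {
  const response = await fetch('https://talk.diskdetective.org/api/tags/multiple.json');
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  const ids = subjectIdsFromTag(await response.json());
  console.log(`${ids.length} subject ids`, ids.slice(0, 10));
}

// Uncomment to run against the live archive:
// main().catch(console.error);
```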

eatyourgreens commented 4 years ago

Running Lighthouse on one of the large Galaxy Zoo pages (~25k subjects) threw an error I've never seen before, because the page is so large.

[Screenshot: Lighthouse report showing four red question marks where the scores should be.]

eatyourgreens commented 4 years ago

Chimp & See collections seem to max out at 500 subjects, e.g. https://talk.chimpandsee.org/collections/CCPL0000ie/ (from #73). Is there a limit on collections that doesn't apply to tags?
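One quick way to test the suspected limit would be to look for collections whose subject count lands exactly on 500 while comparable tags exceed it. A sketch; the input shape and the treatment of 500 as a hard cap are hypotheses taken from this comment, not confirmed API behaviour, and the sample data is made up.

```javascript
// Hypothetical check for the suspected collection limit. The 500 figure is
// only what was observed on Chimp & See, not a documented cap.
const SUSPECTED_CAP = 500;

// Returns ids of collections whose subject count sits exactly on the cap.
function possiblyTruncated(collections, cap = SUSPECTED_CAP) {
  return collections
    .filter((collection) => collection.subjectCount === cap)
    .map((collection) => collection.id);
}

// Made-up sample data: ids and counts are placeholders.
const sample = [
  { id: 'COLLECTION_A', subjectCount: 500 },
  { id: 'COLLECTION_B', subjectCount: 42 },
  { id: 'COLLECTION_C', subjectCount: 500 }
];
console.log(possiblyTruncated(sample)); // logs the ids COLLECTION_A and COLLECTION_C
```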

camallen commented 4 years ago

Closing this issue for now. If folks want to rebuild the sites with a polyfill for Safari, great. Otherwise, folks can use a compliant browser to access pages like this. Not great, but the content is technically available, so our archive goal is achieved.