zooniverse / Talk-archiver

A static site generator for old Talk forums, based on elevenpack.
Apache License 2.0

Review sites #55

Closed eatyourgreens closed 4 years ago

eatyourgreens commented 4 years ago

Approved

Using https://docs.google.com/document/d/1sfVy7O-dQK7vgWn10-f9oqNhnh2uKIyUzSe4lKizIEA/edit for reviews

eatyourgreens commented 4 years ago

I forgot https://talk.cyclonecenter.org/recent, which is the project that used Groups in Talk.

eatyourgreens commented 4 years ago

Chicago Wildlife Watch was the project that removed subjects for privacy reasons. It would be useful to know if those subjects are still hidden in the archived version. A number of subjects do come up as 404 errors during the build.

camallen commented 4 years ago

https://talk.condorwatch.org/recent :tada: Approved - Cam Reviewed 24 July 2020

eatyourgreens commented 4 years ago

Making each of these live is, I think, a case of renaming /index.html so it isn’t lost, then making a copy of /recent/index.html at /index.html. The static cache may need to be cleared too, to make the changes visible. Jenkins has a job that will do that.

camallen commented 4 years ago

yeah - good thinking Jim. I'll add this note here and we can add a script or docs on the readme if needed. E.g. to enable the static site version for https://talk.condorwatch.org/

DEPLOY_PATH="s3://zooniverse-static/talk.condorwatch.org/"

# preserve the original index file for rollback / posterity
#
# check the cmd looks right via --dryrun flag
aws s3 cp --dryrun "${DEPLOY_PATH}index.html" "${DEPLOY_PATH}old_index_$(date '+%Y-%m-%d-%H:%M:%S').html"

# run the backup (no --dryrun)
aws s3 cp "${DEPLOY_PATH}index.html" "${DEPLOY_PATH}old_index_$(date '+%Y-%m-%d-%H:%M:%S').html"

# enable the static version of the site - overwrite the old index file
aws s3 cp "${DEPLOY_PATH}recent/index.html" "${DEPLOY_PATH}index.html"
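
If a script does get added to the readme, the steps above could be wrapped roughly like this. This is only a sketch: `promote_static_site` and the `DRY_RUN` variable are made-up names, and it assumes every site lives under `s3://zooniverse-static/<hostname>/` the way talk.condorwatch.org does.

```shell
#!/usr/bin/env bash
# Sketch only: promote_static_site and DRY_RUN are hypothetical names.
# Assumes each site is deployed at s3://zooniverse-static/<hostname>/.

# Back up a site's index.html, then replace it with recent/index.html.
# Usage: promote_static_site talk.condorwatch.org
# Set DRY_RUN=1 to pass --dryrun to every aws s3 cp call.
promote_static_site() {
  site="$1"
  deploy_path="s3://zooniverse-static/${site}/"
  # Compute the timestamp once so repeated calls in one run use the same name.
  stamp="$(date '+%Y-%m-%d-%H:%M:%S')"
  dryrun=""
  if [ "${DRY_RUN:-0}" = "1" ]; then dryrun="--dryrun"; fi

  # Preserve the original index file for rollback / posterity.
  aws s3 cp $dryrun "${deploy_path}index.html" "${deploy_path}old_index_${stamp}.html" || return 1

  # Enable the static version of the site by overwriting the old index file.
  aws s3 cp $dryrun "${deploy_path}recent/index.html" "${deploy_path}index.html"
}
```

Running it once with `DRY_RUN=1` first keeps the "check the cmd looks right" step from the manual version. The second copy is skipped if the backup fails, so the original index is never overwritten without a saved copy.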

wgranger commented 4 years ago

Minor notes on Wormwatch Lab:

Besides that, all other content appears, so happy to move this over to reviewed if this is not an issue.

eatyourgreens commented 4 years ago

That's a good catch. Old Talk has 318 tagged subjects and 3 tagged discussions. The new pages only have the tagged subjects.

EDIT: the tagged discussions all have the hashtag on the new pages, so I think that's fine. eg. https://talk.wormwatchlab.org/boards/BWS0000003/discussions/DWS00001hu/

trouille commented 4 years ago

Reposting here from the slack thread: Notes from Nature review notes — https://docs.google.com/document/d/1mYNeYzyGMz53BVBzMbYtNUaqxO85_Fy_d6xXKtSFGLA/edit?usp=sharing. Same questions as Shaun in his ‘of note’ section in Slack, plus a few minor additions (in red).

No missing / incorrect content. And based on Slack thread - none of these additional minor/cosmetic questions I've flagged are reason to rerun or change the process.

So, happy to have Notes from Nature moved over to reviewed.

eatyourgreens commented 4 years ago

NfN has broken images. From the collection that Laura linked to: https://talk.notesfromnature.org/collections/CNNL000006/

Screenshot of a Notes from Nature collection showing broken thumbnails for several of the pages.

trouille commented 4 years ago

It's interesting - that happened to me at first, but then I refreshed the page, and all the images fully loaded.

eatyourgreens commented 4 years ago

Yeah, right click and open image in a new tab loads too eg.

Randomly chosen subject thumbnail from Notes from Nature.

Maybe the page is timing out, or our thumbnail service is timing out.

eatyourgreens commented 4 years ago

Clicking through to each subject shows an image too, so I'm not worried about missing images now.

srallen commented 4 years ago

Cyclone Center Talk generally passed the review checklist except it is missing the group discussions as noted in issue #67 and the search results for the tags are inconsistent. Here are a couple of examples:

I am wondering if this inconsistency is because of the missing groups?

mcbouslog commented 4 years ago

Milky Way Project audit generally good 👍 .

a. Seen similar comments in other reviews, just confirming intentional:

b. Some tags have slight difference between old "Objects" and new "Subjects" count

c. Similar to discussion above on broken images - https://talk.milkywayproject.org/tags/interesting/subjects.html, but each subject (from handful tested) shows image so probably ok

beckyrother commented 4 years ago

Floating Forests seems to redirect to https://www.zooniverse.org/projects/zooniverse/floating-forests

beckyrother commented 4 years ago

Chicago Wildlife Watch looks good to me!

One small comment:

trouille commented 4 years ago

I can't remember if sites did this before this week's rebuild - but is it ok that there appear to be repeated pulls from the same discussions w/in the landing pages (e.g., https://talk.asteroidzoo.org/recent/ - 'not able to comment' board repeated several times, referencing a different post w/in that board). This matches with what's in https://talk.asteroidzoo.org/, so it seems to be what it needs to be; I just don't remember seeing this type of repeats before, so thought I'd flag. (Likely just me forgetting that this is how it is).

https://talk.floatingforests.org/recent/ is what caused me to notice this, since the 'Project still alive' board is referenced so many times in a row.

trouille commented 4 years ago

What should we do about https://talk.floatingforests.org/ not loading, which means we can't fully review Floating Forests? As noted in Slack, when we did the work to move FloatingForests to https://www.zooniverse.org/projects/zooniverse/floating-forests, it broke the old Talk.

Note: The https://talk.floatingforests.org/logs/build.log, https://talk.floatingforests.org/manifest/build.json, and https://talk.floatingforests.org/manifest/hosts.json tests (from the review doc) look good.
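
For anyone repeating this check across sites, here is a rough sketch that lists the three review-doc URLs for a host and fetches each one. The helper names `review_urls` and `check_review_urls` are made up for illustration, and a 200 response only shows that the file is reachable, not that its contents pass the review doc's checks.

```shell
# Sketch only: review_urls and check_review_urls are hypothetical helper names.

# Print the three review-doc URLs for one archived Talk host.
review_urls() {
  host="$1"
  for path in logs/build.log manifest/build.json manifest/hosts.json; do
    printf 'https://%s/%s\n' "$host" "$path"
  done
}

# Fetch each URL and print its HTTP status code next to it.
check_review_urls() {
  review_urls "$1" | while read -r url; do
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
    printf '%s %s\n' "$code" "$url"
  done
}

# Example: check_review_urls talk.floatingforests.org
```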

Random clicking around: https://talk.floatingforests.org/subjects/AKP000mrne/, https://talk.floatingforests.org/subjects/AKP00049ou/ - have broken image links, but others fine (e.g., https://talk.floatingforests.org/subjects/AKP000nlau/)

Not sure if it's the case for all, but if https://talk.floatingforests.org/subjects/AKP0000ccj/ has broken image link, it's also broken in https://talk.floatingforests.org/boards/BKP0000005/discussions/DKP000002k/ . Similarly https://talk.floatingforests.org/subjects/AKP0000ddn/ and https://talk.floatingforests.org/boards/BKP0000005/discussions/DKP000001u/, etc.

BTW, this is so cool: https://talk.floatingforests.org/collections/CKPS000046/ (the person got to classify an image that included their own research lab site).

trouille commented 4 years ago

AsteroidZoo review:

Flagging - https://talk.asteroidzoo.org/manifest/hosts.json points to {"asteroidzoo.s3.amazonaws.com":25818}

Unsure if matters: in https://talk.asteroidzoo.org/logs/build.log - many lines of caching that I hadn't seen in any of the other build logs and more 'Bad response' lines than in others. But the final 'Verifying JSON output' matches up to itself as expected, and matches https://talk.asteroidzoo.org/manifest/build.json. (Note: https://radiotalk.galaxyzoo.org/logs/build.log also has many lines of caching)

What will happen to amazon links (e.g., http://asteroidzoo.s3.amazonaws.com/CSS/703/2012/12Apr01/azoo/01_12APR01_N21022_0001-26-scaled.png) like in https://talk.asteroidzoo.org/boards/BAZ0000003/discussions/DAZ00007m2/? Understandable if those become broken links, just flagging.

Flagging that subjects don't show the 4 multi-images. E.g., https://talk.asteroidzoo.org/subjects/AAZ0000b53/ and https://talk.asteroidzoo.org/#/subjects/AAZ0000b53. Constraint to accept? Or error by mistake?

Otherwise, systematic search and random clicking around - looks good.

trouille commented 4 years ago

Plankton Portal review:

https://talk.planktonportal.org/logs/build.log, https://talk.planktonportal.org/manifest/build.json, https://talk.planktonportal.org/manifest/hosts.json tests (from the review doc) look good.

In the random clicking around, found: Broken link to http://www.planktonportal.org/#/science/field-guide within https://talk.planktonportal.org/boards/BPK0000003/discussions/DPK00000de/ But I don't think we can do anything about that.

Otherwise, systematic search and random clicking around - looks good.

trouille commented 4 years ago

https://radiotalk.galaxyzoo.org/recent Review:

https://radiotalk.galaxyzoo.org/logs/build.log, https://radiotalk.galaxyzoo.org/manifest/build.json, https://radiotalk.galaxyzoo.org/manifest/hosts.json tests (from the review doc) look good.

Minor (I don't think merits a rerun, but still noting): 'Untitled discussion' under 'Science' in https://radiotalk.galaxyzoo.org/ means that there's no link to follow in the parallel spot in https://radiotalk.galaxyzoo.org/recent/.

Similar to wanting to double check about single image vs multi-images for Asteroid Zoo, is it purposeful/known that the former Radio GZoo could scroll through a number of images for a given subject, and the new Radio GZoo cannot? I can imagine yes, this is a known constraint, but want to check. E.g., https://radiotalk.galaxyzoo.org/subjects/ARG00011tc/ vs https://radiotalk.galaxyzoo.org/#/subjects/ARG00011tc

Otherwise, systematic search and random clicking around - looks good.

trouille commented 4 years ago

Galaxy Zoo Quench Review:

Minor: Why is https://quenchtalk.galaxyzoo.org/recent/ labeled 'Galaxy Zoo Starburst' and not 'Galaxy Zoo Quench'? Not reason to rerun, just noting.

Flagging (worth discussion): In a thread like the following: https://quenchtalk.galaxyzoo.org/boards/BGS0000001/discussions/DGS00001xy/ (and many others) there are a lot of links to other threads w/in the same project's Talk; e.g., a link to http://quenchtalk.galaxyzoo.org/#/boards/BGS000000a/discussions/DGS00001xk. Once the old Talk doesn't exist anymore, there will be many broken internal links?

Flagged: it seems for most tags, there are fewer results in the new Talk than in the old Talk; e.g., https://quenchtalk.galaxyzoo.org/#/search?tags[irregular]=true and https://quenchtalk.galaxyzoo.org/tags/irregular/, https://quenchtalk.galaxyzoo.org/tags/merger/ and https://quenchtalk.galaxyzoo.org/#/search?tags[merger]=true, https://quenchtalk.galaxyzoo.org/tags/agn/, https://quenchtalk.galaxyzoo.org/#/search?tags[agn]=true, etc.

Otherwise, systematic search and random clicking around - no other issues/flags.

eatyourgreens commented 4 years ago

Minor: Why is https://quenchtalk.galaxyzoo.org/recent/ labeled 'Galaxy Zoo Starburst' and not 'Galaxy Zoo Quench'? Not reason to rerun, just noting.

Good catch. That's the project's name in Ouroboros.

Similarly, Floating Forests is called Kelp in Ouroboros and OWD is called War Diary.

eatyourgreens commented 4 years ago

Flagged: it seems for most tags, there are fewer results in the new Talk than in the old Talk; e.g., https://quenchtalk.galaxyzoo.org/#/search?tags[irregular]=true and https://quenchtalk.galaxyzoo.org/tags/irregular/, https://quenchtalk.galaxyzoo.org/tags/merger/ and https://quenchtalk.galaxyzoo.org/#/search?tags[merger]=true, https://quenchtalk.galaxyzoo.org/tags/agn/, https://quenchtalk.galaxyzoo.org/#/search?tags[agn]=true, etc.

Randomly checking https://quenchtalk.galaxyzoo.org/tags/irregular/, I'm seeing the same numbers for old and new sites: 50 subjects and 1 collection.

eatyourgreens commented 4 years ago

Flagging (worth discussion): In a thread like the following: https://quenchtalk.galaxyzoo.org/boards/BGS0000001/discussions/DGS00001xy/ (and many others) there are a lot of links to other threads w/in the same project's Talk; e.g., a link to http://quenchtalk.galaxyzoo.org/#/boards/BGS000000a/discussions/DGS00001xk. Once the old Talk doesn't exist anymore, there will be many broken internal links?

Those should still work eg. http://quenchtalk.galaxyzoo.org/recent#/boards/BGS000000a/discussions/DGS00001xk. We can check by making the project live (replacing the old index.html page with the new one.)
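
This thread doesn't show how the archive actually handles these fragments, but the old/new URL pairs quoted in earlier comments suggest a simple correspondence: drop the `#/` and add a trailing slash. A rough sketch of that mapping, with `static_url_for` as a made-up helper name (it only covers path-style fragments such as boards and subjects, not `#/search?...` URLs):

```shell
# Sketch only: static_url_for is a hypothetical helper, not part of Talk-archiver.
# It mirrors the URL pairs quoted in this thread and nothing more.

# Turn an old hash-fragment Talk URL into the static-page URL form.
static_url_for() {
  printf '%s\n' "$1" | sed -e 's|/#/|/|' -e 's|/*$|/|'
}

# Example: static_url_for 'https://talk.asteroidzoo.org/#/subjects/AAZ0000b53'
```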

eatyourgreens commented 4 years ago

What should we do about https://talk.floatingforests.org/ not loading, which means we can't fully review Floating Forests? As noted in Slack, when we did the work to move FloatingForests to https://www.zooniverse.org/projects/zooniverse/floating-forests, it broke the old Talk.

Has anyone contacted us about the old site not being available? If not, this may mean that no one is using it. 😮

trouille commented 4 years ago

Flagged: it seems for most tags, there are fewer results in the new Talk than in the old Talk; e.g., https://quenchtalk.galaxyzoo.org/#/search?tags[irregular]=true and https://quenchtalk.galaxyzoo.org/tags/irregular/, https://quenchtalk.galaxyzoo.org/tags/merger/ and https://quenchtalk.galaxyzoo.org/#/search?tags[merger]=true, https://quenchtalk.galaxyzoo.org/tags/agn/, https://quenchtalk.galaxyzoo.org/#/search?tags[agn]=true, etc.

Randomly checking https://quenchtalk.galaxyzoo.org/tags/irregular/, I'm seeing the same numbers for old and new sites: 50 subjects and 1 collection.

Strange, I see 50 (new) vs 51 (old).

Two screenshots (Screen Shot 2020-08-06 at 10 14 50 AM and Screen Shot 2020-08-06 at 10 14 42 AM).

trouille commented 4 years ago

And for this one, a broader split:

Two screenshots (Screen Shot 2020-08-06 at 10 14 29 AM and Screen Shot 2020-08-06 at 10 14 22 AM).

eatyourgreens commented 4 years ago

How odd. The first one is the same, 50 subjects + 1 collection = 51 search results. The second one has different numbers of subjects: 248 (new) vs. 250 (old.)

@camallen is it worth getting a second pair of eyes on the code that builds those tagged collections? https://github.com/zooniverse/Talk-archiver/blob/b5eb813082970ea249111ec44a3f28ed95160e79/src/helpers/tags.js#L42-L52

camallen commented 4 years ago

How odd. The first one is the same, 50 subjects + 1 collection = 51 search results. The second one has different numbers of subjects: 248 (new) vs. 250 (old.)

@camallen is it worth getting a second pair of eyes on the code that builds those tagged collections?

The code looks good to me. Re-reading the Ouroboros source, the tag search (and other searches) used an Elasticsearch (ES) service for results. The data in the main API had to be kept in sync with the ES system, and it was a wee bit notorious for failures etc.

While I can't say for sure why these discrepancies exist, I'd take the actual DB data export results (what Jim used to build the tag results) over the ES system results. Considering we're talking about a few tags here and there, I think the failure of data syncing between the Ouroboros API and ES is the issue here. https://github.com/zooniverse/Ouroboros/blob/5e040dd444d4c9302bee1c13fc5cf35651f2052e/lib/talk_search.rb#L64 https://github.com/zooniverse/Ouroboros/blob/5e040dd444d4c9302bee1c13fc5cf35651f2052e/lib/tasks/build_talk_search.rake

trouille commented 4 years ago

Yes, the Floating Forest researchers and their participants have had occasional uses in the past for that old Talk content (before we broke the link) and so it's good that https://talk.floatingforests.org/recent/ will exist.

https://github.com/zooniverse/Talk-archiver/issues/55#issuecomment-669798416

trouille commented 4 years ago

Summary of outstanding questions from above not yet addressed and/or resolved:

-- Noting that the old https://talk.floatingforests.org/ doesn't load so we can't do the comparison review. Reviewing https://talk.floatingforests.org/recent/ on its own looks good, except most subject images are broken; e.g., https://talk.floatingforests.org/subjects/AKP0000ccj/ and https://talk.floatingforests.org/subjects/AKP0000ddn/

-- Asteroid Zoo

1) Flagging https://talk.asteroidzoo.org/manifest/hosts.json points to {"asteroidzoo.s3.amazonaws.com":25818}.

2) Flagging that subjects don't show the 4 multi-images. E.g., https://talk.asteroidzoo.org/subjects/AAZ0000b53/ and https://talk.asteroidzoo.org/#/subjects/AAZ0000b53. Constraint to accept? Or error by mistake?

Same question for Radio Galaxy Zoo multi-images: E.g., https://radiotalk.galaxyzoo.org/subjects/ARG00011tc/ vs https://radiotalk.galaxyzoo.org/#/subjects/ARG00011tc

3) A question (worth response/clarity in this thread): What will happen to amazon links (e.g., http://asteroidzoo.s3.amazonaws.com/CSS/703/2012/12Apr01/azoo/01_12APR01_N21022_0001-26-scaled.png) like in https://talk.asteroidzoo.org/boards/BAZ0000003/discussions/DAZ00007m2/? Will those still be accessible? Or will they break? Understandable if those become broken links, but would be good to have a response here.

-- Galaxy Zoo Quench:

Flagging (worth response/clarity in this thread): In a thread like the following: https://quenchtalk.galaxyzoo.org/boards/BGS0000001/discussions/DGS00001xy/ (and many others) there are links to other threads w/in the same project's Talk; e.g., a link to http://quenchtalk.galaxyzoo.org/#/boards/BGS000000a/discussions/DGS00001xk. Once the old Talk doesn't exist anymore, will these be broken links? In a project like Quench, there are many internal references, so it'll be many broken links. Understandable if that needs to be the case, but would be good to have a response here.

-- Cyclone Center

https://github.com/zooniverse/Talk-archiver/issues/67 remains unresolved.

eatyourgreens commented 4 years ago

I've checked one of those Floating Forest subjects and Floating Forest images are broken because their URLs redirect to PFE. Here's an example. http://www.floatingforests.org/subjects/53fb88d669736d77dd66b400.jpg
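
A quick way to spot this kind of redirect from the command line is to request the image headers without following redirects. This is a sketch only; `classify_status` and `probe_image` are made-up helper names, and the labels are a rough guess at what matters for an image check.

```shell
# Sketch only: classify_status and probe_image are hypothetical helper names.

# Map an HTTP status code to a rough label for an image check.
classify_status() {
  case "$1" in
    2??) echo ok ;;
    3??) echo redirect ;;
    *)   echo broken ;;
  esac
}

# Request headers only, without following redirects, and label the result.
probe_image() {
  code="$(curl -s -o /dev/null -I -w '%{http_code}' "$1")"
  printf '%s %s %s\n' "$(classify_status "$code")" "$code" "$1"
}

# Example: probe_image 'http://www.floatingforests.org/subjects/53fb88d669736d77dd66b400.jpg'
```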

eatyourgreens commented 4 years ago

Old links should still work. See the example in this comment. https://github.com/zooniverse/Talk-archiver/issues/55#issuecomment-669797122

EDIT: here's another example to test the old tag search URL fragments. https://quenchtalk.galaxyzoo.org/recent/#/search?tags[merger]=true Looks like those don't work with URL-encoding but do work if you use the unencoded URL.

https://quenchtalk.galaxyzoo.org/#/search?tags[merger]=true shows me an empty page (no search results) so I think URL-encoding breaks those direct search URLs on the old sites too.
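
To make the encoded vs. unencoded difference concrete: in the encoded form the square brackets become `%5B` and `%5D`. A minimal sketch for turning a pasted, encoded search link back into the unencoded form (`decode_brackets` is a made-up helper name):

```shell
# Sketch only: decode_brackets is a hypothetical helper name.

# Replace percent-encoded square brackets (%5B, %5D) with literal ones.
decode_brackets() {
  printf '%s\n' "$1" | sed -e 's/%5[Bb]/[/g' -e 's/%5[Dd]/]/g'
}

# Example: decode_brackets 'https://quenchtalk.galaxyzoo.org/recent/#/search?tags%5Bmerger%5D=true'
```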

eatyourgreens commented 4 years ago

We made a decision, quite early on, that we wouldn't build custom subject viewers for each project. Instead, each subject has a link to the full subject as JSON, including metadata and all file locations. Here's an example from Disk Detective.

Data for a Disk Detective subject in the Firefox JSON viewer.

camallen commented 4 years ago

Summary of outstanding questions from above not yet addressed and/or resolved:

-- Noting that the old https://talk.floatingforests.org/ doesn't load so we can't do the comparison review. Reviewing https://talk.floatingforests.org/recent/ on its own looks good, except most subject images are broken; e.g., https://talk.floatingforests.org/subjects/AKP0000ccj/ and https://talk.floatingforests.org/subjects/AKP0000ddn/

The plan is to get a fix for old talk so folks can review this site. Broken images are due to misconfigured web server, these should be working once a fix is in. More details to come.

-- Asteroid Zoo Flagging https://talk.asteroidzoo.org/manifest/hosts.json points to {"asteroidzoo.s3.amazonaws.com":25818}.

These images are hosted by the project teams on s3, we have no control over them so it's their responsibility to keep them online. Noting there are a few projects like this, milkyway is another one.
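
To see which external hosts a given site depends on, the manifest can be listed from the command line. This is a sketch under assumptions: `list_hosts` is a made-up helper, it assumes hosts.json is a flat JSON object of hostname/number pairs like the Asteroid Zoo example, and this thread doesn't say what the number represents.

```shell
# Sketch only: list_hosts is a hypothetical helper; it assumes hosts.json is a
# flat JSON object of "hostname": number pairs, as in the Asteroid Zoo example.

# Read hosts.json on stdin and print one "hostname number" line per entry.
list_hosts() {
  grep -oE '"[^"]+"[[:space:]]*:[[:space:]]*[0-9]+' |
    sed -E 's/^"([^"]+)"[[:space:]]*:[[:space:]]*([0-9]+)$/\1 \2/'
}

# Example: curl -s https://talk.asteroidzoo.org/manifest/hosts.json | list_hosts
```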

Flagging that subjects don't show the 4 multi-images. E.g., https://talk.asteroidzoo.org/subjects/AAZ0000b53/ and https://talk.asteroidzoo.org/#/subjects/AAZ0000b53. Constraint to accept? Or error by mistake?

Same question for Radio Galaxy Zoo multi-images: E.g., https://radiotalk.galaxyzoo.org/subjects/ARG00011tc/ vs https://radiotalk.galaxyzoo.org/#/subjects/ARG00011tc

This was an active choice as outlined in https://github.com/zooniverse/Talk-archiver/issues/55#issuecomment-671820668. Each old talk was customized to fit the data of the project, we chose to create a generic image subject placeholder to keep the user content alive and provide exports of the original data via links / files.

A question (worth response/clarity in this thread): What will happen to amazon links (e.g., http://asteroidzoo.s3.amazonaws.com/CSS/703/2012/12Apr01/azoo/01_12APR01_N21022_0001-26-scaled.png) like in https://talk.asteroidzoo.org/boards/BAZ0000003/discussions/DAZ00007m2/? Will those still be accessible? Or will they break? Understandable if those become broken links, but would be good to have a response here.

Any images hosted by us will be migrated and will keep working. Any owned / managed by external teams will keep working as long as they pay the hosting services (AWS s3 in this example) to keep running.

-- Galaxy Zoo Quench: Flagging (worth response/clarity in this thread): In a thread like the following: https://quenchtalk.galaxyzoo.org/boards/BGS0000001/discussions/DGS00001xy/ (and many others) there are links to other threads w/in the same project's Talk; e.g., a link to http://quenchtalk.galaxyzoo.org/#/boards/BGS000000a/discussions/DGS00001xk. Once the old Talk doesn't exist anymore, will these be broken links? In a project like Quench, there are many internal references, so it'll be many broken links. Understandable if that needs to be the case, but would be good to have a response here.

As Jim points out, old links will still work via a redirect once we make the new pages go live.

-- Cyclone Center

#67 remains unresolved.

Chris is aware and we've actively made the decision to not do any more work archiving the content on this project. See https://github.com/zooniverse/Talk-archiver/issues/67#issuecomment-661841803

camallen commented 4 years ago

Ok - ouroboros talk floating forests is back online https://talk.floatingforests.org/

trouille commented 4 years ago

Just noting that I did the final review comparing https://talk.floatingforests.org/ and https://talk.floatingforests.org/recent/. Nothing new turned up. Still finding some broken images (e.g., https://talk.floatingforests.org/subjects/AKP0008ifd/), but that's already been noted above.

Have moved the project to approved.

camallen commented 4 years ago

this is all done - sites are live and working.