Closed eatyourgreens closed 3 years ago
Recents, boards and discussion pages are built for each of these projects. Users, collections, tags and subjects haven’t been done.
Tags should be done now for each of those projects.
Galaxy Zoo has many broken images from subjects that were hosted in their own S3 bucket, outside of zooniverse-static.
Just finished auditing Penguin Watch. Two things I've noticed:
Auditing Planet Hunters:
Planet Hunters:
One board is missing: https://talk.planethunters.org/#/boards/BPH0000008
https://talk.planethunters.org/boards/BPH0000008 is present, so I think this is fine.
Subject images missing on new site
Planet Hunters didn't use images as subjects. We decided quite early on that we weren't going to build custom subject viewers for each project. Subjects that can't display an image or a video won't be shown, but there should be a link to the original JSON data, including file locations and metadata for each subject.
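A minimal sketch of the fallback described above, not the archiver's actual template logic. It assumes each subject has an `id` and a `location` object mapping names to file URLs (the real Ouroboros subject shape may differ), and that the archived JSON lives at `/api/subjects/<id>.json`. The helper names and the extension lists are hypothetical.

```javascript
// Hypothetical helper: decide how an archived page should present a subject.
// Assumes `subject.location` maps names to file URLs (strings or arrays of strings).
const IMAGE_EXTENSIONS = [".jpg", ".jpeg", ".png", ".gif"];
const VIDEO_EXTENSIONS = [".mp4", ".webm", ".ogv"];

function hasExtension(url, extensions) {
  // Ignore any query string and compare case-insensitively.
  const path = String(url).split("?")[0].toLowerCase();
  return extensions.some((ext) => path.endsWith(ext));
}

function subjectDisplay(subject) {
  const urls = Object.values(subject.location || {})
    .flat()
    .filter((value) => typeof value === "string");
  const image = urls.find((url) => hasExtension(url, IMAGE_EXTENSIONS));
  const video = urls.find((url) => hasExtension(url, VIDEO_EXTENSIONS));
  // Every subject keeps a link to its original JSON, viewer or not.
  const jsonLink = `/api/subjects/${subject.id}.json`;

  if (image) return { type: "image", src: image, jsonLink };
  if (video) return { type: "video", src: video, jsonLink };
  return { type: "none", src: null, jsonLink };
}
```

A subject whose only file is a data file (a light curve, say) falls through to `type: "none"`, so the page shows just the JSON link.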
Penguin Watch:
https://talk.penguinwatch.org/#/boards - under chat, has “we’re moving to a new slot with the Zooniverse”
https://talk.penguinwatch.org/boards/BPZ000000v is completely missing from the new site. It is listed as 0 posts and 0 discussions on the original site, which might explain why it was skipped. Science Gossip, similarly, has a discussion which is present on the old site but not present in the API responses that are used to build the static pages. https://talk.sciencegossip.org/#/boards/BSC0000003/discussions/DSC0000036
EDIT: removing the check on board.discussions here fixed the missing Penguin Watch board. I'm not sure of the implications of this for other projects, so I haven't committed that change.
https://github.com/zooniverse/Talk-archiver/blob/b5eb813082970ea249111ec44a3f28ed95160e79/src/helpers/boards.js#L10-L20
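To illustrate the kind of change being discussed: this is a hypothetical sketch, not the code in src/helpers/boards.js, and it is only one plausible reading of what "the check on board.discussions" does. It assumes each board object may or may not carry a `discussions` array.

```javascript
// Hypothetical sketch; the real helper in src/helpers/boards.js may differ.

// Strict variant: a board with no discussions is dropped from the archive.
function boardsWithDiscussions(boards) {
  return boards.filter(
    (board) => Array.isArray(board.discussions) && board.discussions.length > 0
  );
}

// Relaxed variant: keep every board, treating a missing list as empty,
// so that a board with 0 posts and 0 discussions still gets a page.
function allBoards(boards) {
  return boards.map((board) => ({
    ...board,
    discussions: board.discussions || [],
  }));
}
```

The relaxed variant would archive empty boards for every project, which is the open question about side effects raised above.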
Penguin Watch:
https://talk.penguinwatch.org/tags/egg/collections.html - shows no collections
Tagged collections seem to be broken in general. This Galaxy Zoo page is also empty, but should list 6 collections. https://talk.galaxyzoo.org/tags/edgeon/collections.html
EDIT: here's another example of broken collection tags. https://talk.milkywayproject.org/tags/starcluster/
This line should be tag.userCollections, not tag.collections (collections is already used by Eleventy). I don't know how many sites are affected by this bug.
https://github.com/zooniverse/Talk-archiver/blob/b5eb813082970ea249111ec44a3f28ed95160e79/src/tags/userTags/collections.njk#L15
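One way to keep the two names apart is to normalise tag data before it reaches the templates. This is a sketch under assumptions: `normaliseTag` is a hypothetical helper, and it assumes a tag object might carry its user-made collections under either `collections` or `userCollections`.

```javascript
// Hypothetical normaliser: always expose a tag's user-made collections as
// `userCollections`, so templates never confuse them with the `collections`
// object that Eleventy itself supplies.
function normaliseTag(tag) {
  const { collections, ...rest } = tag;
  return {
    ...rest,
    userCollections: tag.userCollections || collections || [],
  };
}
```

A template would then read `tag.userCollections` everywhere, and a stray `tag.collections` comes up undefined rather than silently resolving to something else.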
Likely non-blocking issue for Disk Detective: https://github.com/zooniverse/Talk-archiver/issues/76
Approved.
Operation War Diary - I think all of the following are known issues and have been identified previously or in this issue
Collections
Tags
Operation War Diary
https://talk.operationwardiary.org/tags/badscan/ (notes 10 discussions, shows 2)
All 10 are listed on https://talk.operationwardiary.org/tags/badscan/discussions.html but maybe using headings as links was a bad idea? See #16.
All 10 are listed on https://talk.operationwardiary.org/tags/badscan/discussions.html but maybe using headings as links was a bad idea? See #16.
Ah I see now! Hmm, not sure, but I think it's fine as is, I think I just missed it.
@mcbouslog if all your pending comments are resolved - please approve the site and move the site to the approved section in https://github.com/zooniverse/Talk-archiver/issues/56#issue-649834276
I'm not sure if Collections has been addressed?
I've updated my pending link to this comment, as I think this is the only remaining open item for Operation War Diary.
https://talk.galaxyzoo.org/recent - some pages still have broken images and need a rerun with the new code, which uses the direct S3 thumbnail URLs rather than the thumbnail server. See https://talk.galaxyzoo.org/manifest/hosts.json and the underlying issue with thumbnails: https://github.com/zooniverse/thumbnailer/pull/14#issuecomment-672949262
We might be better served by merging the www.galaxyzoo.org/ bucket data into the zooniverse-static/www.galaxyzoo.org/ paths and then using the thumbnails / static server, to avoid serving data out of S3 in perpetuity, but this might be more effort than the task at hand requires. TBD
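The host switch can be pictured as a lookup from old hostname to new base URL. This is a sketch under assumptions: the map below uses made-up example hostnames, the real manifest/hosts.json may be shaped differently, and `rewriteHost` is a hypothetical name.

```javascript
// Hypothetical host map: keys are hostnames found in archived subject URLs,
// values are the base URLs they should be served from instead.
// These hostnames are placeholders, not the real Galaxy Zoo hosts.
const EXAMPLE_HOSTS = {
  "old-bucket.example.org": "https://static.example.org/archive/",
};

function rewriteHost(url, hosts) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch (error) {
    return url; // Not an absolute URL: leave it untouched.
  }
  const base = hosts[parsed.hostname];
  if (!base) return url; // Unknown host: leave it untouched.
  // Drop trailing slashes from the base, then re-attach path and query.
  return `${base.replace(/\/+$/, "")}${parsed.pathname}${parsed.search}`;
}
```

Keeping the mapping in data rather than in code means a later move (for example, merging the bucket into zooniverse-static) would only need the map updated and the pages rebuilt.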
I'm seeing all the images load on https://talk.galaxyzoo.org/recent, since rebuilding that page on Friday using #77.
I'm taking on Chimp and See this week. Finally I can prove which one of us is the superior primate.
Comparing build logs:
- https://talk.spacewarps.org/logs/build.log - Not seeing "users" appear in JSON/HTML output
- https://talk.spacewarps.org/manifest/build.json - seeing 31,816 users here
Discussions linked to subjects:
- https://talk.spacewarps.org/#/subjects/ASW0008kij - Mentions several linked discussions
- https://talk.spacewarps.org/subjects/ASW0008kij/ - Doesn't mention any linked discussions
User collections don't sync up:
- https://talk.spacewarps.org/#/users/c_cld
- https://talk.spacewarps.org/users/c_cld/
Discussions linked to collections:
- https://talk.spacewarps.org/collections/CSWL00000p/ - No discussions linked
- https://talk.spacewarps.org/#/collections/CSWL00000p - Many discussions linked
Mismatch in discussion count linked to tags:
- https://talk.spacewarps.org/#/search?tags[lens]=true
- https://talk.spacewarps.org/tags/lens/
Space Warps:
User collections don't sync up https://talk.spacewarps.org/#/users/c_cld https://talk.spacewarps.org/users/c_cld/
Looks like c_cld was called C_cld at one point. Collections are matched to users by exact username, so the two spellings don't match.
https://talk.spacewarps.org/#/collections/CSWS000h5x
https://talk.spacewarps.org/collections/CSWS000h5x/ (links to a C_cld user profile that doesn't exist.)
Discussions linked to collections https://talk.spacewarps.org/collections/CSWL00000p/ - No discussions linked https://talk.spacewarps.org/#/collections/CSWL00000p - Many discussions linked
That's interesting. I haven't come across the Discussions mentioning this section in any other projects.
Space Warps is using the mentions feature. We can probably get subject.mentions from the API responses for subjects. I'm not sure if the collections export included collection.mentions for each collection.
https://github.com/zooniverse/Talk/blob/2e8ad17390c1d623f1868d078379e73958ff74e4/app/views/focus/discussions.eco#L18-L24
Following up on the Space Warps mentions feature, I'm looking at the JSON files from the archived site.
Collections are built from the data exports, which don't include mentions. https://talk.spacewarps.org/api/collections/CSWL00000p.json
Subjects are archived directly from the Ouroboros API, and do include mentions (which we ignore in the HTML.) https://talk.spacewarps.org/api/subjects/ASW0008kij.json
Space Warps:
User collections don't sync up https://talk.spacewarps.org/#/users/c_cld https://talk.spacewarps.org/users/c_cld/
I've fixed this, for that account, by keying users by user ID, rather than name. However, I'm now running into a username collision for Space Warps. Two different user IDs are trying to use the same URL.
Output conflict: multiple input files are writing to `dist/api/users/pandamonium2956.json`. Use distinct `permalink` values to resolve this conflict.
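A sketch of one way to get past this kind of conflict, not the fix that was actually applied. It assumes user objects have `id` and `name` fields; `assignPermalinks` is a hypothetical helper, and the `api/users/<name>.json` pattern mirrors the path in the error above. Users are keyed by ID, and any user whose name is already taken gets the ID appended to its file name.

```javascript
// Hypothetical sketch: key users by ID, then give each one a distinct permalink.
function assignPermalinks(users) {
  // Keying by ID means a renamed account (c_cld vs C_cld) is still one user.
  const byId = new Map(users.map((user) => [user.id, user]));
  const timesSeen = new Map(); // username -> how many users have claimed it so far
  const permalinks = new Map(); // user ID -> output path

  for (const [id, user] of byId) {
    const count = timesSeen.get(user.name) || 0;
    timesSeen.set(user.name, count + 1);
    // The first user keeps the plain name; later ones get the ID appended.
    const fileName = count === 0 ? user.name : `${user.name}-${id}`;
    permalinks.set(id, `api/users/${fileName}.json`);
  }
  return permalinks;
}
```

The trade-off is that the second account's URL no longer matches its username, so any links built from the name alone would still point at the first account.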
I'm seeing all the images load on https://talk.galaxyzoo.org/recent, since rebuilding that page on Friday using #77.
@eatyourgreens excellent
Operation War Diary
- https://talk.operationwardiary.org/ has 21 collections (after "Load More +")
- https://talk.operationwardiary.org/recent/ has 10 collections
If each of those extra 11 collections has been archived to its own page (eg. https://talk.operationwardiary.org/collections/CWDS0000sz/) then this isn't a problem. If any of them are missing, then that would be a problem.
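That check can be done mechanically. A minimal sketch, assuming the build output contains one `collections/<id>/index.html` file per archived collection (a guess based on the URL above, not confirmed from the repo); `missingCollectionPages` is a hypothetical helper.

```javascript
// Hypothetical check: which collection IDs have no page in the build output?
// `builtPaths` is a list of output file paths relative to the site root.
function missingCollectionPages(collectionIds, builtPaths) {
  const built = new Set(builtPaths);
  return collectionIds.filter(
    (id) => !built.has(`collections/${id}/index.html`)
  );
}
```

An empty result for the 21 collection IDs shown on the original site would mean the shorter list on /recent/ is only a difference in what that page displays.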
All GZ pages rebuilt to use non s3 hosts, see https://talk.galaxyzoo.org/manifest/hosts.json and https://zooniverse.slack.com/archives/C0138Q1LVCL/p1598041404108100?thread_ts=1598015745.086600&cid=C0138Q1LVCL
Hi! We have seen that videos containing humans have been removed from the static sites. For Chimp&See, this also removed chimp and gorilla videos from habituated communities, where researchers appear in the same video as a chimp or gorilla. Is it possible to differentiate here? The problem might be limited to the videos in this collection (as we hopefully tagged all of these cases with #habituated). I am just posting here in addition to the respective Zooniverse Talk thread.
Hi again! Two additions the Chimp&See science team asked me to report:
What would be cool - but is not essential! - is to sort the science board with the chimp matching sites according to their number: https://talk.chimpandsee.org/boards/ :-)
Thanks!
@AnLand we've explicitly dropped the "Discussions mentioning this" feature, see https://github.com/zooniverse/Talk-archiver/issues/80#issuecomment-676577771
Images are missing partly from discussion threads and I am not sure about the pattern, e.g., here with all kinds of different displays, not displays, misses, etc.
Can you please provide URL links that we can review?
What would be cool - but is not essential! - is to sort the science board with the chimp matching sites according to their number: https://talk.chimpandsee.org/boards/ :-)
We won't have time to do this. Our aim here was to turn off our old API infrastructure (save $$$) but archive the volunteer content for posterity.
Noting this effort is not only for Chimp & See but for all our projects that ran on this infrastructure (~36 projects) and I believe we have achieved these aims.
@camallen Thanks for checking - whatever is possible to achieve! Here is a link that shows the different image displays within one thread quite well: https://talk.chimpandsee.org/boards/BCP000000s/discussions/DCP0001uc1/ Sorry for not including it earlier.
@AnLand those images are now showing, e.g. https://talk.chimpandsee.org/boards/BCP000000s/discussions/DCP0001uc1/
Looks all good to me. Thank you so much!
Hi again, two links to discussion boards are suddenly empty. They worked last week and all other boards seem to be fine. Could you please have a look? Thanks!
It seems that I am able to open discussions within this folder, when I find them via google search.
I'm seeing content for the links above. Perhaps an intermittent issue?
That's interesting. I've been consistently getting a blank page here, but only when I access it on my phone. https://talk.galaxyzoo.org/boards/BGZ0000008/discussions/DGZ0002r39/
I'd thought it was my phone, but maybe there's an issue with caching for these URLs?
I emptied my cache and can now see this page https://talk.chimpandsee.org/boards/BCP000000v/, but now the boards are blank: https://talk.chimpandsee.org/boards/ Sorry, I can't provide any more information.
@AnLand thanks, that's useful to know. If you visit the blank page in a private window, does it still come up empty? Also, if you right click on the blank page and choose View Page Source, do you get any HTML code at all for the page?
I'd check myself, but https://talk.chimpandsee.org/boards/ loads successfully for me.
You ask for this view view-source:https://talk.chimpandsee.org/boards/, right? No, there is nothing. Just blank.
Thanks, that's exactly what I wanted to know. It sounds like the browser isn't downloading anything at all, not even a partial page.
@AnLand I cannot reproduce this behaviour at all. I see content via https://talk.chimpandsee.org/boards/ and view-source:https://talk.chimpandsee.org/boards/
Can you test again and if it is still not working provide your browser details via https://www.whatismybrowser.com/
I tested again and all seems to be fine now. Thank you! (Sorry for the late response as well.)
these have all been done
Larger projects (~50k subjects or more) that have yet to be fully archived.
Approved
Noting that all pending issues have been dealt with.
Pending
[x] https://talk.planethunters.org/recent (73006 subjects.) - Becky self-assigned 11 Aug 2020 Pending
[x] https://talk.operationwardiary.org/recent (79994 subjects.) - Mark self-assigned 11 Aug 2020 Pending
[x] https://talk.penguinwatch.org/recent (69242 subjects.) - Will self-assigned Pending, related issue
[x] https://talk.galaxyzoo.org/recent (277604 subjects.) - Cam assigned 13 Aug 2020 Pending, review notes
[x] https://talk.spacewarps.org/recent (191332 subjects.) - Will self-assigned 17 Aug 2020 Pending
Using https://docs.google.com/document/d/1sfVy7O-dQK7vgWn10-f9oqNhnh2uKIyUzSe4lKizIEA/edit for reviews