eatyourgreens closed this 4 years ago
Rereading the Ouroboros query code and the data model, it appears we would have to
Alternatively, when we process the collection or subject API response, can we use the API response data to extract this missing information?
From Jim: We have API responses for subjects but not for collections. That should already be mentioned in the GitHub issue. Collections are read from the file system: https://github.com/zooniverse/Talk-archiver/blob/master/src/helpers/collections.js
It sounds like we don't have any API data for collections, so we're stuck with the exports unless that work gets done.
@eatyourgreens How much work would it be to add the API responses for collections and use them to build the 'Discussions Mentioning This' feature?
Pending an answer within 1-2 days on the API work, I'm inclined to suggest that the manual data linkage is beyond the scope of this archive. If adding the API responses is more than 1-2 days' work, then I suggest we drop it.
I can find collections from the posts that mention them, e.g. from https://talk.galaxyzoo.org/boards/BGZ0000007/discussions/DGZ0000v8z/ I can find https://talk.galaxyzoo.org/collections/CGZL00003q/ and https://talk.galaxyzoo.org/collections/CGZS000gux/. I can't go backwards from a collection to the discussions that mention it, which old Talk could do. While that's not quite the same, we've preserved the user-generated content for posterity.
Is there any particular reason we should rebuild everything using the Ouroboros API instead of loading collections from the local file system? That sounds like it would take a while. The current code for loading collections would need to be swapped out for code that uses the API instead. Then we'd have to wait for all the requests to run and archive the responses, since (I think) we have to request collections one by one and can't request them in bulk.
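If we did go down the API route, the work described above boils down to one request per collection. Here's a minimal TypeScript sketch of just the planning step, assuming the per-collection endpoint follows the pattern of the example URLs quoted in this issue (https://<talk host>/api/collections/<ID>.json). The function names and the input shape are hypothetical, not part of Talk-archiver, and the actual fetching and archiving of each response is left out.

```typescript
// Hypothetical helper: builds one API URL per collection, assuming the
// endpoint pattern https://<talk host>/api/collections/<ID>.json seen in
// this issue. Collections (probably) can't be requested in bulk, so each
// ID becomes its own request.
function collectionApiUrls(talkHost: string, collectionIds: string[]): string[] {
  return collectionIds.map(
    (id) => `https://${talkHost}/api/collections/${encodeURIComponent(id)}.json`
  );
}

// Hypothetical input: collection IDs grouped by Talk host, e.g. read from
// the .data/*_collections.json exports. Returns a flat request list.
function planRequests(idsByHost: Record<string, string[]>): string[] {
  let urls: string[] = [];
  Object.keys(idsByHost).forEach((host) => {
    urls = urls.concat(collectionApiUrls(host, idsByHost[host]));
  });
  return urls;
}

const plan = planRequests({
  "talk.galaxyzoo.org": ["CGZL00003q", "CGZS000gux"],
  "talk.spacewarps.org": ["CSWL00000p"],
});
console.log(plan.length); // 3
console.log(plan[0]); // https://talk.galaxyzoo.org/api/collections/CGZL00003q.json
```

Keeping the plan as a flat list makes it easy to run the requests sequentially and resume from a given index if the run is interrupted.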
On pages like the following, it's definitely worth checking that each of the linked discussions has a corresponding page in the new site: https://talk.galaxyzoo.org/#/collections/CGZL00003q. They should all be there, but I'm not sure whether the original data export for subjects included every subject that was mentioned (as opposed to just subjects that are the focus of a discussion).
I think it's entirely possible that someone typed a subject identifier into a comment, which our regex would turn into a link, even though there's no corresponding page because that subject was never collected or commented upon.
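One way to find those dangling links is to run an identifier regex over each comment and compare the matches against the subjects we actually archived. A minimal TypeScript sketch follows; the pattern is a hypothetical one inferred from the subject IDs quoted in this thread ("A", a two-letter project code, seven alphanumerics), not the regex the archiver actually uses, and it will also match any ten-character word of the same shape. Matching is case-insensitive because, as noted in this thread, mentions can differ in case from the stored identifiers.

```typescript
// Hypothetical subject-ID pattern, inferred from AGZ00016tb, ASC00001z3,
// ACP00021ev and ASW0008kij. NOT the regex Talk-archiver actually uses.
const SUBJECT_ID = /\bA[A-Z]{2}[0-9A-Za-z]{7}\b/g;

// Returns identifiers linked from a comment that have no archived page.
// Comparison is case-insensitive, since a comment may use a different
// case from the stored (case-sensitive) identifier.
function danglingSubjectLinks(commentBody: string, archivedIds: string[]): string[] {
  const known: Record<string, boolean> = {};
  archivedIds.forEach((id) => {
    known[id.toUpperCase()] = true;
  });
  const linked: string[] = commentBody.match(SUBJECT_ID) || [];
  return linked.filter((id) => !known[id.toUpperCase()]);
}

const missing = danglingSubjectLinks("Compare AGZ00016TB with AGZ0001zzz", ["AGZ00016tb"]);
console.log(missing.join(", ")); // AGZ0001zzz
```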
We can get mentions from comment.mentions in individual discussion comments, e.g. https://talk.galaxyzoo.org/api/discussions/DGZ0000v8z.json
Mentions are uppercase, e.g. AGZ00016TB, but identifiers are case-sensitive: https://talk.galaxyzoo.org/subjects/AGZ00016tb/
I'm guessing the Rails app had some code that mapped those uppercase identifiers to discussion summaries that were rendered here. https://github.com/zooniverse/Talk/blob/2e8ad17390c1d623f1868d078379e73958ff74e4/app/views/focus/discussions.eco#L18-L24
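Whatever the original app did, mapping an uppercase mention back to its case-sensitive identifier needs a lookup keyed on the uppercased ID. Here's a minimal TypeScript sketch, assuming we have the list of archived subject IDs; the names are hypothetical. Because identifiers are case-sensitive, two IDs could in principle differ only by case, so the sketch reports those as ambiguous instead of silently picking one.

```typescript
// Lookup from upper-cased mention to the real, case-sensitive identifier.
interface MentionIndex {
  byUpper: Record<string, string>;
  ambiguous: string[]; // upper-cased keys shared by more than one real ID
}

function buildMentionIndex(archivedIds: string[]): MentionIndex {
  const byUpper: Record<string, string> = {};
  const ambiguous: string[] = [];
  archivedIds.forEach((id) => {
    const key = id.toUpperCase();
    if (Object.prototype.hasOwnProperty.call(byUpper, key) && byUpper[key] !== id) {
      ambiguous.push(key);
    } else {
      byUpper[key] = id;
    }
  });
  return { byUpper, ambiguous };
}

// Turns a mention such as "AGZ00016TB" into a subject URL, or undefined
// if no archived subject matches (or the match is ambiguous).
function subjectUrlForMention(
  talkHost: string,
  mention: string,
  index: MentionIndex
): string | undefined {
  const key = mention.toUpperCase();
  if (index.ambiguous.indexOf(key) !== -1) return undefined;
  const id = index.byUpper[key];
  return id ? `https://${talkHost}/subjects/${id}/` : undefined;
}

const index = buildMentionIndex(["AGZ00016tb", "ASC00001z3"]);
console.log(subjectUrlForMention("talk.galaxyzoo.org", "AGZ00016TB", index));
// https://talk.galaxyzoo.org/subjects/AGZ00016tb/
```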
Here's a list of the number of collections present in the data exports. Each of these, except for the Sellers projects, would have to be requested from the Ouroboros API and the response archived, if we went down that route.
odonnell$ wc -l .data/*_collections.json |sort -n
4 .data/wisconsin_collections.json
25 .data/leaf_collections.json
107 .data/crater_collections.json
112 .data/orchid_collections.json
152 .data/notes_from_nature_collections.json
180 .data/sunspot_collections.json
244 .data/m83_collections.json
399 .data/kelp_collections.json
466 .data/worms_collections.json
511 .data/galaxy_zoo_starburst_collections.json
627 .data/illustratedlife_collections.json
709 .data/bat_detective_collections.json
770 .data/condor_collections.json
908 .data/cyclone_center_collections.json
1189 .data/plankton_collections.json
1271 .data/chicago_collections.json
1334 .data/higgs_hunter_collections.json
1940 .data/wise_collections.json
1992 .data/andromeda_collections.json
2255 .data/radio_collections.json
2403 .data/war_diary_collections.json
2688 .data/asteroid_collections.json
3913 .data/milky_way_collections.json
4027 .data/penguin_collections.json
4205 .data/sea_floor_collections.json
6280 .data/chimp_collections.json
11906 .data/planet_hunter_collections.json
13132 .data/planet_four_collections.json
18685 .data/serengeti_collections.json
41642 .data/galaxy_zoo_collections.json
93342 .data/spacewarp_collections.json
217418 total
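wc -l counts newline characters, so these figures equal the number of collections only if each export holds one JSON document per line (and the last line ends with a newline). Assuming that newline-delimited layout, here's a minimal TypeScript sketch that counts by parsing records instead, so blank lines and malformed records don't skew the total. The function and type names are hypothetical.

```typescript
// Assumes newline-delimited JSON: one collection document per line.
// Counts records by parsing them, skipping blank lines and recording
// the line numbers that fail to parse instead of counting them.
interface ExportSummary {
  records: number;
  badLines: number[]; // 1-based line numbers that weren't valid JSON
}

function summarizeExport(text: string): ExportSummary {
  const summary: ExportSummary = { records: 0, badLines: [] };
  text.split(/\r?\n/).forEach((line, i) => {
    if (line.trim() === "") return;
    try {
      JSON.parse(line);
      summary.records += 1;
    } catch {
      summary.badLines.push(i + 1);
    }
  });
  return summary;
}

const sample = '{"_id":"CGZL00003q"}\n{"_id":"CGZS000gux"}\n\nnot json\n';
console.log(summarizeExport(sample).records); // 2
```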
Is it more than 2 days of work to rewrite the code to ensure the 'Discussions Mentioning This' resources are all correctly set up? If it is, I'd say we avoid doing this, but I'll run it past Chris to get sign-off.
There is no need to keep this feature.
I'm going to close this issue, then. We'll proceed without backporting this feature into the static archives.
Old Talk had a section called 'Discussions Mentioning This' which listed all the mentions for a resource. Here are a few examples:
subject.mentions:
https://talk.sciencegossip.org/#/subjects/ASC00001z3
https://talk.chimpandsee.org/#/subjects/ACP00021ev
https://talk.spacewarps.org/#/subjects/ASW0008kij
collection.mentions:
https://talk.galaxyzoo.org/#/collections/CGZL00003q
https://talk.spacewarps.org/#/collections/CSWL00000p
Collection mentions weren't included in the collections export, so they probably can't be displayed. See, e.g., https://talk.spacewarps.org/api/collections/CSWL00000p.json and https://talk.galaxyzoo.org/api/collections/CGZL00003q.json
Subject mentions are included in the archived JSON, e.g. https://talk.chimpandsee.org/api/subjects/ACP00021ev.json
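Since discussion comments carry a mentions array (comment.mentions), a 'Discussions Mentioning This' list could in principle be rebuilt by inverting those arrays: for each mentioned ID, collect the discussions whose comments mention it. A minimal TypeScript sketch, assuming a simplified discussion shape with an id and comments[].mentions; the real archived JSON has more fields, and the type and function names here are hypothetical.

```typescript
// Simplified, hypothetical shapes for archived discussion JSON: only the
// fields needed here. Mentions are upper-case identifiers.
interface ArchivedComment {
  mentions?: string[];
}
interface ArchivedDiscussion {
  id: string;
  comments: ArchivedComment[];
}

// Inverts comment.mentions into mentionedId -> discussion IDs, which is
// the data behind a "Discussions Mentioning This" list. Each discussion
// appears at most once per resource, in input order.
function discussionsMentioning(
  discussions: ArchivedDiscussion[]
): Record<string, string[]> {
  const index: Record<string, string[]> = {};
  discussions.forEach((discussion) => {
    discussion.comments.forEach((comment) => {
      (comment.mentions || []).forEach((mention) => {
        const key = mention.toUpperCase();
        if (!Object.prototype.hasOwnProperty.call(index, key)) {
          index[key] = [];
        }
        if (index[key].indexOf(discussion.id) === -1) {
          index[key].push(discussion.id);
        }
      });
    });
  });
  return index;
}
```

Keys are upper-cased, so looking up a case-sensitive identifier such as ACP00021ev means upper-casing it first.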