zooniverse / Talk-archiver

A static site generator for old Talk forums, based on elevenpack.
Apache License 2.0

'Discussions Mentioning This' #80

Closed eatyourgreens closed 4 years ago

eatyourgreens commented 4 years ago

Old Talk had a section called 'Discussions Mentioning This' which listed all the mentions for a resource. Here are a few examples:

subject.mentions

https://talk.sciencegossip.org/#/subjects/ASC00001z3
https://talk.chimpandsee.org/#/subjects/ACP00021ev
https://talk.spacewarps.org/#/subjects/ASW0008kij

collection.mentions

https://talk.galaxyzoo.org/#/collections/CGZL00003q
https://talk.spacewarps.org/#/collections/CSWL00000p

Collection mentions weren't included in the collections export, so they probably can't be displayed, e.g. https://talk.spacewarps.org/api/collections/CSWL00000p.json https://talk.galaxyzoo.org/api/collections/CGZL00003q.json

Subject mentions are included in the archived JSON. eg. https://talk.chimpandsee.org/api/subjects/ACP00021ev.json

camallen commented 4 years ago

Rereading the Ouroboros query code and the data model, it appears we would have to:

  1. Read through all project discussions and all of their comments.
  2. Collate the comment data (board details) by the ids they mention.
  3. Store the collated board / mention info keyed by the mention id.
  4. When processing a collection or subject (or other resource) page, look up the object in mentions (by id) to find the board info to link to for discussions that mention it.
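
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the archiver's code: the shape of the discussion objects (`id`, `title`, `board`, `comments`, `mentions` as an array of id strings) and the function names `buildMentionsIndex` and `discussionsMentioning` are assumptions made for the example.

```javascript
// Hypothetical data shape: each discussion has an id, a title, a board
// and a list of comments, and each comment lists the ids it mentions.
function buildMentionsIndex(discussions) {
  const index = new Map();
  for (const discussion of discussions) {
    const summary = {
      discussionId: discussion.id,
      title: discussion.title,
      boardId: discussion.board.id,
    };
    for (const comment of discussion.comments || []) {
      for (const mentionId of comment.mentions || []) {
        if (!index.has(mentionId)) {
          index.set(mentionId, new Map());
        }
        // Key by discussion id so a discussion is listed once per resource,
        // however many of its comments mention that resource.
        index.get(mentionId).set(discussion.id, summary);
      }
    }
  }
  return index;
}

// Look up the discussions that mention a given subject or collection id.
function discussionsMentioning(index, resourceId) {
  const found = index.get(resourceId);
  return found ? [...found.values()] : [];
}
```

Building the index once and looking resources up by id keeps page generation to a single pass over the discussions, at the cost of holding the whole index in memory.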

Alternatively, if it's on the API response, can we use the API request data to extract this missing information when we process the collection / subject?

From Jim: "We have API responses for subjects but not for collections. That should already be mentioned in the GitHub issue. Collections are read from the file system." https://github.com/zooniverse/Talk-archiver/blob/master/src/helpers/collections.js

Sounds like we don't have any API data for the collections, so we're stuck with the exports unless that work comes in.

@eatyourgreens How much work would it be to add the API responses for collection and use them to build the 'discussions mentioning this resource' feature?

Pending an answer within 1/2 days for the API work, I'm inclined to suggest that the manual data linkage is beyond the scope of this archive. If adding the API responses is more than 1/2 days work then I suggest we drop it.

I can find the collections from the posts that mention them, e.g. from https://talk.galaxyzoo.org/boards/BGZ0000007/discussions/DGZ0000v8z/ I can find https://talk.galaxyzoo.org/collections/CGZL00003q/ and https://talk.galaxyzoo.org/collections/CGZS000gux/ but not backwards (which old Talk had). While not quite the same, we've preserved the user-generated content for posterity.

eatyourgreens commented 4 years ago

Is there any particular reason we should rebuild everything using the Ouroboros API instead of the local file system to load collections? That sounds like it would take a while. The current code for loading collections would need to be swapped out for code that uses the API instead. Then wait to run all the requests, and archive the responses, since (I think) we have to request collections one-by-one and can't request them in bulk.
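
To give a sense of what that swap would involve, here is a rough sketch of requesting collections one at a time and archiving each response. It is not existing archiver code: `archiveCollections`, `fetchJson` and `saveResponse` are hypothetical names, and the URL pattern is assumed from the example collection API links earlier in this issue.

```javascript
// fetchJson and saveResponse are hypothetical, injected helpers:
// fetchJson(url) resolves to parsed JSON, saveResponse(id, data) stores it.
async function archiveCollections(host, collectionIds, fetchJson, saveResponse) {
  const failed = [];
  for (const id of collectionIds) {
    // URL pattern assumed from e.g.
    // https://talk.spacewarps.org/api/collections/CSWL00000p.json
    const url = `https://${host}/api/collections/${id}.json`;
    try {
      // Await each request before starting the next: one collection at a time.
      const data = await fetchJson(url);
      await saveResponse(id, data);
    } catch (error) {
      // Record the failure and carry on, so one bad id doesn't stop the run.
      failed.push({ id, message: error.message });
    }
  }
  return failed;
}
```

Because every request is awaited in turn, the total run time grows linearly with the number of collections, which is the main cost of this approach.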

eatyourgreens commented 4 years ago

On pages like the following, it is definitely worth checking that each of the linked discussions has a corresponding page in the new site. They should all be there but I'm not sure if the original data export for subjects included every subject that was mentioned (as opposed to just subjects that are the focus of a discussion.) https://talk.galaxyzoo.org/#/collections/CGZL00003q

I think it's entirely possible that someone could have typed an identifier into a comment, which our regex would turn into a link, but there's no corresponding page if that subject had never been collected or commented upon.
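
One way to guard against that is to only link identifiers that have a page in the new site. The sketch below is an illustration under assumptions: the pattern is inferred only from the example subject ids in this thread and is not the archiver's actual regex, and `linkSubjects` and `knownSubjectIds` are hypothetical names.

```javascript
// Assumed pattern, inferred only from the example ids in this thread
// (e.g. AGZ00016tb): 'A', two capital letters, then seven letters or digits.
const SUBJECT_ID = /\bA[A-Z]{2}[0-9A-Za-z]{7}\b/g;

// Turn subject ids into links, but only when a page exists for them.
// knownSubjectIds is a Set of the subject ids present in the export.
function linkSubjects(text, knownSubjectIds) {
  return text.replace(SUBJECT_ID, (id) =>
    knownSubjectIds.has(id) ? `<a href="/subjects/${id}/">${id}</a>` : id
  );
}
```

Identifiers without a matching page are left as plain text, so a typed-in id never produces a dead link.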

eatyourgreens commented 4 years ago

We can get mentions from comment.mentions in individual discussion comments eg. https://talk.galaxyzoo.org/api/discussions/DGZ0000v8z.json

Mentions are uppercase eg. AGZ00016TB but identifiers are case-sensitive: https://talk.galaxyzoo.org/subjects/AGZ00016tb/

I'm guessing the Rails app had some code that mapped those uppercase identifiers to discussion summaries that were rendered here. https://github.com/zooniverse/Talk/blob/2e8ad17390c1d623f1868d078379e73958ff74e4/app/views/focus/discussions.eco#L18-L24
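
A static build could do a similar mapping with a lookup keyed on the uppercased identifier. This is only a sketch of the idea, not what the Rails app did: `buildCaseLookup` and `resolveMention` are hypothetical names, and it assumes the full list of case-sensitive identifiers is available from the export.

```javascript
// Index the case-sensitive identifiers by their uppercase form.
function buildCaseLookup(identifiers) {
  const lookup = new Map();
  for (const id of identifiers) {
    const key = id.toUpperCase();
    if (!lookup.has(key)) {
      lookup.set(key, []);
    }
    lookup.get(key).push(id);
  }
  return lookup;
}

// Resolve an uppercase mention to the real identifier.
// Returns null when there is no match, or when several identifiers
// differ only by case and the mention is therefore ambiguous.
function resolveMention(lookup, mention) {
  const candidates = lookup.get(mention.toUpperCase()) || [];
  return candidates.length === 1 ? candidates[0] : null;
}
```

Since identifiers are case-sensitive, two of them could collapse to the same uppercase mention; the sketch returns `null` in that case rather than guessing.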

eatyourgreens commented 4 years ago

Here's a list of the number of collections present in the data exports. Each of these, except for the Sellers projects, would have to be requested from the Ouroboros API and the response archived, if we went down that route.

odonnell$ wc -l .data/*_collections.json |sort -n
       4 .data/wisconsin_collections.json
      25 .data/leaf_collections.json
     107 .data/crater_collections.json
     112 .data/orchid_collections.json
     152 .data/notes_from_nature_collections.json
     180 .data/sunspot_collections.json
     244 .data/m83_collections.json
     399 .data/kelp_collections.json
     466 .data/worms_collections.json
     511 .data/galaxy_zoo_starburst_collections.json
     627 .data/illustratedlife_collections.json
     709 .data/bat_detective_collections.json
     770 .data/condor_collections.json
     908 .data/cyclone_center_collections.json
    1189 .data/plankton_collections.json
    1271 .data/chicago_collections.json
    1334 .data/higgs_hunter_collections.json
    1940 .data/wise_collections.json
    1992 .data/andromeda_collections.json
    2255 .data/radio_collections.json
    2403 .data/war_diary_collections.json
    2688 .data/asteroid_collections.json
    3913 .data/milky_way_collections.json
    4027 .data/penguin_collections.json
    4205 .data/sea_floor_collections.json
    6280 .data/chimp_collections.json
   11906 .data/planet_hunter_collections.json
   13132 .data/planet_four_collections.json
   18685 .data/serengeti_collections.json
   41642 .data/galaxy_zoo_collections.json
   93342 .data/spacewarp_collections.json
  217418 total

camallen commented 4 years ago

Is it more than 2 days of work to rewrite the code to ensure the 'Discussions Mentioning This' sections are all correctly set up? If it is, I'd say we avoid doing this, but I'll run it past Chris to get sign-off.

chrislintott commented 4 years ago

There is no need to keep this feature.

camallen commented 4 years ago

I'm going to close this issue then. We will progress without backporting this feature into the static archives.