publiclab / plots2

a collaborative knowledge-exchange platform in Rails; we welcome first-time contributors! :balloon:
https://publiclab.org
GNU General Public License v3.0

Planning issue for import of Google Groups to db Nodes #3305

Open jywarren opened 6 years ago

jywarren commented 6 years ago

This is a doozy, so watch out! Long-term project, no immediate action needed.

We have (and can generate anytime) a full export of Google Groups content in .mbox format using Google Takeout. It's all public information except people's email addresses.

Someday, we may want to import all of these as nodes, back-dated using the timestamp data, to make them searchable in PublicLab.org. This might involve several challenges:

  1. matching email addresses to usernames where possible
  2. displaying an alert that these were auto-imported from Google Groups, with a link to original URL
  3. ability to display "users" for each email address that does NOT have a matching user account
  4. whether to forward comment responses to these legacy nodes to everyone in that discussion using the old emails
  5. how to display a thread -- initial post as a node, then all responses as comments?
  6. how to ensure "reply back quoted text" is not displayed since it'll be disruptive (similar to reply by email filtering)
  7. how to actually run the import script using mbox data - maybe via https://github.com/darthbatman/mbox-json plus a Ruby script?
  8. do a test run of just one to see how it looks
  9. what tags to use automatically per-list?

I'm sure there's more. This is a starting list.
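
For items 1 and 3, one possible shape for the author lookup is sketched here. It assumes a hypothetical `users_by_email` dictionary (lowercased email address to username) built from existing accounts. When there is no match it falls back to the sender's display name, so the private address itself is never shown. The function name and the returned fields are illustrative, not anything that exists in plots2.

```python
from email.utils import parseaddr


def resolve_author(from_header, users_by_email):
    """Map a From: header to a username, or to a placeholder "user"."""
    display_name, address = parseaddr(from_header)
    username = users_by_email.get(address.lower())
    if username is not None:
        return {"username": username, "placeholder": False}
    # No matching account: use the display name only, never the raw address.
    label = display_name or "unknown list member"
    return {"username": label, "placeholder": True}
```

For example, `resolve_author("Liz Example <Liz@Example.org>", {"liz@example.org": "liz"})` resolves to the existing `liz` account, while an unknown sender yields a placeholder entry.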

ebarry commented 6 years ago

For instance, might this look like ....

an email thread on plots-waterquality is logged as a back-dated research note authored by the original poster, titled with the former email subject line? And all responses might appear as comments on the research note? Tagged with water-quality?
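
A minimal sketch of that mapping, using plain dictionaries rather than the real plots2 models (whose fields are not described in this thread). Each post is assumed to be a dict with `subject`, `author`, `date`, and `body` keys, already sorted oldest first. The quoted-text filter is a simple heuristic: it drops lines starting with `>` and cuts everything from a single-line "On ... wrote:" header onward, so it will miss headers that wrap across lines.

```python
import re

# Matches reply headers such as "On Mon, Sep 10, 2018 at 9:30 AM X wrote:"
QUOTE_HEADER = re.compile(r"^On .+ wrote:\s*$")


def strip_quoted_reply(body):
    """Remove quoted reply text so only the new part of a message remains."""
    kept = []
    for line in body.splitlines():
        if QUOTE_HEADER.match(line):
            break  # assume everything after the header is quoted text
        if line.lstrip().startswith(">"):
            continue
        kept.append(line)
    return "\n".join(kept).strip()


def thread_to_note(posts, tag):
    """First post becomes the note; later posts become its comments."""
    first, *replies = posts
    return {
        "title": first["subject"],
        "author": first["author"],
        "created": first["date"],  # back-dated to the original send time
        "tags": [tag],
        "body": strip_quoted_reply(first["body"]),
        "comments": [
            {
                "author": reply["author"],
                "created": reply["date"],
                "body": strip_quoted_reply(reply["body"]),
            }
            for reply in replies
        ],
    }
```

Calling `thread_to_note(posts, "water-quality")` on a plots-waterquality thread would produce one note carrying the original subject line as its title, with each reply attached as a comment.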

jywarren commented 6 years ago

yeah, if we can coordinate all of these moving pieces and test it out, that would be the idea!
