hicommonwealth / commonwealth

A platform for decentralized communities
https://commonwealth.im
GNU General Public License v3.0

Spike: infra + architecture research for self-serve discourse import #6450

Closed by jnaviask 3 months ago

jnaviask commented 5 months ago

Background

Discourse is a forum product that partially competes with Commonwealth. Many onboarding communities come from Discourse, and to support their migration, we created a tool that imports their existing corpus of forum posts into the Commonwealth app.

This required the joining community to present a database dump to the Commonwealth team, which we would use via a series of manual steps (more automated on airplane per #5819 but still requiring direct intervention) to populate forum data in their newly created Commonwealth community.

This script has not been used in months and is almost certainly broken at this point, and would fail and/or crash if we attempted to use it now. [Update: thanks to the great work of Timothee and Ryan, we now have a working import script again.]

We would like to continue supporting this feature, however, and ideally make it usable by anyone who wants to create a community on Commonwealth, without requiring direct team intervention.

Description of Task

Determine what changes need to be made at the platform level to support self-serve discourse import and evaluate how much effort it will be to implement those changes. UI changes are out of scope (likely minor).

Timebox

3 hours.

Technical Details

The code for discourse import is currently in a separate repository (ask if you need access). The code is deployed on a cw-discourse-import app on Heroku (ask if you need access). Usage documentation is present in the codebase, but I will outline the existing flow at a high level here:

  1. User presents Common team with a database dump from Discourse.
  2. Common team loads the database dump into S3, and then from S3 into a Heroku database, typically an additional (unused) db on one of our staging deployments (frick/frack).
  3. Discourse import app exposes an import route script which, given the source (Discourse) and target (Commonwealth) dbs, performs ETL logic (described in more detail below) that transforms Discourse's format into Commonwealth's, and imports user and forum data to the newly created community.
  4. Discourse import route sends email to specified Commonwealth administrators containing success / fail logs, and on success, a dump of the import results.
  5. Discourse import app exposes an email script which sends emails to all Discourse users asking them to migrate to Commonwealth. Those users can then log in and "claim" their Commonwealth identity (explained below).
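
As a rough illustration of the transform in step 3, here is a minimal TypeScript sketch. It is not taken from the import repository: the type names, field names, and the `transformDump` function are hypothetical, and the shapes are heavily simplified compared to the real Discourse and Commonwealth schemas. The one Discourse detail it relies on is that the post with `post_number` 1 carries the topic body and `raw` holds the Markdown source.

```typescript
// Hypothetical, simplified input shapes (assumption: real Discourse tables have many more columns).
interface DiscourseUser { id: number; username: string; email: string; }
interface DiscourseTopic { id: number; title: string; user_id: number; }
interface DiscoursePost { id: number; topic_id: number; user_id: number; post_number: number; raw: string; }
interface DiscourseDump { users: DiscourseUser[]; topics: DiscourseTopic[]; posts: DiscoursePost[]; }

// Hypothetical output shapes (assumption: not the actual Commonwealth models).
interface ImportedUser { discourseUserId: number; username: string; email: string; claimed: boolean; }
interface ImportedThread { communityId: string; discourseTopicId: number; title: string; body: string; authorDiscourseUserId: number; }
interface ImportedComment { discourseTopicId: number; discoursePostId: number; body: string; authorDiscourseUserId: number; }
interface ImportResult { users: ImportedUser[]; threads: ImportedThread[]; comments: ImportedComment[]; skippedTopicIds: number[]; }

function transformDump(dump: DiscourseDump, communityId: string): ImportResult {
  // Group posts by topic so each topic can be processed independently.
  const postsByTopic = new Map<number, DiscoursePost[]>();
  for (const post of dump.posts) {
    const bucket = postsByTopic.get(post.topic_id) || [];
    bucket.push(post);
    postsByTopic.set(post.topic_id, bucket);
  }

  const threads: ImportedThread[] = [];
  const comments: ImportedComment[] = [];
  const skippedTopicIds: number[] = [];

  for (const topic of dump.topics) {
    const posts = [...(postsByTopic.get(topic.id) || [])].sort(
      (a, b) => a.post_number - b.post_number
    );
    // The first post holds the topic body; without it there is nothing to import as a thread.
    const first = posts.find((p) => p.post_number === 1);
    if (!first) {
      skippedTopicIds.push(topic.id); // reported later, e.g. in the admin email of step 4
      continue;
    }
    threads.push({
      communityId,
      discourseTopicId: topic.id,
      title: topic.title,
      body: first.raw,
      authorDiscourseUserId: topic.user_id,
    });
    // Every remaining post becomes a comment on the imported thread.
    for (const reply of posts) {
      if (reply.post_number === 1) continue;
      comments.push({
        discourseTopicId: topic.id,
        discoursePostId: reply.id,
        body: reply.raw,
        authorDiscourseUserId: reply.user_id,
      });
    }
  }

  // Imported users start unclaimed; step 5 invites them to claim their identity.
  const users = dump.users.map((u) => ({
    discourseUserId: u.id,
    username: u.username,
    email: u.email,
    claimed: false,
  }));

  return { users, threads, comments, skippedTopicIds };
}
```

Keeping the transform a pure function over an in-memory dump is only one possible design; it makes the mapping easy to test separately from the database loading in step 2 and the emails in steps 4 and 5.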

The forum data transformation is fairly straightforward conceptually, but I want to take time to elaborate on points (3) and (5) above, as concerns the user data transformation and claiming logic:

None of this logic is set in stone for the scope of your research: any element can be changed, so long as a counterproposal is made that justifies the change and resolves the fundamental issue of transforming a web2 account / content-ownership model into a web3 one. For the purposes of this project, however, it may be best to scope the work to bringing the script back to working order and tacking on self-service as a feature, rather than considering a full rearchitecture.

Rotorsoft commented 5 months ago

@rbennettcw is already engaged in restoring functionality to this project. However, in the long term, it might be good to explore a more straightforward SQL-to-SQL backend integration script, given that this is a one-time backfilling process that we initiate and oversee internally.

Rotorsoft commented 5 months ago

Started this Miro model with the current ETL process: https://miro.com/app/board/uXjVNeKJpE8=/?moveToWidget=3458764578530428713&cot=14

Rotorsoft commented 5 months ago

Created an integration model in Miro, listing automation opportunities at different stages of the ETL process.

Next Steps: