hicommonwealth / commonwealth

A platform for decentralized communities
https://commonwealth.im
GNU General Public License v3.0

[STEP 4] Sign-in: One Signer, One Account #5078

Closed zakhap closed 5 months ago

zakhap commented 8 months ago

Description

Domain Story: As a user with one Signer (which may have two different display addresses), I want to be able to sign in without thinking about the encoded address; I always want to be logged into one Account.

User Story: Signing into Common with any address that shares a Signer (shares a pubkey) should sign the user into the Same Account.

Impacts: Cosmos + Substrate power users

Known Unknown: How many addresses in our db share a pubkey but do not share an Account?

Blocked by:

Engineering Requirements

Acceptance Criteria

Additional context

jnaviask commented 8 months ago

@mhagel please consult with @timolegros and @ianrowan on this ticket + #5079

mhagel commented 8 months ago

Signing into common from any Cosmos community with the same signer should (...)

Looks like this AC is unfinished? @zakhap

Edit: Nevermind - "Signing into Common with any address that shares a Signer (shares a pubkey) should sign the user into the Same Account."

mhagel commented 8 months ago

Per Jake, probably best to put a hex identifier column on the Addresses table. Then the user_id from existing addresses sharing a hex can be applied to new Cosmos addresses.
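
A minimal sketch of what that lookup could look like at sign-in time, assuming the new hex column and a Sequelize Addresses model (model path, field names, and helper name are illustrative, not the actual schema):

```typescript
// Hypothetical sketch: reuse the user_id of any existing address that shares
// the pubkey-derived hex, so one Signer always resolves to one Account.
import { Op } from 'sequelize';
import { Addresses } from '../models'; // assumed model import path

async function resolveUserIdForHex(hex: string): Promise<number | null> {
  // Any previously verified address with the same hex belongs to the same Signer.
  const existing = await Addresses.findOne({
    where: { hex, user_id: { [Op.ne]: null } },
    order: [['last_active', 'DESC']],
  });
  // Reuse that user_id for the new address; otherwise a fresh user is created.
  return existing ? existing.user_id : null;
}
```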

mhagel commented 8 months ago

If this ticket is focused on sign-in, we will create a separate ticket for migrating existing addresses. There is an outstanding question to resolve before we can finish it, though:

Which user_id do we consolidate the addresses to?

Easiest answer would be: the oldest.

However, some users may have grown to prefer a user_id associated with a Profile they use more often, so there is not an obvious answer. It is likely some power users would not be happy with whatever we decide.

It is possible we might not want to do a mass migration (1200+ power users) if we can't anticipate which user_id to consolidate to. In this case, we would want to create a UX that asks a power user if they want to consolidate to the account they are signed into.

Any thoughts on whether the next ticket should be: a. a mass migration, or b. an on-demand user-UI-based process?

@jnaviask @zakhap ?

mhagel commented 8 months ago

Latest update: I have a migration that consolidates users into their most recent last_active Address's user_id and profile_id.
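
Roughly, that consolidation can be expressed as a single UPDATE; a sketch of the idea (table and column names as in the query below, everything else illustrative, not the actual PoC migration):

```typescript
// Proof-of-concept sketch of the consolidation: for each hex, copy the
// user_id/profile_id of the most recently active address onto every other
// address that shares that hex.
import { QueryInterface } from 'sequelize';

module.exports = {
  up: async (queryInterface: QueryInterface) => {
    await queryInterface.sequelize.query(`
      WITH winners AS (
        SELECT DISTINCT ON (hex) hex, user_id, profile_id
        FROM "Addresses"
        WHERE hex IS NOT NULL AND user_id IS NOT NULL
        ORDER BY hex, last_active DESC NULLS LAST
      )
      UPDATE "Addresses" a
      SET user_id = w.user_id, profile_id = w.profile_id
      FROM winners w
      WHERE a.hex = w.hex AND a.user_id IS NOT NULL;
    `);
  },
  down: async () => {
    // Not trivially reversible; see the legacy_profile_id discussion below.
  },
};
```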

After the migration, I find 1678 unique pubkeys (users identified by "hex"), using this query:

SELECT hex, array_agg(address) AS addresses
FROM "Addresses"
WHERE user_id IS NOT NULL
GROUP BY hex
HAVING COUNT(DISTINCT address) >= 2;

This is still a proof of concept. The migration is resource-intensive, takes a long time locally (~10 minutes?), has unknown impacts, and would need lots of testing.

Anecdotally, for my personal accounts, it seems to work as intended: all my disparate addresses get consolidated to a single Signer. The main concern here is still my previous comment.

We could do a migration like this. It would work to consolidate addresses into unified profiles. But what would be the community response? Would anybody be annoyed if they get consolidated to a profile they didn't expect?

To think this through, we may have to try some different scenarios in Frick with several of us trying before and after migration. Luckily, we can be sure that all forum activity will be preserved, because those are stored on an address level. The only impacted data would be Profile: profile_name, bio, avatar, email, website, background_image, socials, slug.

One scenario: a profile name on threads in Community A gets changed unexpectedly. Would users mind? Users may also have to redo all the profile fields mentioned above, or we might end up consolidating to a less-used email address.

@jnaviask @zakhap

timolegros commented 8 months ago

Latest update: I have a migration that consolidates users into their most recent last_active Address's user_id and profile_id. (...)

My opinion here is that profile data should be consolidated based on what's available, rather than taken wholesale from the most recently used address (asking users directly would be ideal but is the hardest option). For example, if a user's most recently active address has an associated profile without a bio, we should fetch the next most recently used address that actually has a bio, and only leave the new bio blank if none of the associated profiles has one. If multiple profiles have a bio defined, use the most recently used address.
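
A sketch of that field-by-field rule, assuming the candidate profiles are already sorted by their owning address's last_active (the field list and types are illustrative, not the real Profiles schema):

```typescript
// Field-wise consolidation: for each profile field, take the value from the
// most recently active address's profile that actually has one.
type ProfileData = {
  profile_name?: string; email?: string; bio?: string;
  avatar_url?: string; website?: string; socials?: string[];
};

const PROFILE_FIELDS = ['profile_name', 'email', 'bio', 'avatar_url', 'website', 'socials'] as const;

// `profiles` must be ordered newest-first by the owning address's last_active.
function mergeProfiles(profiles: ProfileData[]): ProfileData {
  const merged: ProfileData = {};
  for (const field of PROFILE_FIELDS) {
    // The first (most recent) profile with a non-empty value wins.
    const source = profiles.find((p) => p[field] != null && p[field] !== '');
    if (source) (merged as any)[field] = source[field];
  }
  return merged;
}
```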

A migration that takes 10 minutes locally is almost guaranteed to fail in production, so it will definitely need improvements there. This is my rough take on what an expand-contract approach could look like. It ensures we can revert at each step and don't accidentally destroy everyone's profiles in a way that only a full DB reset could undo:

Generating hex values

  1. Create new column for hex on address table
  2. Add stored procedure which generates hex value for each new cosmos address
  3. Backfill hex values outside of migration as background job with batch processing (if this is indeed what is taking so long)

This step is reversible because you can simply drop the hex column. This does not update sign-in procedures (users still log in with addresses and not the public key).
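
For step 3, a batched background backfill could look roughly like this (the "is Cosmos" filter and the hex helper are assumptions, not the real schema; the derivation itself is shown in a later sketch):

```typescript
// Sketch of a batched hex backfill run outside the migration, so it never
// holds a long lock: process a small batch, commit, repeat until done.
import { QueryTypes, Sequelize } from 'sequelize';

const BATCH_SIZE = 500;

async function backfillHexes(
  sequelize: Sequelize,
  toHex: (address: string) => string, // pubkey/bech32-derived hex
): Promise<void> {
  for (;;) {
    const rows = await sequelize.query<{ id: number; address: string }>(
      `SELECT id, address FROM "Addresses"
       WHERE hex IS NULL AND wallet_id = 'keplr' -- illustrative "is Cosmos" filter
       LIMIT :limit`,
      { type: QueryTypes.SELECT, replacements: { limit: BATCH_SIZE } },
    );
    if (rows.length === 0) break;

    for (const row of rows) {
      await sequelize.query(`UPDATE "Addresses" SET hex = :hex WHERE id = :id`, {
        replacements: { hex: toHex(row.address), id: row.id },
      });
    }
  }
}
```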

Profile merging

  1. Create new column for master profile on address table
  2. Consolidate all profile data for Cosmos users into a new master profile
  3. Populate the master profile column -> for cosmos address this is a new profile and for others it is the same as the old profile
  4. Update all app queries to use the new master profile column (can be aliased to avoid any naming refactors or create a view)

This, again, does not update the login procedures (users still sign in with addresses and not a public key), but they now have their new profiles. This step ensures there are no major issues/complaints with the new profiles. It is reversible because you can drop the new column and revert to using the old profiles.

Login update

No migration here, just an app update to sign in using a public key instead of an address. This step is separate from the clean-up step because we can easily revert the sign-in update if it causes issues (we haven't dropped old addresses or profiles yet).

Clean-up

  1. Remove address from Addresses that have/use hex
  2. Drop unused Cosmos profiles

This is the irreversible step that deletes all the old data once we confirm that there are no serious login issues or complaints.
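
A sketch of what that final clean-up could look like once everything is verified (purely illustrative; the real clean-up would likely be scoped more carefully):

```typescript
// Irreversible clean-up: drop profiles that no address references any more,
// only after the consolidated profiles and the pubkey sign-in are confirmed good.
import { QueryInterface } from 'sequelize';

module.exports = {
  up: async (queryInterface: QueryInterface) => {
    await queryInterface.sequelize.query(`
      DELETE FROM "Profiles" p
      WHERE NOT EXISTS (SELECT 1 FROM "Addresses" a WHERE a.profile_id = p.id);
    `);
  },
  down: async () => {
    // Intentionally irreversible; this step only runs once we are confident.
  },
};
```
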
mhagel commented 8 months ago

Generating hex values

  1. Create new column for hex on address table
  2. Add stored procedure which generates hex value for each new cosmos address
  3. Backfill hex values outside of migration as background job with batch processing (if this is indeed what is taking so long)

I can confirm that this step is not too bad using Sequelize and JavaScript: about 12 seconds locally. So maybe there is no need for a stored procedure or background job. I will do this part in an isolated transaction / migration.
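
For reference, the hex generation itself can be done with the bech32 npm package along these lines (a sketch; the exact derivation used in the codebase may differ):

```typescript
// Derive a chain-agnostic identifier from a Cosmos bech32 address: the data
// part is the same for every chain prefix when the underlying pubkey is the
// same, so its hex encoding can act as the shared "hex" value.
import { bech32 } from 'bech32';

function cosmosAddressToHex(address: string): string {
  const { words } = bech32.decode(address);             // strips prefix + checksum
  const payload = Buffer.from(bech32.fromWords(words)); // 20-byte address payload
  return payload.toString('hex');
}

// e.g. cosmosAddressToHex('osmo1...') === cosmosAddressToHex('regen1...') for the same Signer.
```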

mhagel commented 8 months ago

Profile merging

  1. Create new column for master profile on address table
  2. Consolidate all profile data for Cosmos users into a new master profile
  3. Populate the master profile column -> for cosmos address this is a new profile and for others it is the same as the old profile
  4. Update all app queries to use the new master profile column (can be aliased to avoid any naming refactors or create a view)

How about this? @timolegros

  1. Create new column for legacy_profile_id on address table
  2. Find the master profile by finding the profile with the most populated values
  3. Assign the "master" profile_id to Address.profile_id. Assign the previous value to legacy_profile_id for potential reversions

This way there is no need to update non-cosmos addresses, and no need to update app queries.

This is the slow step, so may need to do some batch processing here.
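
For step 2, the "most populated" profile per hex could be found with something like the query below (the Profiles column names are assumptions):

```typescript
// Illustrative query: rank each hex's candidate profiles by how many fields
// are populated and keep the winner as the "master" profile for that hex.
import { QueryInterface, QueryTypes } from 'sequelize';

async function pickMasterProfiles(queryInterface: QueryInterface) {
  return queryInterface.sequelize.query(
    `SELECT DISTINCT ON (a.hex) a.hex, p.id AS master_profile_id
     FROM "Addresses" a
     JOIN "Profiles" p ON p.id = a.profile_id
     WHERE a.hex IS NOT NULL
     ORDER BY a.hex,
       ((p.bio IS NOT NULL)::int + (p.avatar_url IS NOT NULL)::int +
        (p.email IS NOT NULL)::int + (p.website IS NOT NULL)::int) DESC`,
    { type: QueryTypes.SELECT },
  );
}
```

The result would then drive an UPDATE that sets Address.profile_id to the winner and stashes the previous value in legacy_profile_id.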

timolegros commented 8 months ago

Generating hex values

  1. Create new column for hex on address table
  2. Add stored procedure which generates hex value for each new cosmos address
  3. Backfill hex values outside of migration as background job with batch processing (if this is indeed what is taking so long)

I can confirm that this step is not too bad using sequelize and javascript. About 12 seconds locally. So maybe no need for stored procedure or background job. I will do this part in an isolated transaction / migration.

I think 12 seconds locally might still be too long. I would run your migration on Frick/Frack while running a load test with Nakul's Artillery load-testing work, to ensure the migration can run effectively in production without issues. @kurtisassad is probably most familiar with load testing.

In any case, let's just ensure that we have a CHECK constraint on the Addresses table so that any Cosmos address is created with a hex value. This should come with the corresponding validation function in the Sequelize model. Note that when creating the constraint (after the hex values have been backfilled), you should use the NOT VALID option to ensure good performance.
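
A sketch of that constraint migration (the predicate for "is a Cosmos address" is an assumption):

```typescript
// Add the constraint NOT VALID so existing rows are not scanned while the
// ALTER TABLE lock is held, then validate separately once the backfill is done.
import { QueryInterface } from 'sequelize';

module.exports = {
  up: async (queryInterface: QueryInterface) => {
    await queryInterface.sequelize.query(`
      ALTER TABLE "Addresses"
      ADD CONSTRAINT "cosmos_address_requires_hex"
      CHECK (wallet_id <> 'keplr' OR hex IS NOT NULL) NOT VALID;
    `);
    // VALIDATE CONSTRAINT takes a lighter lock, so it can run later (ideally
    // as a follow-up migration after the backfill) without blocking writes.
    await queryInterface.sequelize.query(`
      ALTER TABLE "Addresses" VALIDATE CONSTRAINT "cosmos_address_requires_hex";
    `);
  },
  down: async (queryInterface: QueryInterface) => {
    await queryInterface.sequelize.query(`
      ALTER TABLE "Addresses" DROP CONSTRAINT "cosmos_address_requires_hex";
    `);
  },
};
```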

timolegros commented 8 months ago

How about this? @timolegros

  1. Create new column for legacy_profile_id on address table
  2. Find the master profile by finding the profile with the most populated values
  3. Assign the "master" profile_id to Address.profile_id. Assign the previous value to legacy_profile_id for potential reversions

This way there is no need to update non-cosmos addresses, and no need to update app queries.

This is the slow step, so may need to do some batch processing here.

New column legacy_profile_id sounds good but I would still create a new profile for Cosmos addresses to consolidate all their other profiles rather than just picking the most populated profile.

mhagel commented 8 months ago

I would still create a new profile for Cosmos addresses to consolidate all their other profiles rather than just picking the most populated profile.

This is pretty complicated. I would need a SQL ninja's help.

@timolegros

mhagel commented 8 months ago

In any case, let's just ensure that we have a CHECK constraint on the Addresses table so that any Cosmos address is created with a hex value. This should come with the corresponding validation function in the Sequelize model. Note that when creating the constraint (after the hex values have been backfilled), you should use the NOT VALID option to ensure good performance.

done, thx.

mhagel commented 8 months ago

Backfill hex values outside of migration as background job with batch processing (if this is indeed what is taking so long)

I think 12 seconds locally might still be too long - I would run your migration on Frick/Frack while running a load test with Nakuls Artillery load testing work to ensure migration can run effectively in production without issues. @kurtisassad is probably most familiar with load testing.

We could theoretically just update on next sign-in. A cosmos address has a middle section that is common across chains. Possible steps:

  1. User signs in with osmo1abc123xyz11111
  2. If no Address.hex yet, immediately generate and populate hex for this address.
  3. Search DB for addresses that have middle section "abc123xyz" // ex: found regen1abc123xyz22222
  4. Verify same hex for found addresses
  5. If hex confirmed, populate hex columns
  6. perform Profile consolidation. Maybe use existing transferAddress logic to send emails to addresses to inform of change.

Doing step 6 in JavaScript may be easier for me to figure out than SQL.
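
A sketch of steps 1-5 (model and helper names are assumptions; the hex derivation is the bech32 sketch from earlier in the thread):

```typescript
// On sign-in: populate hex for the incoming address, then find and confirm
// sibling addresses on other chains that share the same bech32 data part.
import { Op } from 'sequelize';
import { Addresses } from '../models';                      // assumed model import path
import { cosmosAddressToHex } from './cosmosAddressToHex';  // helper sketched earlier

async function populateHexOnSignIn(address: string): Promise<void> {
  // 1-2. Derive and store the hex for the address that just signed in.
  const hex = cosmosAddressToHex(address);
  await Addresses.update({ hex }, { where: { address, hex: null } });

  // 3. The characters between the "1" separator and the 6-char checksum are
  //    identical across chain prefixes for the same pubkey.
  const middle = address.slice(address.indexOf('1') + 1, -6);
  const candidates = await Addresses.findAll({
    where: { hex: null, address: { [Op.like]: `%${middle}%` } },
  });

  // 4-5. Verify each candidate by decoding it, then populate its hex column.
  for (const candidate of candidates) {
    try {
      if (cosmosAddressToHex(candidate.address) === hex) {
        await candidate.update({ hex });
      }
    } catch {
      // Not a decodable bech32 address; skip it.
    }
  }
  // 6. Profile consolidation (and a transferAddress-style email) would follow here.
}
```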

Pros:

  • This would eliminate worries around migration locking in Prod. The server will perform the action just on the first sign-in.
  • Then (ideally) all the user's sign-ins from other chains will automatically come to the consolidated profile from then on.
  • Optional sign-in notification: "We just consolidated your cosmos addresses! Please review your profile details. This profile may have inherited some details from other profiles if you had others associated with this pubkey."

Cons:

  • This would leave many addresses in a "pre-consolidated" state.
  • Still theoretical, requires more effort (#5065).

@timolegros

mhagel commented 8 months ago

Bumping this to an 8 due to ongoing strategizing and discovered complexity. @CowMuon

timolegros commented 8 months ago

We could theoretically just update on next sign-in. (...)

An additional con here is that we can't follow up with a clean-up migration/PR to remove old profiles/columns unless every Cosmos user signs in at least once. I think if you want to do JS profile consolidation, either do it directly in the migration or as a background job like DatabaseCleaner.ts.

mhagel commented 8 months ago

@zakhap and product folks - would love to hear feedback on how to handle the Profile consolidation.

The leading strategy is:

(a) Create a Master profile that brings together fields (profile_name, email, bio, etc) from the Signer's existing Profiles, favoring the latest updated

Second choice:

(b) Just use the profile with the most populated fields. (less effort here, but kinda arbitrary).

And there may be other possibilities.

My main concern is doing this in a way that leaves the fewest users upset with the changes. I think we may need to add some sort of notification that lets the user know that they may want to update their profile details, in case we made a choice they wouldn't have preferred.

kurtisassad commented 8 months ago

Sorry, was late to this ticket. Just saw the discussion now.

@mhagel We can run load testing via Nakul's PR while the migration script runs, like Tim suggests. Let me know if you would like to sync up on it sometime, and we can discuss and test it together.

mhagel commented 8 months ago

Thanks, @kurtisassad. I will let you know. We are talking about updating nearly 40,000 rows. Right now I am leaning towards doing it in an async job instead of a migration, per the suggestion by @timolegros. That is the best way I can think of to do a bulk update instead of 40k individual queries.

edit: actually found a bulkInsert path in the migration
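
For reference, one way to collapse ~40k individual UPDATEs into a single statement inside a migration is to join against an inline VALUES list (a sketch of the idea, not necessarily the exact bulkInsert path used):

```typescript
// Bulk-set hex values in one statement by joining the target table against a
// VALUES list built from (id, hex) pairs; chunk the pairs if the list gets huge.
import { QueryInterface } from 'sequelize';

async function bulkSetHexes(
  queryInterface: QueryInterface,
  pairs: { id: number; hex: string }[],
): Promise<void> {
  if (pairs.length === 0) return;
  const values = pairs
    .map((_, i) => `($${i * 2 + 1}::int, $${i * 2 + 2}::text)`)
    .join(', ');
  const bind = pairs.flatMap((p) => [p.id, p.hex]);
  await queryInterface.sequelize.query(
    `UPDATE "Addresses" a
     SET hex = v.hex
     FROM (VALUES ${values}) AS v(id, hex)
     WHERE a.id = v.id`,
    { bind },
  );
}
```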

mhagel commented 7 months ago

@zakhap According to @jnaviask, it sounds like we have handled this in the past by giving potentially affected users a heads-up before the migration (a modal on sign-in). This would be a prompt to update Profile info before the migration date. Does that sound like a good plan this time?

We could add all hex values (pubkey identifier) to cosmos addresses beforehand, identify users with more than one address-per-hex, notify only those users of the coming change, then do the Profile consolidation at a planned date.

Again, we are talking about approximately 1200 individuals who would potentially have their Profiles consolidated.

mhagel commented 7 months ago

Further development is blocked on the need for UX clarity. @zakhap @jnaviask @CowMuon

zakhap commented 7 months ago

We could add all hex values (pubkey identifier) to cosmos addresses beforehand, identify users with more than one address-per-hex, notify only those users of the coming change, then do the Profile consolidation at a planned date. Again, we are talking about approximately 1200 individuals who would potentially have their Profiles consolidated.

I think I am in alignment with this solution. After we identify who and how many users are affected, we can decide on exactly how we consolidate their profiles (which one becomes the user + profile).

zakhap commented 7 months ago

The leading strategy is:

(a) Create a Master profile that brings together fields (profile_name, email, bio, etc) from the Signer's existing Profiles, favoring the latest updated

Second choice:

(b) Just use the profile with the most populated fields. (less effort here, but kinda arbitrary).

I don't know what you mean by creating a new master profile? Why not just pick the Profile that was more recently or more actively used? (probably dependent on those discovered to be impacted most)

mhagel commented 7 months ago

I don't know what you mean by creating a new master profile?

The idea of the new master profile is to consolidate data from the more recently used Profiles. In most cases these fields would likely all come from the same Profile. The goal is to avoid data loss.

Why not just pick the Profile that was more recently or more actively used? (probably dependent on those discovered to be impacted most)

Just to reduce the possibility of data loss. A more active profile may not be the one with the most Profile data (bio, socials, etc). But, yes, there may be more insights after we find out how many active users (last 3-6 months) will be affected.

@zakhap

jnaviask commented 6 months ago

cc @zakhap

mhagel commented 6 months ago

This was addressed in #5429. I initially hoped to split these into separate efforts, but it made more sense to implement this at the same time as the hex population ticket, because it was important to populate the new hex common keys on sign-in.