thoughtbot / carnival

An unobtrusive, developer-friendly way to add comments
MIT License
499 stars 30 forks source link

Add unique constraint on UserEmail #227

Closed pbrisbin closed 9 years ago

pbrisbin commented 9 years ago
pbrisbin commented 9 years ago

We actually have a few real users with duplicate emails already because of the Learn->GitHub transition. I think this deploy should have a migration to:

One question is what happens to users who have commented after logging in via Learn, but not yet logged in via GitHub? When they attempt to log in next (assuming the same email), it'll fail with nothing they can do to correct the situation since we no longer have Learn as an option. I'm wondering if bringing back Learn (Upcase now) as an option is the only thing we can do.

cpytel commented 9 years ago

How many users would that be?

I feel like that's not a great option long term for what is likely a small number of people. We could even personally contact these people and help them out.

On Apr 10, 2015, at 6:14 AM, pat brisbin notifications@github.com wrote:

We actually have a few real users with duplicate emails already because of the Learn->GitHub transition. I think this deploy should have a migration to:

Find such users Reassign comments from the learn user records to the github ones Delete the duplicate learn users One question is what happens to users who have commented after logging in via Learn, but not yet logged in via GitHub? When they attempt to log in next (assuming the same email), it'll fail with nothing they can do to correct the situation since we no longer have Learn as an option. I'm wondering if bringing back Learn (Upcase now) as an option is the only thing we can do.

— Reply to this email directly or view it on GitHub.

pbrisbin commented 9 years ago
select count(*) from "user" where plugin = 'learn' and email not like '%thoughtbot.com';
 count 
-------
   284
(1 row)

With a little psql-foo, I should be able to see how many of these users have actually left comments. Those that haven't could just be deleted. I could also just delete without checking and rely on the foreign-key constraint to do the filtering.

cpytel commented 9 years ago

I think it makes sense to delete anyone who hasn't commented.

On Apr 10, 2015, at 6:30 AM, pat brisbin notifications@github.com wrote:

select count(*) from "user" where plugin = 'learn' and email not like '%thoughtbot.com';

count

284 (1 row) With a little psql-foo, I should be able to see how many of these users have actually left comments. Those that haven't could just be deleted. I could also just delete without checking and rely on the foreign-key constraint to do the filtering.

— Reply to this email directly or view it on GitHub.

pbrisbin commented 9 years ago

So the following are the current production numbers:

The first two groups are easy: delete one, migrate the other.

I see two options for dealing with the third group:

  1. Send a mass email inviting them to re-login

We'd wait a reasonable amount of time, then migrate or delete them (and potentially their comments). The downside here is we're blocked on enforcing uniqueness until this process is done.

  1. Add some code to live-migrate if/when we see emails that duplicate a learn account

This would take some juggling to create the new user with a temporary, unique email, migrate, then update back to the actual email after the learn user's been deleted. We could still send the mass email to try and speed up the process, but we wouldn't be blocked from adding the constraints. Once all the learn accounts are gone, we could remove this bit of code.

jferris commented 9 years ago

I think it would generally be good to have a "merge on the same email" workflow for cases where somebody signs in with GitHub and then signs in with Google the next time.

pbrisbin commented 9 years ago

That actually makes sense. We just keep a single user record, whatever it is, and if someone comes in with a different plugin/ident but the same email we just update the plugin/ident on the existing record.

All we'd have to deal with in the existing data are the current email dupes, which we could migrate.

jferris commented 9 years ago

The code here makes sense. So, the plan would be:

Does that sound right?

pbrisbin commented 9 years ago

Yup, I'll move forward with those steps, but not deploy anything. We need to resolve the duplicates currently in production before we can do that since adding the DB constraint will fail.