supertokens / supertokens-core

Open source alternative to Auth0 / Firebase Auth / AWS Cognito
https://supertokens.com

Bulk User Import into SuperTokens #912

Open anku255 opened 6 months ago

anku255 commented 6 months ago

https://docs.google.com/document/d/1TUrcIPbdsHfqheIkB6CTkTNiA6YN-73Xz_pX2kjXhkc/edit

Open PRs:

TODO:

rishabhpoddar commented 5 months ago

About method 1:

About method 2:

main advantage of this approach lies in avoiding the addition of a CRON job

This is a very tiny advantage IMO. What other advantages are there?

Additional points:

anku255 commented 5 months ago

@rishabhpoddar After discussions, we've opted for method 1. I've expanded the original document to elaborate on method 1 in detail.

rishabhpoddar commented 5 months ago

Database Schema

Please give it in SQL format like CREATE TABLE ... and specify any keys.

Request Body of API to post json

The totp object needs skew, period, deviceName (optional). Please see the API to import a device.

Size Limit: 10,000 users

The test you ran, what are the specs of the setup?

All possible errors for the /add-users endpoint

thirdPartyId must be one of 'apple', 'discord', 'facebook', 'github', 'google', 'linkedin' or 'twitter'.

This is not true.

Get bulk import users

Delete bulk import users by importId

Why do we need this API? Why can't we just auto clear the space in the db?

A cron job, ProcessBulkImportUsers, runs every minute to handle users added to the bulk_import_users table. It processes users (status = 'NEW') in batches of 1000, updating their status to 'PROCESSING' before processing and to 'PROCESSED' afterward. If processing fails, the error_msg column is updated with error messages.
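A minimal sketch of that cron loop, assuming hypothetical helpers (`getUsersToProcess`, `importUser`, and `setStatus` are placeholders, not the actual core implementation):

```javascript
// Hypothetical sketch of the ProcessBulkImportUsers cron described above.
// db.getUsersToProcess is assumed to claim up to BATCH_SIZE rows with
// status = 'NEW' and mark them 'PROCESSING' before returning them.
const BATCH_SIZE = 1000;

async function processBulkImportUsers(db) {
  const users = await db.getUsersToProcess(BATCH_SIZE);
  for (const user of users) {
    try {
      await db.importUser(user);
      await db.setStatus(user.id, 'PROCESSED', null);
    } catch (e) {
      // A failure for one user must not stop the rest of the batch.
      await db.setStatus(user.id, 'FAILED', e.message);
    }
  }
  return users.length; // number of users handled in this run
}
```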

The query in getUsersToProcess function

All possible errors for the Cron Job

Pseudo code for the cron job

Utilizing the Bulk Import API

anku255 commented 5 months ago

@rishabhpoddar

Please give it in SQL format like CREATE TABLE ... and specify any keys.

Done.

The totp object needs skew, period, deviceName (optional). Please see the API to import a device.

Done.

The test you ran, what are the specs of the setup?

I ran the tests on my MacBook Pro, which has pretty good specs, and I was able to insert up to 50K users as well. 10K is a conservative number that should accommodate lower-end specs too.

What if the user being imported already exists (either a user in ST, or as an entry in the bulk table)

We'll update the data, overwriting fields wherever applicable for simplicity. If a user is performing a bulk import while their previous auth provider is active, it's possible that user data could be updated for some users after creating the export file. Allowing users to update and create users in the same API would be helpful.

thirdPartyId must be one of 'apple', 'discord', 'facebook', 'github', 'google', 'linkedin' or 'twitter'. This is not true.

Can you please elaborate? Do we support every thirdparty?

Why do we need importid (seems like an overkill).

Upon some discussion we decided to remove it.

Does it work for mysql and postgresql? Is the FOR UPDATE necessary?

Yes, it works in both MySQL (v8 and above) and PostgreSQL (v12 and above).

The information on the atomicity of Common Table Expressions (WITH statement) is ambiguous. While some sources suggest potential issues where another query might select the same documents before they are updated, incorporating FOR UPDATE provides an additional layer of safety.
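As a sketch, the claim-and-mark statement being discussed could look like the following (PostgreSQL flavour shown; the MySQL variant would differ slightly, and the table and column names are assumptions based on this thread):

```javascript
// Hypothetical sketch of the getUsersToProcess query discussed above. A single
// statement selects 'NEW' rows with FOR UPDATE (so a concurrent run cannot grab
// the same rows before they are updated) and flips them to 'PROCESSING'.
const claimBatchSQL = `
  WITH to_process AS (
    SELECT id FROM bulk_import_users
    WHERE status = 'NEW'
    ORDER BY created_at
    LIMIT 1000
    FOR UPDATE
  )
  UPDATE bulk_import_users
  SET status = 'PROCESSING'
  WHERE id IN (SELECT id FROM to_process)
  RETURNING *;
`;
```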

I don't think this is a good idea cause the entry being processed could have other information like a different set of roles, in which case, would you overwrite the roles for the older user or merge them?

I have written more about this in the point above.

How is the user supposed to know about these errors and handle them? They would have to modify / delete that specific user perhaps to fix this issue.. How do they do that?

Good point. The API will provide the index of the user requiring attention. Our script can then handle the removal of users with errors, creating a separate file for them. Users with errors can be addressed manually later.
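The error-handling script described here could be sketched roughly as follows (the field names, and the assumption that the API returns the indices of invalid users, are mine):

```javascript
// Hypothetical sketch: given the original users array and the error entries
// returned by the API (assumed shape: [{ index, message }, ...]), split the
// input into a retryable list and a separate errors list.
function splitUsersByErrors(users, errors) {
  const badIndices = new Set(errors.map((e) => e.index));
  const valid = users.filter((_, i) => !badIndices.has(i));
  const invalid = errors.map((e) => ({ user: users[e.index], message: e.message }));
  return { valid, invalid };
}
```

The `invalid` list can then be written to its own file so those users can be fixed and re-imported manually later.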

What happens in the cron if an error is encountered for a user? Does it move on to the next user or stop there?

Yes, the cron will continue processing the other users. The user with the error will be marked as FAILED and an error message will be set.

Why do users need to use just in time migration along with bulk migration? Sure, there might be new user sign ups during the bulk migration, but users can just add those later on to ST anyway?

Not only could there be new sign-ups, but existing user data could be updated as well. Users can call the API again with the updated data and make it work without "just in time" migration, but this requires that the bulk import API updates the data if the user already exists.

Please mention what the implementation of importUser is. I need to see that in a flow diagram clearly, in detail.

Please find pseudocode for the importUser function below. It is missing some things, like user roles, but it should be enough to give you an idea of how I am thinking about this.

function importUser(user) {
    try {
      Start Transaction;

      let EPUserId = null;
      let TPUserId = null;
      let PWlessUserId = null;
      let primaryUserId = null;

      // Set isPrimary to true for the first loginMethod if none of the methods have it set
      if (!user.loginMethods.some((lm) => lm.isPrimary)) {
         user.loginMethods[0].isPrimary = true;
      }

      for (const loginMethod of user.loginMethods) {
          if (loginMethod.recipeId === 'emailpassword') {
            EPUserId = CreateUserForEP(loginMethod, user.externalUserId);
            if (loginMethod.isPrimary) {
              primaryUserId = EPUserId;
            }
          }

          if (loginMethod.recipeId === 'passwordless') {
            PWlessUserId = CreateUserForPasswordless(loginMethod, user.externalUserId);
            if (loginMethod.isPrimary) {
              primaryUserId = PWlessUserId;
            }
          }

          if (loginMethod.recipeId === 'thirdparty') {
            TPUserId = CreateUserForThirdParty(loginMethod, user.externalUserId);
            if (loginMethod.isPrimary) {
              primaryUserId = TPUserId;
            }
          }
      }

      if (user.loginMethods.length > 1) {
        Call accountlinking.createPrimaryUser for primaryUserId;
        Call accountlinking.linkAccount for every userId other than primaryUserId;
      }

      if (user.usermetadata) {
        Update user metadata for the primaryUserId;
      }

      if (user.mfa) {
        Call mfa related APIs for the primaryUserId;
      }

      Delete user from bulk_import_users table;
      Commit Transaction;
    } catch (e) {
      Rollback Transaction;
      user.status = 'FAILED';
      user.error_msg = e.message; // Set a meaningful error message
    }
}

function CreateUserForEP(data, externalUserId) {
  Validate data fields;
  Call EmailPassword.importUserWithPasswordHash;
  if (data.isPrimary) {
    Link this user to externalUserId;
  }

  if (data.time_joined) {
    Update time_joined of the created user;
  }

  if (data.isVerified) {
    Update email verification status;
  }

  return newUserId;
}

function CreateUserForPasswordless(data, externalUserId) {
  Validate data fields;
  Call Passwordless.createCode;
  Call Passwordless.consumeCode; // Email will be verified automatically here

  if (data.isPrimary) {
    Link this user to externalUserId;
  }

  if (data.time_joined) {
    Update time_joined of the created user;
  }
  return newUserId;
}

function CreateUserForThirdParty(data, externalUserId) {
  Validate data fields;
  Call ThirdParty.SignInAndUp API; // Pass data.isVerified

  if (data.isPrimary) {
    Link this user to externalUserId;
  }

  if (data.time_joined) {
    Update time_joined of the created user; 
  }

  return newUserId;
}
rishabhpoddar commented 5 months ago

We'll update the data, overwriting fields wherever applicable for simplicity. If a user is performing a bulk import while their previous auth provider is active, it's possible that user data could be updated for some users after creating the export file. Allowing users to update and create users in the same API would be helpful.

We decided that we will not be updating existing users in the bulk migration logic. If we see a repeated user in bulk migration (same external user id, or same account info in the same tenant), then we will mark it as an error.
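As a rough sketch, the duplicate rule described above could be checked like this (object shapes and field names are assumptions, and "account info" is illustrated with email only):

```javascript
// Hypothetical sketch of the duplicate check: same external user id, or same
// account info (here, email) within the same tenant.
function isDuplicate(candidate, existing) {
  if (candidate.externalUserId !== undefined &&
      candidate.externalUserId === existing.externalUserId) {
    return true;
  }
  return candidate.loginMethods.some((c) =>
    existing.loginMethods.some((e) =>
      c.tenantId === e.tenantId && c.email !== undefined && c.email === e.email
    )
  );
}
```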

anku255 commented 2 months ago

Problem

When a user generates a bulk import JSON file from an existing authentication provider, ongoing changes to user data can make the import outdated by the time it's completed. For example, if a user updates their password after the JSON file is created, they will not be able to log in with their new password on SuperTokens.

Solution 1: Bulk Import with User Data Updates

A straightforward solution is to perform a second bulk import after switching to SuperTokens. If the Bulk Import API supports updates, the new data can replace or add to what's in the SuperTokens database, helping to correct any inconsistencies from the initial import.

Issues with Solution 1

Solution 2: Implementing LazyMigration Recipe

Create a LazyMigration recipe to seamlessly migrate users to SuperTokens upon their first login or signup. Here's a potential implementation:

recipeList = [
  LazyMigration({
    getUser: (userId: string): BulkImportJSON => {
      // Fetch user data from the existing authentication provider
    },
  })
]

The process unfolds as follows:

  1. Transition to SuperTokens: Integrate SuperTokens into your application and configure the LazyMigration recipe. Start using SuperTokens for authentication.

  2. User Authentication: When an existing user logs in for the first time, the signUp recipe in SuperTokens will retrieve their data from your existing authentication provider using the LazyMigration recipe's getUser function. The user is then imported into SuperTokens. New users can sign up directly with SuperTokens.

  3. Bulk Import: Use Bulk Import to migrate all existing users to SuperTokens at once. Users previously migrated via the LazyMigration recipe will be automatically skipped during this process.

  4. Deactivate existing auth provider: Upon completion of the bulk import, you can safely remove the existing authentication provider from your system.
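Step 2 above could be sketched roughly like this. Note this is illustrative pseudocode around the proposed recipe, not the actual SuperTokens API; the proposed getUser takes a userId, but for an email-password flow the lookup is shown by email, and all helper names (`findSuperTokensUser`, `signIn`, `importUser`) are assumptions:

```javascript
// Hypothetical sketch of lazy migration on first login: if the user is not in
// SuperTokens yet, fetch their BulkImportJSON entry from the old provider via
// the configured getUser and import them, then sign them in as usual.
async function lazySignIn(email, password, config) {
  const existing = await config.findSuperTokensUser(email);
  if (existing !== undefined) {
    return config.signIn(email, password); // normal sign in, already migrated
  }
  const legacyUser = await config.getUser(email);
  if (legacyUser === undefined) {
    return { status: 'WRONG_CREDENTIALS_ERROR' }; // unknown in both systems
  }
  await config.importUser(legacyUser); // same import path as the bulk cron
  return config.signIn(email, password);
}
```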

anku255 commented 1 month ago

Bulk Import Documentation Outline

Migration steps overview

Overall goals:

Steps:


Code migration

You want to completely replace your existing auth system with SuperTokens

You just follow our docs as is.

You want to keep both alive for a certain amount of time

You just follow our docs as is, except for session verification, you do the following:

app.post("/verify", verifySession({ sessionRequired: false }), async (req, res) => {
    let userId = undefined;
    if (req.session !== undefined) {
        // this means a supertokens session exists
        userId = req.session.getUserId();
    } else {
        // try doing auth with existing auth provider
        userId = await yourExistingAuthVerify(req);
    }

    if (userId === undefined) {
        // send a 401
    } else {
        // Your API logic.
    }
});

Frontend migration:


Option 1: Migration API -> Overview

For bulk migration

For small number of users

Talk about the migration API here and link to the JSON schema explanation in the "1) Creating the user JSON" page Also mention that if you call the API twice for the same user, it will not update the second time, and if users want to update some information, they will have to do so by calling individual APIs. For example, if you want to update the password hash of this user, call XYZ API, or if you want to update roles, call the roles PUT API (link to the CDI spec).
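As a sketch, posting the user JSON to the core could look like the following (the exact path, auth header, and response shape are assumptions; only the `/add-users` endpoint name and the 10,000-user limit come from this thread):

```javascript
// Hypothetical sketch of calling the bulk import endpoint on the core.
async function addUsers(coreUrl, apiKey, users) {
  if (users.length > 10000) {
    // The endpoint accepts at most 10,000 users per call; split larger inputs.
    throw new Error('Too many users: send at most 10,000 per request');
  }
  const res = await fetch(`${coreUrl}/add-users`, {
    method: 'POST',
    headers: { 'api-key': apiKey, 'content-type': 'application/json' },
    body: JSON.stringify({ users }),
  });
  return res.json();
}
```

Remember that calling this twice for the same user will not update them; updates go through the individual recipe APIs instead.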


1) Creating the user JSON


2) Add the user JSON to the SuperTokens core


3) Monitoring the core cronjob


Example: Create user JSON file from Auth0


Option 2: User Creation without password hashes

For now, let it be how it is in the docs


Session migration (optional)

For now, let it be how it is in the docs


Step 4: Post production operations

If you have done bulk migration:

- Mention the problem that users may have updated their data whilst the migration is happening.
- This is an unsolved problem for now, but we will solve it by introducing a lazy migration recipe (coming soon).
- If you really don't want this problem, you should stop your existing auth provider before you start creating the user JSON. This will of course add downtime for your application.

If you have done User Creation without password hashes:

- You have to keep your existing auth provider running until you think enough users have signed in to your application.
- The rest of the users will have to be imported through bulk migration with random passwords and then they will have to go through the password reset flow.