supertokens / supertokens-node

Node SDK for SuperTokens core
https://supertokens.com
Other
294 stars 79 forks source link

(Experienced) Optimise for serverless execution #6

Closed rishabhpoddar closed 3 years ago

rishabhpoddar commented 4 years ago

Have a boolean that is set in case the execution env is serverless. If serverless, we save handshake and apiVersion information in /tmp folder as a cache. FIle system reads are much faster than network calls.

rishabhpoddar commented 4 years ago

See https://www.netlify.com/blog/2018/09/13/how-to-run-express.js-apps-with-netlify-functions/ for more info

kant01ne commented 4 years ago

Reading from our documentation:

All config values: hosts: string - ; separated string for all the locations of SuperTokens instances. (...) apiKey - Specify any one of the API keys that you have set in the config.yaml

There are more config values in the config.yaml file (for on premise) or on the SaaS dashboard. The values you set via the init function above will override those.

An alternative I'm thinking about is to specify a variable saying that the handshake is useless because all the values are already present. That way there is no handshake at all. That would be useful for server-less applications but maybe even for other application which already have environment variables for pretty much everything in their application? Any reason why this handshake would be necessary if all variables are already provided?

Let me know if you think this is an idea that is worth exploring.

From the mentioned article:

There is also 512 MB of temporary disk space available across all memory configurations. It’s possible to stash files or data into this /temp directory of the function but because the functions are ephemeral in nature, you can’t rely on the cache always being present.

So basically what that means is that the /tmp folder will be accessible only while the Lambda function is still warm. Whenever the load increases, new functions will be spin up and a new handshake will be required. Using the caching would still be a reasonable improvement that is worth implementing.

Serverless providers:

Other providers benchmarked which do not support server-less:

Implementation details:

Different systems will have different paths to store in the functions execution context file system. The way I see this feature being added to the Node SDK is that we'd first add an option in the SuperTokens.init() function:

app.use(supertokens.init({
    hosts: "http://localhost:3567;https://try.supertokens.io",
    apiKey: "key",
    serverless: 'netlify' // ('azure'|'aws'|'google') | default to false
}));

In session.ts:41 we can pass config.serverless to HandshakeInfo.getInstance({serverless: config.serverless}).

And then in Handshake getInstance method: 1/ Check if handshake information is present. 1.1/ Yes, return. 1.2/ No, Continue. 2/ Check if serverless option is present. 2.1/ No, query the core. 2.2/ Yes, continue. 3/ Check if Handshake Info is present in the file system. 3.1/ No, query the core, store information in the file system for future use. 3.2/ Yes, return.

HandshakeInfo.getInstance() is called in bunch of other places without the config provided but session.init will always be called first, and will always initiate the HandshakeInfo appropriately.

Testing: I'm thinking about bootstrapping applications for each providers which will take a lot of time just to make sure that it is working but in the end, writing some tests mocking the calls to write in file system to store in a global variable should do the trick.

Additional changes required:

Notes:

rishabhpoddar commented 4 years ago

Any reason why this handshake would be necessary if all variables are already provided?

The handshake contains information that are not a part of the config values provided to the driver. For example, accessTokenBlacklisting is one such parameter. We can add the missing ones to the config params, however, I prefer to have all configs in one place. We can move most of the params to the SDK and away from the core, but we have to be careful that those will never be required by the core outside the context of an API call (since then the core will not know its value - unless there is a way?).

Other providers benchmarked which do not support server-less:

I didn't quite get that section. Does this mean that DigitalOcean and Linode and not relevant for this issue?

serverless: 'netlify' // ('azure'|'aws'|'google') | default to false

Is there a way to automatically know which serverless env they are using? This way, they can just pass in true. One idea is the check if /tmp exists or if D:\LOCAL\TEMP exists.

In session.ts:41 we can pass config.serverless to HandshakeInfo.getInstance({serverless: config.serverless}).

I prefer if we keep HandshakeInfo.getInstance() the way it is, because it's being used in a couple of places right now. Instead, maybe have a different way to set this value in the HandshakeInfo class (as a static variable would also do, and false by default). The getInstance function can then refer to that variable.

And then in Handshake getInstance method

This seems fine. Any scope for race conditions?

I'm thinking about bootstrapping applications for each providers which will take a lot of time just to make sure that it is working but in the end, writing some tests mocking the calls to write in file system to store in a global variable should do the trick.

Sounds good. But we must check that it actually works in the providers at least once.

Documentation update here

Agreed with the docs change. But I think we need to add a separate page in the docs as well specifically "For serverless execution" and explain this.

I understand the node SDK is the most popular one used by the community, but this change would essentially need to be propagated to all other backend SDKs.

Yes.

rishabhpoddar commented 4 years ago

@NkxxkN / @bhumilsarvaiya , please make the above changes.. You both can decide who wants to do this. Thanks

kant01ne commented 4 years ago

We can move most of the params to the SDK and away from the core, but we have to be careful that those will never be required by the core outside the context of an API call (since then the core will not know its value - unless there is a way?).

Configs should live in the core I agree. The point here was to override all of them to avoid Handshake for this very specific edge case of serverless.

I didn't quite get that section. Does this mean that DigitalOcean and Linode and not relevant for this issue?

Exactly. The goal of that section was to make sure we cover all cloud providers that provide server-less solutions and not only AWS. I wanted to make it clear that I benchmarked those ones and couldn't find server-less solutions. It could have been more explicit though I agree xD

Is there a way to automatically know which serverless env they are using? This way, they can just pass in true. One idea is the check if /tmp exists or if D:\LOCAL\TEMP exists

That makes sense. Having a way to know that through system variable would be the best. Let's try to avoid unnecessary file system calls to /tmp if we can know beforehand that the system runs on Azure functions.

I prefer if we keep HandshakeInfo.getInstance() the way it is(...)

Agreed.

This seems fine. Any scope for race conditions?

Race conditions would not lead to failure, worst case scenario, the handshake is done multiple times and the file gets overridden. I feel it is acceptable as opposed to make the logic more complex.

Sounds good. But we must check that it actually works in the providers at least once.

Agreed.

@NkxxkN / @bhumilsarvaiya , please make the above changes.. You both can decide who wants to do this. Thanks

I can take that one. I made some researches already. @bhumilsarvaiya I'll ask you for review once it's done :)

rishabhpoddar commented 4 years ago

Configs should live in the core I agree. The point here was to override all of them to avoid Handshake for this very specific edge case of serverless.

However, when we do this, we provide more overhead on the user's part - They may wonder... why do I have configs in the config file and in the SDK? What happens if the values are different?... This also adds complexity to the docs.

This is already happening today since we have params like access_token_path in the SDK as well as the config.yaml. The SDK one overwrites the config.yaml one. But I think we should remove it from config.yaml because this variable (and other similar ones) are not used in the core at all. There is an issue for this as well: https://github.com/supertokens/supertokens-core/issues/11

That makes sense. Having a way to know that through system variable would be the best. Let's try to avoid unnecessary file system calls to /tmp if we can know beforehand that the system runs on Azure functions.

True. But if there is no such env variable, then would we ask users to give ('netlify'|'azure'|'aws'|'google') through the config? Ideally, can we think of a way such that the user doesn't even have to put in isServerless param in the init call at all? My priority is:

The reason for the ordering above is to simply minimise the overhead on the user.

Race conditions would not lead to failure

Okay. But it would be good to know what they are :)

I can take that one.

Cool! However, this is not a priority issue.

rishabhpoddar commented 3 years ago

@NkxxkN Can you please implement us in Vercel and see if it works (without the above optimisation)

bhumilsarvaiya commented 3 years ago

@rishabhpoddar We'll have a root level config isInServerlessEnv (boolean). If user sets this to true, we store the handshake and apiVersion information in the /tmp folder. If user doesn't pass this config, it will be set to false.

rishabhpoddar commented 3 years ago

@bhumilsarvaiya please also make docs changes for this in: