Change Pow.Store.Backend.Base interface for performance

axelson commented 4 years ago

There shouldn't be any slow operations included in Pow.Store.Backend.Base. As it currently stands, it is difficult or impossible to implement all operations performantly. One reason is that the interface requires executing or checking a match spec. That works well if you're using ETS/Mnesia, but external stores such as Postgres or Redis do not support match specs at all, and converting an arbitrary match spec into a performant query is difficult or impossible.

danschultzer commented 4 years ago

Definitely agree! Currently in Pow, match specs are only used for namespace lookups, so it shouldn't be too difficult to convert them to performant lookups in external stores.

I've found nebulex super interesting, and want to dig into it to find a good solution. An idea that's interesting is if we can always use ETS/Mnesia in the front and have persistence/distribution handled with an adapter.

Initially I used a binary namespace, but switched over to match spec as I feel using core functionality of Erlang is better. It's much more efficient for the built-in ETS and Mnesia stores. But the question is how this can easily be translated to external stores.

axelson commented 4 years ago

> I've found nebulex super interesting, and want to dig into it to find a good solution. An idea that's interesting is if we can always use ETS/Mnesia in the front and have persistence/distribution handled with an adapter.

That doesn't sound feasible to me, since adding a layer of caching means that you now have to deal with cache invalidation. I can see it being nice if it is optional, but I don't think it makes sense as the default when Redis or Postgres can be performant enough on their own.

> Initially I used a binary namespace, but switched over to match spec as I feel using core functionality of Erlang is better. It's much more efficient for the built-in ETS and Mnesia stores. But the question is how this can easily be translated to external stores.

I think the way to handle this would be to specify a subset of possible queries, so that each backend can create its own optimized queries: for Postgres that would be a SELECT ... WHERE, and for ETS/Mnesia that would be a match spec (so just as performant as it is right now). With an open-ended match spec that is not possible.
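A minimal sketch of that idea, written in Python rather than Elixir so it stays self-contained. Everything named here is an assumption made for illustration: `PrefixQuery`, the `store_entries` table with a `key text[]` column, and the psycopg-style `%s` placeholders are hypothetical and are not part of Pow or of any existing backend. The point is only that one restricted query value can be rendered either as an ETS-shaped match spec or as a parameterized SQL statement.

```python
from dataclasses import dataclass

WILDCARD = "_"  # stand-in for Erlang's :_ atom


@dataclass(frozen=True)
class PrefixQuery:
    """Hypothetical restricted query: a fixed key prefix plus one trailing wildcard."""
    prefix: tuple

    def to_match_spec(self):
        # Mirrors the shape of an ETS match spec: [{match_head, guards, body}].
        # The match head is a {key, value} pair; "$_" returns the whole object.
        key_pattern = list(self.prefix) + [WILDCARD]
        return [((key_pattern, WILDCARD), [], ["$_"])]

    def to_sql(self):
        # Assumes a hypothetical Postgres table store_entries(key text[], value bytea).
        # The array slice compares the fixed prefix; cardinality pins the key length
        # so that exactly one trailing element is left free.
        n = len(self.prefix)
        sql = (
            "SELECT key, value FROM store_entries "
            f"WHERE key[1:{n}] = %s AND cardinality(key) = %s"
        )
        return sql, (list(self.prefix), n + 1)
```

Because the query type only admits this one shape, a backend never has to interpret an arbitrary match spec; it only has to render a known pattern.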

danschultzer commented 4 years ago

> That doesn't sound feasible to me, since adding a layer of caching means that you now have to deal with cache invalidation. I can see it being nice if it is optional, but I don't think it makes sense as the default when Redis or Postgres can be performant enough on their own.

Agreed, I think there is a more elegant way of dealing with this.

> I think the way to handle this would be to specify a subset of possible queries, so that each backend can create its own optimized queries: for Postgres that would be a SELECT ... WHERE, and for ETS/Mnesia that would be a match spec (so just as performant as it is right now). With an open-ended match spec that is not possible.

Yup. Luckily the current match spec usage in Pow is all based around wildcard matching on the last attribute with :_, so it should be relatively easy to set up.
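To illustrate why the trailing-wildcard case is the easy one, here is a small Python sketch. It models list keys as tuples and uses the string `"_"` as a stand-in for `:_`; the function names and the sample keys are hypothetical. A pattern whose only wildcard is the last element reduces to a plain prefix comparison plus a length check, which is something any external store can index.

```python
WILDCARD = "_"  # stand-in for Erlang's :_ atom


def matches(pattern, key):
    """Element-wise match where WILDCARD in the pattern accepts any value."""
    return len(pattern) == len(key) and all(
        p == WILDCARD or p == k for p, k in zip(pattern, key)
    )


def trailing_wildcard_prefix(pattern):
    """Return the fixed prefix if the only wildcard is the last element, else None."""
    if not pattern:
        return None
    *prefix, last = pattern
    if last != WILDCARD or WILDCARD in prefix:
        return None
    return tuple(prefix)


def prefix_matches(prefix, key):
    """Equivalent check without any pattern matching: same prefix, one extra element."""
    return len(key) == len(prefix) + 1 and tuple(key[: len(prefix)]) == tuple(prefix)


# Hypothetical keys, only for demonstration.
pattern = ("user", 1, "session", WILDCARD)
keys = [
    ("user", 1, "session", "abc"),
    ("user", 2, "session", "def"),
    ("user", 1, "session"),
]
prefix = trailing_wildcard_prefix(pattern)
same = all(matches(pattern, k) == prefix_matches(prefix, k) for k in keys)
```

A pattern with a wildcard in any other position returns `None` here, which is where a backend would need a different strategy or would reject the query.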

I've modified the RedisCache in https://github.com/danschultzer/pow/pull/564 to use sorted sets for indexing. In tests it looks like it went from O(n) to near O(1), with inserts being, as expected, a bit slower. To search the index I take the match spec, split it at the first wildcard, and then use the combination of the preceding keys to find the appropriate index. The match spec is then run against the returned results. I expire the members of the sorted set using the ZREMRANGEBYSCORE technique.
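The lookup described above can be modelled in memory without a Redis server. This Python sketch is not the code from the pull request: the class name, the choice to index every proper prefix of a key, and the use of the expiry timestamp as the score are all assumptions made for illustration. Each dictionary in `sets` plays the role of one Redis sorted set, and the pruning step corresponds to what `ZREMRANGEBYSCORE index -inf now` would do.

```python
WILDCARD = "_"  # stand-in for Erlang's :_ atom


def matches(pattern, key):
    """Element-wise match where WILDCARD in the pattern accepts any value."""
    return len(pattern) == len(key) and all(
        p == WILDCARD or p == k for p, k in zip(pattern, key)
    )


class SortedSetIndex:
    """In-memory model: one 'sorted set' per key prefix, scored by expiry time."""

    def __init__(self):
        self.sets = {}  # prefix tuple -> {member key tuple: expiry timestamp}

    def put(self, key, expires_at):
        # Assumption: every proper prefix of the key gets an index entry.
        # This is the extra write cost paid on insert.
        for n in range(1, len(key)):
            self.sets.setdefault(key[:n], {})[key] = expires_at

    def lookup(self, pattern, now):
        if WILDCARD not in pattern:
            raise ValueError("exact keys need no index; use a plain GET")
        # Split at the first wildcard; the fixed part selects the index.
        prefix = pattern[: pattern.index(WILDCARD)]
        members = self.sets.get(prefix, {})
        # Drop expired members, like ZREMRANGEBYSCORE with a max score of `now`.
        for key in [k for k, score in members.items() if score <= now]:
            del members[key]
        # Run the full pattern against the candidates the index returned.
        return sorted(k for k in members if matches(pattern, k))


index = SortedSetIndex()
index.put(("user", "1", "session", "a"), expires_at=100)
index.put(("user", "1", "session", "b"), expires_at=50)
index.put(("user", "2", "session", "c"), expires_at=100)
live = index.lookup(("user", "1", "session", WILDCARD), now=60)
```

The final filtering step matters when the pattern has more than one wildcard: the index only narrows by the part before the first wildcard, so the remaining elements still have to be checked against each candidate.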

With Postgres it's simpler, as you can use arrays. This is what https://github.com/ZennerIoT/pow_postgres_store does. I remember that you saw both sequential scans and N+1 queries.

The N+1 is probably due to the key-value setup. As an example, with Pow.Store.CredentialsCache it'll first fetch the session (triggered by Pow.Plug.Session). If the session is then rolled, there's logic that'll kick in to delete all sessions with the same fingerprint. Another cause might be that PowPersistentSession selects from the same table. I could imagine this triggering an N+1 warning.

As for the sequential scans, that might be a missing index. I think GIN or GiST indexing is what's needed: https://www.compose.com/articles/take-a-dip-into-postgresql-arrays/

I would like to experiment with setting up a type for the match spec queries.

pow-auth / pow

Change Pow.Store.Backend.Base interface for performance #562