kibae / typeorm-sharding-repository

TypeORM Sharding Repository: Enables TypeORM to be utilized in a distributed database environment.
https://www.npmjs.com/package/typeorm-sharding-repository
MIT License
11 stars, 3 forks

directory based sharding #2

Open rungwe opened 2 months ago

rungwe commented 2 months ago

Hi there, how best can I implement directory-based sharding, where the sharding key is a string or enum that represents a category? In a multi-tenant environment, for example, the key could be the name of an organisation. There seems to be only numerical range-based sharding.
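To make the question concrete, this is roughly the kind of lookup I have in mind. It is only a minimal sketch: `ShardDirectory`, the tenant names, and the shard names are made up for illustration and are not part of typeorm-sharding-repository.

```typescript
// Hypothetical names throughout; none of this is the library's API.
type ShardName = string;

class ShardDirectory {
  // Explicit lookup table: sharding key (e.g. tenant name) -> shard name.
  private readonly entries = new Map<string, ShardName>();

  assign(key: string, shard: ShardName): void {
    this.entries.set(key, shard);
  }

  // Returns the shard for a key, or undefined when the key is not listed.
  resolve(key: string): ShardName | undefined {
    return this.entries.get(key);
  }
}

const directory = new ShardDirectory();
directory.assign("acme-corp", "shard_eu");
directory.assign("globex", "shard_us");
```

Unlike a numerical range, the mapping here is arbitrary, so every key has to be listed explicitly (or fall through to some fallback).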

kibae commented 2 months ago

Hi, @rungwe :) I tried adding functional sharding rules for various sharding keys. However, by allowing any value within an entity to be used in the sharding rules, I ended up making methods like findOneById unusable.

#3

https://github.com/kibae/typeorm-sharding-repository/pull/3/files#diff-10165311b50fce469f6f3fcf946526f533abea482c8d56026b8986778cede872

Also, I discovered that when the key of an entity is updated, it necessitates moving the data to another shard. I haven't implemented this feature, and it might be considered a flaw.
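Roughly, a key update would have to do something like the following. This is only an in-memory sketch of the idea, not code from the library: the `shards` maps, the `shardFor` rule, and `updateKey` are all hypothetical, and a real implementation would also need to deal with transactions across two databases.

```typescript
// In-memory stand-ins for shards; all names here are hypothetical.
interface Row { key: string; payload: string }
type Shard = Map<string, Row>;

const shards: Record<string, Shard> = { a: new Map(), b: new Map() };

// Assumed rule for illustration: keys that sort before "n" live on shard "a".
function shardFor(key: string): string {
  return key.toLowerCase() < "n" ? "a" : "b";
}

function updateKey(oldKey: string, newKey: string): void {
  if (oldKey === newKey) return; // nothing to move

  const source = shards[shardFor(oldKey)];
  const row = source.get(oldKey);
  if (!row) throw new Error(`no row with key ${oldKey}`);

  // The new key may resolve to a different shard than the old one.
  const target = shards[shardFor(newKey)];

  // Write under the new key first, then remove the old entry.
  target.set(newKey, { ...row, key: newKey });
  source.delete(oldKey);
}
```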

Sharding requires you to determine which shard the data is in, so you need to use the key as the sharding key. Are you interested in using a string type of key? I wonder if I overengineered this.

Anyway, please take a look and let me know. Feel free to send a PR(feature/function-sharding). Thank you.

rungwe commented 2 months ago

Hi @kibae

Thanks for the prompt response. It is indeed an interesting problem. I have checked out the repo to play around with it and figure out the extent of the challenge.

In my rough implementation, I ended up modifying findOneById and findByIds to take an extra parameter for the sharding key, in an attempt to keep them usable. On the bright side, both methods have been deprecated in TypeORM itself for a while. From that point of view, I guess we are giving these methods too much attention, which adds complexity?

findByIds(ids: any[], shardingKey?: string): Promise<Entity[]>;

Another idea, which would make these two methods fully compatible with the original TypeORM BaseEntity, is a full scan of all shards. There are of course performance implications. However, I have noticed that the other findBy methods already do a full scan, so perhaps we are indeed over-engineering findOneById and findByIds.
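The two options could even be combined, something like the sketch below: use the sharding key when the caller provides one, otherwise scan every shard. `FakeShard`, `ShardedUsers`, and the resolver function are hypothetical stand-ins I made up for illustration, not the library's classes.

```typescript
interface User { id: number; tenant: string }

// Hypothetical stand-in for one shard's repository.
class FakeShard {
  readonly name: string;
  private readonly rows: User[];

  constructor(name: string, rows: User[]) {
    this.name = name;
    this.rows = rows;
  }

  async findByIds(ids: number[]): Promise<User[]> {
    return this.rows.filter((row) => ids.includes(row.id));
  }
}

class ShardedUsers {
  private readonly shards: FakeShard[];
  // Assumed resolver: maps a sharding key to a shard name.
  private readonly resolve: (shardingKey: string) => string | undefined;

  constructor(shards: FakeShard[], resolve: (shardingKey: string) => string | undefined) {
    this.shards = shards;
    this.resolve = resolve;
  }

  async findByIds(ids: number[], shardingKey?: string): Promise<User[]> {
    if (shardingKey !== undefined) {
      // Key given: query only the shard it resolves to.
      const name = this.resolve(shardingKey);
      const shard = this.shards.find((s) => s.name === name);
      return shard ? shard.findByIds(ids) : [];
    }
    // No key: query every shard in parallel and merge the results.
    const perShard = await Promise.all(this.shards.map((s) => s.findByIds(ids)));
    return ([] as User[]).concat(...perShard);
  }
}

const users = new ShardedUsers(
  [
    new FakeShard("s1", [{ id: 1, tenant: "acme" }]),
    new FakeShard("s2", [{ id: 2, tenant: "globex" }]),
  ],
  (key) => ({ acme: "s1", globex: "s2" } as Record<string, string>)[key],
);
```

Calling `users.findByIds([1, 2])` touches both shards, while `users.findByIds([1, 2], "acme")` only touches `s1`, so the signature stays compatible and the key is just an optimisation.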

On changing the key, I wouldn’t worry about it. Sometimes simplicity is better, and primary keys rarely change. The library shouldn’t promote such bad practices.

I have also noticed that we default to the last shard if the key cannot be resolved to a shard. As part of the shard configuration, I would propose a default field that lets users be explicit about that behaviour.
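As a rough sketch of that proposal: the config shape below (`ShardRule`, `ShardingConfig`, and the `default` field) is hypothetical and only meant to show the behaviour, it does not mirror the library's current options.

```typescript
// Hypothetical config shape; the `default` field is the proposal, not an existing option.
interface ShardRule { name: string; minKey: number; maxKey: number }
interface ShardingConfig { shards: ShardRule[]; default?: string }

function resolveShard(config: ShardingConfig, key: number): string {
  // Range match: minKey inclusive, maxKey exclusive (an assumption for this sketch).
  const match = config.shards.find((s) => key >= s.minKey && key < s.maxKey);
  if (match) return match.name;

  // Fall back only when the user asked for it explicitly.
  if (config.default !== undefined) return config.default;

  throw new Error(`no shard matches key ${key} and no default is configured`);
}

const rules: ShardRule[] = [
  { name: "s1", minKey: 0, maxKey: 1000 },
  { name: "s2", minKey: 1000, maxKey: 2000 },
];
```

With `{ shards: rules, default: "s2" }` an out-of-range key goes to `s2`; with `{ shards: rules }` it fails loudly instead of silently landing on the last shard.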

Let me know your thoughts on the above?

On another note, I am using NestJS and was thinking of forking this library and converting it into a NestJS module. I don’t know if you had such intentions? Otherwise, I will assume you would like to keep it pure?

kibae commented 2 months ago

Conversation continued from #4