owncloud / ocis

:atom_symbol: ownCloud Infinite Scale Stack
https://doc.owncloud.com/ocis/next/
Apache License 2.0

NATS registry / cache / store #7272

Closed wkloucek closed 7 months ago

wkloucek commented 1 year ago

User Story

As a SaaS provider, I need to support a scalable deployment for my registry / cache / stores so that it can perform under changing load.

Acceptance Criteria

Is your feature request related to a problem? Please describe.

As a user, I want to have as few components as possible. I would love to use NATS as registry / cache / store. Currently I have to use different components.

Describe the solution you'd like

Have a performant NATS registry / cache / store implementation for the KV feature based on NATS Jetstream.

Have it load-tested: it should distribute load, be fast enough, be stable / highly available, and delete unneeded data (retention).

We should also think about dropping official support for the other registry implementations (etcd, consul, memory, mdns, kubernetes) and cache / store implementations (redis, redis-sentinel, noop, memory, ocmem), since many of them are only usable in a limited range of deployments and / or are not battle-tested. The official documentation currently lists them all, so I understand them to be officially supported.

Describe alternatives you've considered

Additional context

Other known NATS topics:

wkloucek commented 1 year ago

Was discussed at the Hack-Week by:

dj4oC commented 10 months ago

@tbsbdr please schedule this for the next sprint, since it is blocking further growth, including the other NATS-related issues mentioned by @wkloucek above.

micbar commented 10 months ago

Already worked on, see status.

dj4oC commented 10 months ago

> Already worked on, see status.

True. Thanks for the spotlight. But there is more to do, right?

wkloucek commented 10 months ago

Is this really fulfilled?

We now have a nats-js registry. But what about the cache?

kobergj commented 10 months ago

@wkloucek isn't the cache already using nats-js store? (The nats-js store was already using the key-value store interface of jetstream. Only the registry implementation was not.)

wkloucek commented 10 months ago

> @wkloucek isn't the cache already using nats-js store? (The nats-js store was already using the key-value store interface of jetstream. Only the registry implementation was not.)

My last info is that the cache does not work. See also https://github.com/owncloud/ocis/issues/7049

But looking at all the linked tickets, there is more to this than just a working cache / store / registry implementation. We need to clarify all the operational questions, please. Can I use a memory-backed stream? Who is responsible for creating streams? Who is responsible for configuring stream replicas? Are we clean when it comes to retention? Are we using the KV store / cache in a performant way?

wkloucek commented 10 months ago

E.g. the registry could also be a memory-backed stream, if that has advantages.

kobergj commented 10 months ago

I see. I wasn't aware of https://github.com/owncloud/ocis/issues/7049. Seems like a standard panic. I'll take a look.

Regarding the other questions: I have no clue :) Should we have another meeting where we discuss where we stand and what needs to be done?

wkloucek commented 10 months ago

> Regarding the other questions: I have no clue :) Should we have another meeting where we discuss where we stand and what needs to be done?

To be honest, nothing has really changed since https://github.com/owncloud/ocis/issues/7272#issuecomment-1715775681. Those questions still need an answer (and modified code if needed). For that, it might be helpful to read the NATS (Jetstream) documentation. I have already read parts of it and can be there as a sparring partner. But in general it makes sense to have a NATS "expert" in the oCIS development team, since it's a really crucial part of oCIS.

kobergj commented 10 months ago

Not so much a fan of the "expert" pattern. I would prefer everybody in the team to know about nats (jetstream), as it is the backbone of the system.

But I am still uncertain what needs to be done and where the biggest pain points are. Your questions in https://github.com/owncloud/ocis/issues/7272#issuecomment-1821453735 sound more like "how do we want to do it" than "how do we have to do it" questions.

I'm happy to drive natsjs improvements. I just don't know where to start.

wkloucek commented 10 months ago

> Not so much a fan of the "expert" pattern. I would prefer everybody in the team to know about nats (jetstream), as it is the backbone of the system.

Also fine with me. But probably one person needs to go ahead, since we can't dedicate the full team to reading documentation for 2 days, right?

> But I am still uncertain what needs to be done and where the biggest pain points are. Your questions in #7272 (comment) sound more like "how do we want to do it" than "how do we have to do it" questions.

> I'm happy to drive natsjs improvements. I just don't know where to start.

A first question would be, e.g., https://github.com/owncloud/ocis/issues/7119: Am I allowed to use memory streams? If so, how can I configure them? The ticket already talks about the benefits of memory streams (see benchmark), but also about the problem you currently hit when trying to use memory streams (immutable).

Next question: is the new registry implementation actually distributing load? As far as I know, the nats registry didn't do that (see https://github.com/owncloud/ocis/issues/7188).

kobergj commented 10 months ago

Oki.

wkloucek commented 10 months ago

I added another NATS topic which could really help for our SaaS: https://github.com/owncloud/ocis/issues/7801

wkloucek commented 9 months ago

Seems like the natsjs registry triggers some excessive logging on the NATS side: https://github.com/owncloud/ocis/issues/7948

micbar commented 8 months ago

@kobergj @wkloucek We need to check the status of the NATS implementation please.

fschade commented 7 months ago

@kobergj closable?

wkloucek commented 7 months ago

What we identified during that status meeting:

https://github.com/owncloud/ocis/issues/7231#issuecomment-1905861835

https://github.com/owncloud/ocis/issues/7245#issuecomment-1905855227

https://github.com/owncloud/ocis/issues/7023 -> not yet implemented but also not pressing

and one cache was still on file storage instead of memory storage :thinking:

kobergj commented 7 months ago

> https://github.com/owncloud/ocis/issues/7231#issuecomment-1905861835

Will look into that today

> https://github.com/owncloud/ocis/issues/7245#issuecomment-1905855227

This is just changing default values. Should we do that for the single binary too?

> https://github.com/owncloud/ocis/issues/7023

This needs to be tackled in a follow-up ticket.

> and one cache was still on file storage instead of memory storage 🤔

No, not a cache. It was the registry. This is already fixed with https://github.com/owncloud/ocis/pull/8236

micbar commented 7 months ago

> https://github.com/owncloud/ocis/issues/7245#issuecomment-1905855227

> This is just changing default values. Should we do that for the single binary too?

Please do so, yes.

wkloucek commented 7 months ago

> No, not a cache. It was the registry. This is already fixed with #8236

Thanks for keeping that information safe! I already forgot about it.

dj4oC commented 7 months ago

Please don't forget https://github.com/owncloud/enterprise/issues/6354

wkloucek commented 7 months ago

Discovered during another review:

kobergj commented 7 months ago

KV_cache-userinfo maxAge could be higher, but invalidation / extra validation would be needed -> @kobergj will create an extra ticket

https://github.com/owncloud/ocis/issues/8297

kobergj commented 7 months ago

Guess we tackled all tickets here. I'll close this one for now.