I'm somewhat confused by this. A content tree is just a reference to content. When we send it to a node, we are saying, you want this (well, maybe connected to a volume). Doesn't it muddle the content resource if we add a node to it?
If we want to say, "here is a content tree, but don't download it", shouldn't we not send it to that node? Isn't that the job of the controller, to determine who will get the message, "download this tree"?
Conversely, if we really have to send it to multiple nodes - it would help to understand why the controller would send it to nodes A, B, C if only B will download it - then I think a separate structure would be the right place. "This is a content tree" should look identical everywhere. "This is a list of nodes that should have content tree X", where X is the reference to the aforementioned content tree, is distinct.
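To make that concrete, here is a rough sketch of what I mean (Go-style, with made-up type and field names, not the actual EVE API): the content tree stays identical everywhere, and a separate, node-aware structure says who should actually download it.

```go
// Purely illustrative: these types and fields are made up, not the actual EVE API.
package config

// ContentTree is just a reference to content; it carries nothing node-specific,
// so the exact same value can be sent to every node.
type ContentTree struct {
	UUID        string
	DisplayName string
	URL         string // e.g. an OCI image reference or datastore path
	SHA256      string
}

// ContentTreePlacement is the separate structure: it says which nodes should
// actually download content tree X, without touching the tree itself.
type ContentTreePlacement struct {
	ContentTreeUUID string
	DownloadOnNodes []string // node UUIDs expected to fetch the content
}
```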
This is very similar to the Volumes struct, which already has a designated node id. Consider nodes A, B and C, where the volume creation happens on A and is replicated to B and C. The idea is that for a replicated volume, Longhorn is already creating a replica PVC on nodes B and C, so there is absolutely no need to download the content of the volume to nodes B and C. Having this designated node id tells zedmanager and zedagent not to worry about content tree config/status as long as the Volume status can be fetched to start an app (trigger domainmgr). If we don't do it this way, then we need to come up with a more complex mechanism of cluster-wide pubsub or make big changes in EVE microservices. This approach simplifies the code on remote nodes.

Now, why does the cloud have to send config to all nodes? It's mostly for handling failover cases: when an app fails over, Kubernetes will start the app, but domainmgr needs to report back to the cloud, and domainmgr only looks for an app if it is in the config.
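To sketch the designated node id point above (field and function names here are made up for illustration, not the exact EVE API): the Volume, not the content tree, carries the designated node, and a non-designated node can skip the download because the replica PVC is already there.

```go
// Illustrative sketch only: field and function names are assumptions,
// not the exact EVE API.
package config

// VolumeConfig is roughly the shape being described: the Volume, not the
// content tree, carries the designated node.
type VolumeConfig struct {
	VolumeUUID       string
	ContentTreeUUID  string // content tree this volume is created from
	DesignatedNodeID string // the one node expected to create/download the content
	Replicated       bool   // e.g. backed by a Longhorn-replicated PVC
}

// NeedsContentDownload is the kind of check zedmanager/volumemgr could make:
// on a non-designated node, a replicated volume already shows up as a replica
// PVC, so no content download is required there.
func NeedsContentDownload(v VolumeConfig, myNodeID string) bool {
	if v.Replicated && v.DesignatedNodeID != myNodeID {
		return false
	}
	return true
}
```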
I believe Andrew has multiple docs; we need to formalize them into one doc and publish it as docs/EDGE-NODE-CLUSTERS.md. We will work on that.
> This is very similar to the Volumes struct, which already has a designated node id.
My understanding (correct me) is that a volume is a specific thing on a specific node. It lives only in one place, like a VM or a container, while a content tree is like a VM image or OCI image: it is referenced in many places, is identical, and can be instantiated in many places into a VM or container; once it is, its local instantiation (i.e. the volume) is unique. If you like, class vs object (I never really loved OOP, but the ideas are useful).
> Consider nodes A, B and C, where the volume creation happens on A and is replicated to B and C. The idea is that for a replicated volume, Longhorn is already creating a replica PVC on nodes B and C, so there is absolutely no need to download the content of the volume to nodes B and C.
I know you are right about this behaviour. The question is how to configure it. Sending a node a content tree in its config does not mean, "download this content tree", it means, "make sure this content tree exists on your node, I do not particularly care how." Until now, there was only one way, i.e. download it, so volumemgr (I think?) implicitly converted "here is a content tree you must have" to "here is a content tree to download". You are saying, there now are multiple ways of satisfying the requirement, "have this content tree available". If the implicit assumption of "here is a tree = download it" no longer holds, volumemgr should understand other ways of getting volumes.
In the end: zed receives a config with content trees and volumes, passes them to volumemgr; when volumemgr is done, those trees and volumes exist. How it does so is transparent to everyone except volumemgr. Any changes to zed or outside services make the whole structure more complex, and create a lot of possibilities for breaking things.
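In other words, something like this sketch (hypothetical names, none of this is the real volumemgr internals): volumemgr is the only place that knows *how* a volume comes into existence; everyone upstream just learns that it is ready.

```go
// Hypothetical sketch: none of these names are the real volumemgr internals.
package volumemgr

import "fmt"

// AcquisitionMethod captures the "how": downloading is just one way of
// satisfying "make sure this volume exists on this node".
type AcquisitionMethod int

const (
	MethodDownload   AcquisitionMethod = iota // fetch blobs and create the volume
	MethodReplicated                          // already present as a replica PVC
)

// ensureVolume hides the mechanism; either way the caller only ever learns
// "this volume is now available".
func ensureVolume(uuid string, method AcquisitionMethod) error {
	switch method {
	case MethodDownload:
		return downloadAndCreate(uuid)
	case MethodReplicated:
		return waitForReplicaPVC(uuid)
	default:
		return fmt.Errorf("unknown acquisition method %d", method)
	}
}

func downloadAndCreate(uuid string) error { return nil } // download blobs, create the volume
func waitForReplicaPVC(uuid string) error { return nil } // wait until the replica PVC exists
```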
> Now, why does the cloud have to send config to all nodes? It's mostly for handling failover cases: when an app fails over, Kubernetes will start the app, but domainmgr needs to report back to the cloud, and domainmgr only looks for an app if it is in the config.
The controller always sends config to all nodes; otherwise a node cannot function. And it should always tell it, "here are all the things you need to know to function." Did I misunderstand the point you are making?
>> This is very similar to the Volumes struct, which already has a designated node id.
> My understanding (correct me) is that a volume is a specific thing on a specific node. It lives only in one place, like a VM or a container, while a content tree is like a VM image or OCI image: it is referenced in many places, is identical, and can be instantiated in many places into a VM or container; once it is, its local instantiation (i.e. the volume) is unique. If you like, class vs object (I never really loved OOP, but the ideas are useful).
>> Consider nodes A, B and C, where the volume creation happens on A and is replicated to B and C. The idea is that for a replicated volume, Longhorn is already creating a replica PVC on nodes B and C, so there is absolutely no need to download the content of the volume to nodes B and C.
> I know you are right about this behaviour. The question is how to configure it. Sending a node a content tree in its config does not mean, "download this content tree", it means, "make sure this content tree exists on your node, I do not particularly care how." Until now, there was only one way, i.e. download it, so volumemgr (I think?) implicitly converted "here is a content tree you must have" to "here is a content tree to download". You are saying, there now are multiple ways of satisfying the requirement, "have this content tree available". If the implicit assumption of "here is a tree = download it" no longer holds, volumemgr should understand other ways of getting volumes.
> In the end: zed receives a config with content trees and volumes, passes them to volumemgr; when volumemgr is done, those trees and volumes exist. How it does so is transparent to everyone except volumemgr. Any changes to zed or outside services make the whole structure more complex, and create a lot of possibilities for breaking things.
The problem is that on the remote nodes we do not need to download the content at all, since the content is already converted to volumes and shows up as a PVC. For a native container OCI image we will download the content on remote nodes too. So we need a way to tell whether the content download should be skipped on that node.
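Roughly like this (made-up names again, just to illustrate the rule, not the actual EVE code): the skip only applies to content that has already become a replicated volume; a native container image is still downloaded everywhere it may run.

```go
// Illustrative only: types and names here are assumptions, not the actual EVE code.
package volumemgr

// ContentFormat distinguishes the two cases described above.
type ContentFormat int

const (
	FormatContainer ContentFormat = iota // native OCI container image
	FormatVMDisk                         // disk image that has become a PVC-backed volume
)

// skipContentDownload: a VM disk that is already replicated as a PVC does not
// need to be downloaded again on a non-designated node; a container image
// is still downloaded everywhere it may run.
func skipContentDownload(format ContentFormat, replicated bool, designatedNode, myNode string) bool {
	return format == FormatVMDisk && replicated && designatedNode != myNode
}
```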
>> Now, why does the cloud have to send config to all nodes? It's mostly for handling failover cases: when an app fails over, Kubernetes will start the app, but domainmgr needs to report back to the cloud, and domainmgr only looks for an app if it is in the config.
> The controller always sends config to all nodes; otherwise a node cannot function. And it should always tell it, "here are all the things you need to know to function." Did I misunderstand the point you are making?
What I am trying to say is that the controller will send the app config to all nodes in a cluster even though that app is not supposed to run on that node. So we need a way to make sure that particular node is not downloading content unnecessarily.
> The problem is that on the remote nodes we do not need to download the content at all, since the content is already converted to volumes and shows up as a PVC. For a native container OCI image we will download the content on remote nodes too. So we need a way to tell whether the content download should be skipped on that node.
@zedi-pramodh totally agree about the behaviour. I am discussing where we flag, "this content tree can be made available via method A (download) on this node, and method B (volume replication) or C on that node." My point is, a content tree is an immutable thing, and is identical everywhere. Inclusion of a content tree in the config means, "be sure this content tree is available". It does not mean, "download this content tree". So if we want to tell a node how to get that tree, that is something other than a property of the tree (probably a property of the Volume).
> What I am trying to say is that the controller will send the app config to all nodes in a cluster even though that app is not supposed to run on that node. So we need a way to make sure that particular node is not downloading content unnecessarily.
Ah, ok, that makes sense. Brings us back to the previous point.
This is sort of like saying, I will want to run a container made from image `alpine:3.20` on 3 nodes. Node A should download it then sync to B and C. I won't modify the tag or manifest of `alpine:3.20` to tell A to download and B, C to sync; I will have the same reference to `alpine:3.20` on all 3 nodes, and have a separate instruction set that says, "here is how to get it", which in our case is (I think) Volumes, which already are node-specific.
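In that spirit, one last sketch (hypothetical types, not real EVE structures): the image reference is byte-for-byte identical on all three nodes, and only the per-node instruction says how each node obtains it.

```go
// Hypothetical sketch of the alpine:3.20 analogy; not real EVE types.
package config

// ImageRef is the content reference: byte-for-byte identical on all three
// nodes, with nothing node-specific ever added to it.
type ImageRef struct {
	Name string
}

// HowToGet is the per-node "here is how to get it" instruction, kept separate
// from the image reference itself (in our case it would live on the Volume).
type HowToGet int

const (
	GetByDownload HowToGet = iota // node A: pull from the registry
	GetBySync                     // nodes B and C: rely on replication/sync
)

// Same reference everywhere, different acquisition plan per node.
var (
	alpine = ImageRef{Name: "docker.io/library/alpine:3.20"}
	plan   = map[string]HowToGet{
		"node-A": GetByDownload,
		"node-B": GetBySync,
		"node-C": GetBySync,
	}
)
```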