Closed gnawux closed 2 years ago
@gnawux It would be great if you can also describe your migration path/story from 2.0 to 3.0 given that you were also using 2.0 in production I assume? If there were any migration tools that helped, also how did this affect your SLOs (did those change?), finally, how does it further help/improve Observability/Debuggability.
@gnawux, just in order to have things as clear as possible, I'd like to ask whether you could split this into 2 issues:
The first one is, AFAIU, what you're proposing for Berlim's summit.
The "optional built-in sandboxing feature in the shim" is the "shim+VMM all-in-one" binary, right? So, that can be a discussion we could and should treat in a different issue, as that's also quite huge and equally important.
Would that be okay?
@zanetworker will check with team and summary them.
@fidencio This is the kick-off issue of the whole thing and we will raise detailed issues and PRs. And In my plan, we could have the both parts in Berlin if everything went correct. However, if there were anything went wrong, we might adjust the roadmap :)
@fidencio This is the kick-off issue of the whole thing and we will raise detailed issues and PRs. And In my plan, we could have the both parts in Berlin if everything went correct. However, if there were anything went wrong, we might adjust the roadmap :)
Okay, but I really would like to ask to have the issues raised way before the PRs, so we have enough time to discuss the issues (on whatever will need to be discussed). I am assuming then that you'll use this issue as a tracker issue for the others.
@gnawux Here are a few ideas and comments
Thanks for attempting such major work.
@c3d I am @gnawux's colleague working on rust shim. Hope I can explain more details for your comments
@gnawux Here are a few ideas and comments
- The description seems to indicate that the shim reimplementation is well advanced already, and that the remaining work is sandboxing. Is that correct?
No. We have implemented rust shim with build-in sandboxing internally, which has been used in production envorinment for quite a long time. I think our internal codes could be considered mature, but some refactor is needed to support a pluggable architecture to support other vmms
- Assuming this is correct, was there any external repository for the development of that shim code so far, or was this all developed internally? If it was developed internally, was it in its own repository, or are there repository dependencies to Kata?
So far all developed internally. We followed the logic of the existing Kata but totally rewirted in rust. The agent part inherits most of Kata Agent codes
- When you talk about "passing the same test cases", do you mean you can run the existing Kata CI, or some other series of tests?
Mostly based on our internal CI and fully tested in production environment
- If there is a large body of code already, how do you suggest we should review it? It makes sense that the community would need to take time to review it. As @fidencio remarked during the meeting, all developers don't have equivalent familiarity with Rust and Go
I think we should gradually release out the rust code piece by piece, with detailed documents and carful reviews. The open source process is also a good time for us to refactor and clean up. We do need and welcome help from the community
Thanks for attempting such major work.
Thanks @zmlcc for the explanation.
@c3d I agree a patch containing massive codes is nightmare for reviewers and I can still recall that it took us over 3 months to landing kata-shim-v2 code last time. What I suggest are:
For language, I personally found that almost all the system devs are interesting in Rust 😸. In the initial Rust-shim developing days, d@lifupan @bergwolf and others are working on the Go code as well, and we commit to continue working on the Go-shim maintaining as long as they are still been supported by kata community.
First, I agree with Fabiano that we need to separate the shim-v2 feature from the sandboxing feature and I'd recommend to start with the shim-v2 only feature. This could help Ali avoid any dependencies they have on the sandboxing feature parts when they start pushing the code to review.
Secondly, regarding the actual shim-v2 code. I believe that Samuel and and Peng-Tao proposed at the time creating a separate repository to push in the shim-v2 code and use it to review the changes which makes sense to me (as opposed to creating a new branch). I think that in practice it will be hard to break the shim-v2 code into smaller PRs that actually compile and run so we could consider the following approach:
My 2 cents.
@ariel-adam It would be very nice to create a new empty repository to hold rust code. Actually, we are trying to refactor our internal code into small pieces for ease review and clean. I think we can push these pieces to the new repository by a series of PRs in the following steps:
The above code release process would take several months. I think the community will have enough time to familiarize and review. If all goes well, then we will have the base repository that implement a generic shim in rust and optional built-in sandboxing. On this basis advanced development can be carried out, such as CI, more hypervisors support and richer features as the existing go version, etc.
I agree the switching from go to rust shim would take years and need lots of work to make sure nothing breaks and the new shim performs better. We will have to maintain two versions together for quite a long time, but we can gradually shift our major focus on the rust shim when it becomes mature
By the way, If we decide to switch to from go to rust in the furture, shall we directly use the new rust repository as the main repo? Or merge the rust repository back to current main repo?
@zmlcc By reading the above carefully, I think this is a significant and meaningful thing, and I am very grateful for Alibaba's action here. However, I would like to know if we have specific data to compare the resource consumption and performance of the rust and go versions of shim based on Alibaba's internal implementation, which might be more intuitive to illustrate the essential of the rust version.
So based on the above discussions, the immediate action for us is to create a separate repository to hold the new shim code so that it can be properly reviewed/refactored to meet the community requirement. Can we agree on this? @kata-containers/architecture-committee @ariel-adam @zmlcc @gnawux
We should also decide on a roadmap for the feature to get mature and be merged into the kata-containers repository, which can be based on @ariel-adam's comment and include the following things:
The plan makes sense to me :-).
I think that since we are talking about a potential disruptive change then by creating a separate repository those developing the rust shim will be able to provide visibility to all the community without breaking anything immediate (and still avoiding the kata1.x/kata2.x branch issues such as duplicated CI we had in the past).
There is also the topic of getting developers more familiar with rust which will take time and this approach I believe gives our community that time.
So based on the above discussions, the immediate action for us is to create a separate repository to hold the new shim code so that it can be properly reviewed/refactored to meet the community requirement. Can we agree on this?
Not yet, Tao. Let's think about the work it'll cause, and then we can have an agreement after that.
If we do this in a different repo, what does it mean in the future?
If we do 2, that will be a pain. We still have users going to wrong repos, regardless the fact they're marked as EOL. If we do 1. we will have to spend a reasonable amount of time setting up CIs for the new repo to ensure it has exactly the same functionality as the current one, which would be easier to do when dealing with a branch.
While on this, I sincerely fail to understand this comment here, @ariel-adam.
and still avoiding the kata1.x/kata2.x branch issues such as duplicated CI we had in the past
What happened during 1.x -> 2.x transition era was different repos, which is exactly what was suggested here. With a new branch approach it'd be easier to actually avoid CI duplication (with tweaks in the install / build scripts), and also to keep the visibility of the work coming in.
Still ... the most important thing here that I don't see being discussed is a maintenance plan for the go-shim, who will step up to maintain that, and for how long. I really want to see a list of the names / companies that will, for sure, help with that, and what are the responsibilities each company will own from now on.
Another thing I'd like to raise while having the discussion on what and how things will be merged, is having a solid plan on who will review this code. I really would like to suggest that each PR has at least one reviewer from Apple, IBM, Intel, and Red Hat. While this may slow down the process, this will ensure that anything going on will not cause a breakage for the players we have here. I'd go further and ask other companies to chime in and add reviewers to ensure their use case is and will keep working, that the work they're doing now will be reflected on the rust shim, and so on, and so forth Again, we need a serious commitment to do so, but I think it's doable, but we need a serious commitment with the companies involved with the project.
So, just to sum up everything I wrote, I'd like to suggest:
Thanks, @fidencio! All your concerns are valid to me. As a starter, we at Ant Group are willing to sign up for the go-shim maintenance and rust-shim reviewers groups. And I'd recommend all key kata players to sign up too.
As for a new branch vs. a new repo, there were concerns about disrupting the kata-containers repository. But I agree with you that a new branch really can't break anything in the main
branch. And given the fact that we'd be able to set up CI more easily, I agree with you that a new branch works better than a new repo. We are doing the same for the CCv0
branch and it works pretty well IMHO.
@fidencio The summary is really appreciated.
@fidencio @gnawux After discussing with @ariel-adam, I still think that there would be value in having a separate repo for review purpose, but not a blank one. Instead, that could be a clone of the kata-containers
repo (for example, it could be hosted on https://github.com/alibaba/kata-containers, forked from https://github.com/kata-containers/kata-containers.
The benefits I see to this approach are:
main
on a regular basis would be straightforward (minimal conflicts expected since this would essentially be new directories and new files)Once we are happy with the state of the "Alibaba repo", we call it a release and merge it into the main kata-containers repo.
The alternative is to use a development branch on the main repository, like CCv0
. That works as well, but may cause more "noise" initially for reviews. I already find it to be the case for CCv0
to some extent, despite our best efforts to identify that work clearly.
@c3d, I don't fully understand your points, let me go through each one of those and raise questions.
- We would have one place to review Alibaba's code (not N places for individual Alibaba developers)
What do you mean by "not N places for individual Alibaba developer"? And how would this be different than using a branch?
- It would make it clear that this is still intended to be merged back into kata-containers/kata-containers ultimately
How would this be different than with a branch?
- It would make the proposed repository internal structure quite clear (e.g. where they would add the Rust shim code)
I don't follow. When a PR is open you can easily see the internal structure, and this shouldn't be different whether one uses a repo or a branch.
- Alibaba could leverage the existing tools, e.g. CI, without any disruption for the kata-containers team
So, this is an important point. Are you suggesting that CI should be entirely worked by Ant / Ali / Hyper folks? What's the difference on doing this with the branch and with the repo, according to your understanding?
- Because of the shared history, rebasing on main on a regular basis would be straightforward (minimal conflicts expected since this would essentially be new directories and new files)
I don't see why it'd be different than what we'd get when using a branch, could you elaborate more on this?
- Anything that is immediately useful to Kata 2.x (say, adding some capabilities to the CI, or Rust agent fixes) could be submitted to kata-containers along the way
Again, I don't see why it'd be different than what we'd get when using a branch, could you elaborate more on this?
- Alibaba would make it clear that it's their contribution (they deserve it).
I really don't follow you, sorry. How using a branch would make it any different?
The alternative is to use a development branch on the main repository, like CCv0. That works as well, but may cause more "noise" initially for reviews. I already find it to be the case for CCv0 to some extent, despite our best efforts to identify that work clearly.
The "noise" is work being generated that we all should be committed to review. Having the "noise" generated in a different place, IMHO, may actually foggy the visibility of this effort.
All in all, I am really sorry, but the points you raised are not clear to me, at all, and going to this direction would cause an extra load at least on the CI side, as mentioned on https://github.com/kata-containers/kata-containers/issues/3052#issuecomment-978953449
@c3d
The alternative is to use a development branch on the main repository, like CCv0. That works as well, but may cause more "noise" initially for reviews. I already find it to be the case for CCv0 to some extent, despite our best efforts to identify that work clearly.
To add up to @fidencio's comments, we actually want to make such "noises", to get community attention and to get people familiar with the code Since it is a big feature for kata 3.0, we don't intend to hide it in any way. So I'm sorry if it bothers you, but it is the plan, not something we want to avoid.
- We would have one place to review Alibaba's code (not N places for individual Alibaba developers)
What do you mean by "not N places for individual Alibaba developer"? And how would this be different than using a branch?
It would not make much of a difference for one branch. But since there are multiple developments going on, (Rust shim, sandboxing, rune, etc), I expect multiple topic branches to naturally emerge, not just one. So I'm looking for a way to group many related branches. The alternative would be a naming convention for branches, i.e. we could have v3/agent
, v3/runtime
, v3/shim
and so on.
- It would make it clear that this is still intended to be merged back into kata-containers/kata-containers ultimately
How would this be different than with a branch?
That point was primarily in contrast to a blank repo, not to a branch.
- It would make the proposed repository internal structure quite clear (e.g. where they would add the Rust shim code)
I don't follow. When a PR is open you can easily see the internal structure, and this shouldn't be different whether one uses a repo or a branch.
Again, it would if there were multiple topic branches converging. In one case, all the related discussion would be intermixed with, say, the kata 2.5 release. In the other case, the split would be easier, helping people focusing on one task or the other.
- Alibaba could leverage the existing tools, e.g. CI, without any disruption for the kata-containers team
So, this is an important point. Are you suggesting that CI should be entirely worked by Ant / Ali / Hyper folks? What's the difference on doing this with the branch and with the repo, according to your understanding?
For example, Alibaba could decide that in their repo, the set of green CI is a bit lower, at least initially, to reduce resource usage. This is just one example.
- Because of the shared history, rebasing on main on a regular basis would be straightforward (minimal conflicts expected since this would essentially be new directories and new files)
I don't see why it'd be different than what we'd get when using a branch, could you elaborate more on this?
The point here is that it's not made harder by using a different repo. But there are also useful use cases. For example, say we introduce a change in v2 that breaks all the v3 branches. They only need one adjustment patch on "their" main
that engineers working on v3 can rebase all the rest on.
- Anything that is immediately useful to Kata 2.x (say, adding some capabilities to the CI, or Rust agent fixes) could be submitted to kata-containers along the way
Again, I don't see why it'd be different than what we'd get when using a branch, could you elaborate more on this?
It's really a matter of one branch versus many. I do not believe that Alibaba's effort would reasonably fit in a single branch. So I'm trying to find an organization letting us more easily create a "funnel" of related branches that ultimately converge towards v3. Of course, it can all technically be done in the same repo as the 2.x development, but it will make it much less clear what is 2.x and what is 3.x.
Just to give a single example related to that specific point: in the v3 repo, the meaning of "needs-backport" could mean "backporting to main". whereas in the v2 repo, it would mean "backporting to stable".
- Alibaba would make it clear that it's their contribution (they deserve it).
I really don't follow you, sorry. How using a branch would make it any different?
There is a bit more visibility in having an alibaba repo with a lot of churn and a lot of commits by Alibaba. It's really secondary, but again, I find it weird that they would have an alibaba
org on GitHub and not take advantage of it to showcase such a massive effort. Now, that's just me, I am not working for Alibaba and I don't have to tell them how to polish their brand ;-)
The alternative is to use a development branch on the main repository, like CCv0. That works as well, but may cause more "noise" initially for reviews. I already find it to be the case for CCv0 to some extent, despite our best efforts to identify that work clearly.
The "noise" is work being generated that we all should be committed to review. Having the "noise" generated in a different place, IMHO, may actually foggy the visibility of this effort.
To respond to your comment and @bergwolf at the same time: the noise in question relates to reviews of code that is not mature vs. code that is mature and needs to go in quickly for releases. Many developers filter patches differently based on whether they apply to the current release or whether it's a long-term review where many iterations are needed.
So by "noise', I'm not talking about "hype", but about ongoing disruption of the primary release tool for the project. I am trying to reconcile two objectives:
All in all, I am really sorry, but the points you raised are not clear to me, at all, and going to this direction would cause an extra load at least on the CI side, as mentioned on #3052 (comment)
The point in that comment is totally irrelevant, since it applies to different repos with no shared history. I am specifically suggesting a cloned repo, which IMO renders your line of argumentation moot. Please clarify if you think I missed something in your CI-related comment.
Quite to the contrary, I believe that some specific CI adjustments are hard to do on a per-branch basis. The baseline to use for v2 vs v3 is an example.
Bottom line, to me it really boils down to the following belief: I do not believe that reviewing the v3 code can or should be done with a single branch, and having many v3 branches in the primary repo will require a lot of conventions.
If we go with the single repo rout, may I suggest we need to discuss the following conventions:
v3/
(just the same, CCv0
could be renamed cc/v0
).v3/main
which would be a regular rebase of main
with whatever additional fixes are neededv3/
branchesbackport
means for v3/
branches (to main
or to stable-
?)v3
issues and PRsbreaksV3
label or something, and we need to separate the api-breakage
into "expected" (for V3) and "unexpected" (for V2 bugs).So yes, all this can be done before we start reviewing Alibaba's code. I frankly feel that a repository clone to get started is a tad bit simpler :-)
It would not make much of a difference for one branch. But since there are multiple developments going on, (Rust shim, sandboxing, rune, etc), I expect multiple topic branches to naturally emerge, not just one. So I'm looking for a way to group many related branches. The alternative would be a naming convention for branches, i.e. we could have v3/agent, v3/runtime, v3/shim and so on.
Now I see why you had these thoughts. IMO what you described (as "multiple topic branches") are pull requests rather than branches. And I think we can clarify that we only plan to maintain a single branch for the rust shim + sandboxing code. And there is no rune
(not sure if you mean the one that Alibaba Cloud uses for enclave containers, but it is certainly out of the scope).
So by "noise', I'm not talking about "hype", but about ongoing disruption of the primary release tool for the project. I am trying to reconcile two objectives: I want to see the code and be able to comment on it, with the assumption that it's still somewhat immature. I don't want that to interfere with (introduce noise in ) the mature code that we need to maintain and keep stable.
The code is being developed in a separate branch and will not be merged into the main
branch until it is mature enough. I don't see why it would disrupt the primary release tool for the project, which is supposed to live in the main
branch. So I'm not sure what is the interfere with the mature code
part about.
Bottom line, to me it really boils down to the following belief: I do not believe that reviewing the v3 code can or should be done with a single branch, and having many v3 branches in the primary repo will require a lot of conventions.
We have a single main
branch and so many ongoing features for kata in different pull requests. None of them is asking for new branches. Maintaining a single branch is actually easier than having multiple branches. With multiple branches, we'll have to spend more time syncing things between them and setting up CI for them. I don't think we should go for it.
Let's take a step back here and try to get everyone on the same page and understanding of how CI works, as I think it's not clear, and as it's essential for this discussion to be productive.
When we talk about CIs, we're talking about 2 things here:
GitHub actions: Those are the ones which are located here, and the currently tests running are:
These actions are inherited by the forks and depending on what secrets are used on each one of those, they may or may not work correctly on the forked repo.
Jenkins CI Those are ones which are located here and you can follow their execution here, and the currently jobs running are:
Those jobs are tied to a repo (in this case, kata-containers
), and to an organization (in our case kata-containers
), and we have the freedom to decide on which branches they run based on the configuration of each job itself. For instance, let's take a look at the jenkins-ci-ARM-ubuntu-18-04, there you have an allow-list saying that we run that job on main
and stable-2.*
branches, and a deny list saying that we don't run that job on master
and stable-1.*
branches.
Adding a new repo inside the same organization, would require adding and maintaining the new job, but adding a new branch would require tweaking the allow list. Adding those to a new organization is something that I do not know how much work would be, but I assume more than adding a new repo, as we might have to configure jenkins, we might have to configure new groups, new permissions, etc. But adding a new branch would mean adding a new field in the "allow list" mentioned above.
Everyone, please, have this in mind when discussing the effort needed when we think about the CI.
Now, with everything mentioned as part of https://github.com/kata-containers/kata-containers/issues/3052#issuecomment-979177240, I think we should target having both set of jobs running. But if we had to choose one to keep, I'd keep testing against the Jenkins ones, as those are the baseline for the Green CI effort and, although not perfect, the most complete thing we have.
I think we should keep both. Although at first, the new code may not be able to pass either of them, we should target having both of them working when merging the code back to the main
branch.
Now I see why you had these thoughts. IMO what you described (as "multiple topic branches") are pull requests rather than branches. And I think we can clarify that we only plan to maintain a single branch for the rust shim + sandboxing code.
I do distinguish topic branches from PRs.
A topic branch is something that is not mergeable yet, but where you invite review on a number of related commits, or even invite contributions from other developers to complete the work. It can be quite long-lived. CCv0
is our prime example today.
A PR is something that is ready to be merged. We sometimes use a "WIP PR" to advertise a topic branch. But you can have a PR to a topic branch, we do that all the time for CCv0
. A PR is not generally intended to be long-lived.
And there is no
rune
(not sure if you mean the one that Alibaba Cloud uses for enclave containers, but it is certainly out of the scope).
Sorry, I meant rund
. Too many run[a-z]
variants these days (runc
, rund
, rune
, runk
, etc). Though my only knowledge about rund
(like rune
) is mostly its appearance on a slide. Or is there no rund
either?
The point in that comment is totally irrelevant, since it applies to different repos with no shared history. I am specifically suggesting a cloned repo, which IMO renders your line of argumentation moot. Please clarify if you think I missed something in your CI-related comment.
@c3d, please, see: https://github.com/kata-containers/kata-containers/issues/3052#issuecomment-979177240
@fidencio I see why we reach different conclusions now. You are saying that it's easy to leverage existing jobs, and I agree with that. However, to my earlier question on the topic, @zmlcc replied:
Mostly based on our internal CI and fully tested in production environment
So my understanding is that the code as it exists today would not pass the existing Jenkins jobs. Or maybe it would, but that would not test the new pieces.
@zmlcc @bergwolf : this leads to three questions:
A topic branch is something that is not mergeable yet, but where you invite review on a number of related commits, or even invite contributions from other developers to complete the work. It can be quite long-lived. CCv0 is our prime example today.
Ok, in that case, the rust shim stays in a single branch just like CCv0.
Sorry, I meant rund. Too many run[a-z] variants these days (runc, rund, rune, runk, etc). Though my only knowledge about rund (like rune) is mostly its appearance on a slide. Or is there no rund either?
rund
is just the combination of rust shim and builtin sandbox.
So my understanding is that the code as it exists today would not pass the existing Jenkins jobs. Or maybe it would, but that would not test the new pieces.
We need to see what passes and what fails and fix up the failure part.
As for your three questions, I'll let @zmlcc answer them. But IMO the internal CIs should be an addition to the existing Jenkins job, not a replacement, if it can be run in a public environment.
A topic branch is something that is not mergeable yet, but where you invite review on a number of related commits, or even invite contributions from other developers to complete the work. It can be quite long-lived. CCv0 is our prime example today.
Ok, in that case, the rust shim stays in a single branch just like CCv0.
What about work on extensions, image acceleration or plugins? Would that all be lumped into the same big branch (let's call it v3
for clarity)?
Sorry, I meant rund. Too many run[a-z] variants these days (runc, rund, rune, runk, etc). Though my only knowledge about rund (like rune) is mostly its appearance on a slide. Or is there no rund either?
rund
is just the combination of rust shim and builtin sandbox.
That's more or less what I understood. So how do you plan to present that to the community?
To me, there are two really different needs in terms of review:
v3
).v3
but do not exist yet. Obviously, you cannot share a big branch for those, but I think it would make things really difficult for the rest of the community if these other efforts were not clearly delineated and organized. So I would expect topic branches like v3/image-acceleration
where PRs would merge features, not fixes or review comments.The way I see it, you are suggesting the two at the same time, and to me (at least in my brain), this conflicts.
What about work on extensions, image acceleration or plugins? Would that all be lumped into the same big branch (let's call it v3 for clarity)?
Those are nice to have but not the initial target for 3.0. And they should go in as PRs if they ever have a chance to be part of 3.0.
Reviewing code that you already put in production, but never made public. For this, we need to see the whole thing (therefore a branch), so that we can collectively decide how to improve the code before making calling that v3 (that would be small PRs that fix things, not merge them into main but into v3).
This is the main part.
Developing new features that we feel belong to v3 but do not exist yet. Obviously, you cannot share a big branch for those, but I think it would make things really difficult for the rest of the community if these other efforts were not clearly delineated and organized. So I would expect topic branches like v3/image-acceleration where PRs would merge features, not fixes or review comments.
The image-acceleration
part in Gerry's presentation is mostly for CC. So it may just target at CCv0
first, and it is actually beyond the scope of the proposal here (rust shim + builtin sandbox).
@gnawux Here are a few ideas and comments
The description seems to indicate that the shim reimplementation is well advanced already, and that the remaining work is sandboxing. Is that correct?
Assuming this is correct, was there any external repository for the development of that shim code so far, or was this all developed internally? If it was developed internally, was it in its own repository, or are there repository dependencies to Kata? Originally it's developed internally and is independent of Kata. But we are making plans to refactor the code for upstreaming.
When you talk about "passing the same test cases", do you mean you can run the existing Kata CI, or some other series of tests?
If there is a large body of code already, how do you suggest we should review it? It makes sense that the community would need to take time to review it. As @fidencio remarked during the meeting, all developers don't have equivalent familiarity with Rust and Go We are trying to split the code into pieces, so it could be easily reviewed and merged by upstream:)
Thanks for attempting such major work.
@fidencio I see why we reach different conclusions now. You are saying that it's easy to leverage existing jobs, and I agree with that. However, to my earlier question on the topic, @zmlcc replied:
Mostly based on our internal CI and fully tested in production environment
So my understanding is that the code as it exists today would not pass the existing Jenkins jobs. Or maybe it would, but that would not test the new pieces.
@zmlcc @bergwolf : this leads to three questions:
- Do you think that your existing CI for the new code is something that you could share publicly (e.g. could it be turned into Jenkins jobs running on our infrastructure)?
The Kata CI and our CI have many common parts, but also have some different interests too. So it definitely helps to share the common parts. We are glad to contribute CI test cases to the community, and it also be great to run the community CI system in our QA systems too if it's feasible.
- Do you think that there are reasonable chances that the existing code could pass our current Jenkins CI (most likely testing only the existing Go components which would be largely untouched)?
- Do you think that there are reasonable chance that the existing code could pass a large enough fraction of our existing CI using the Rust shim as a "drop-in" replacement (i.e. running the old tests with the new shim) ?
Need more investigation to figure out the status.
@fidencio I see why we reach different conclusions now. You are saying that it's easy to leverage existing jobs, and I agree with that. However, to my earlier question on the topic, @zmlcc replied:
Mostly based on our internal CI and fully tested in production environment
So my understanding is that the code as it exists today would not pass the existing Jenkins jobs. Or maybe it would, but that would not test the new pieces.
@zmlcc @bergwolf : this leads to three questions:
- Do you think that your existing CI for the new code is something that you could share publicly (e.g. could it be turned into Jenkins jobs running on our infrastructure)?
Surely Alibaba can share the internal CI, just like sharing the rust shim code. But in my opinion, we want to use the existing Kata CI to ensure that the new code is functionally consistent with the existing one. Ali's internal CI would not help this.
- Do you think that there are reasonable chances that the existing code could pass our current Jenkins CI (most likely testing only the existing Go components which would be largely untouched)?
- Do you think that there are reasonable chance that the existing code could pass a large enough fraction of our existing CI using the Rust shim as a "drop-in" replacement (i.e. running the old tests with the new shim) ?
No. Firstly our internal code development is independent of Kata, so we connot guarantee Kata's CI fits our internal code at all and vice versa. Secondly, we want to gradually release out the rust code. In the initial stage only some basic functionalities are included, which certainly cannot pass all the full-featured Kata CI.
Actually I suggest using seperate CI for the new code. As the new code development, we can gradually add test cases into the seperate CI until the new CI is consistant with the existing CI. At that time, we can say the new code is mature enough to merge into the main repo/branch
Apologies for being a bit late for the party. We've (containerd) just open sourced rust extensions for containerd that include crates for ttrpc/grpc clients, protos, shim, etc. Is this something you'd be potentially interested to consolidate efforts and maintain one crate?
It's an honor that we implemented this feature and it's up to me to close this issue.
Hello folks,
I am back:) (lol I have never left actually). I filed this issue as we discussed in the AC meeting this week.
In short, what I am proposing:
Now the detail:
Since we introduced the rust-agent in late 2019 and made it default in Kata 2.0, there are voices to re-write the shim in Rust from many contributors, e.g. #1071. In Ant Group (and our collaborators in Alibaba), we received the same requirement in order to save the host memory, especially for the high-density case.
In the past year, my teammate Fupan(@lifupan), the original author of shim-v2( kata-containers/runtime#572 ), and other contributors from our team and Alibaba Cloud team have eventually implemented a Rust version shim and made it pass the same test cases of the existing Go version.
In the meanwhile, as we have already had Rust-VMM, which has been proven works by Cloud-Hypervisor and Firecracker, and in our deployments, most of the Kata pods require only the basic sandboxing functionalities. It is reasonable to have a simple built-in sandboxing function in it and we did this eventually (most of the rust-vmm related works are contributed by the Alibaba Cloud team actually, who contributed to the rust-vmm and cloud-hypervisor as well). For complicated or sensitive cases, we may still use the Qemu or Cloud-Hypervisor like before.
The rust shim has been deployed in Alibaba and Ant Group's kubernetes clusters this year. As a result, we saved massive memory in the host and simplified the operating. And we believe it should come back to the upstream so that we can get further benefit from the community development and keep our stack up-to-date.
Xu