Closed Bobgy closed 1 year ago
/cc @kubeflow/wg-training-leads @tizhou86
Plan:
Note, I cannot rename the existing github repo directly due to policy issues, so issue transfer etc is not possible. Luckily, we don't have many issues yet: https://github.com/PaddleFlow/paddle-operator/issues
That's my rough plan. Any concerns?
For license requirement
@tizhou86 May I confirm, is all the code in https://github.com/PaddleFlow/paddle-operator currently wrote by your company? Are there snippets or files copied from other projects?
If you have copied snippets, it's not a big problem, but you'll need to move them into a "third_party" folder at the root of the repo and mention original copyright and license, example: https://github.com/kubeflow/pipelines/tree/master/third_party.
The copyright headers do not reference a company name. @tizhou86 do you want to update? example:
Also can you add license headers for all the files (including yaml)? You may use https://github.com/google/addlicense to add licenses easily.
For new open source projects, we will be enforcing inclusive language policies. I found that there are several occurrences of "master" in the repo: https://github.com/PaddleFlow/paddle-operator/search?q=master.
reference:
Can you investigate whether it's possible to avoid using "master" in code and API?
These are problems I found for the initial pass.
Hi Bobgy, I'll talk to some of our team members and resolve all the problems you mentioned asap, thanks!
/cc @kuizhiqing
Hi @Bobgy , All mentioned problems above are resolved in PRs: https://github.com/PaddleFlow/paddle-operator/pull/48 and https://github.com/PaddleFlow/paddle-operator/pull/49 , please let us know if anything is needed from our side, thanks!
@tizhou86 thanks! LGTM except some minor problems:
Another Q: did anyone file any patents around paddle-operator?
Hi @Bobgy , No, no patent is filed for paddle-operator now.
Hi @Bobgy , I've add license for the yaml files with https://github.com/PaddleFlow/paddle-operator/pull/53 . BTW, can you also add me as OWNERS in third step of your Plan.
Hi @Bobgy , I've add license for the yaml files with PaddleFlow/paddle-operator#53 . BTW, can you also add me as OWNERS in third step of your Plan.
Hi Bobgy, please also add kuizhiqing as the owner of this project, thanks! :-)
Sure, updated plan
A required process after paddle-operator is migrated to Kubeflow org: All released images should comply with open source licenses of its dependencies: https://github.com/kubeflow/community/blob/master/guidelines/application_requirements.md#docker-images.
There were some documentation for how this can be done for go binaries in https://github.com/kubeflow/testing/tree/master/py/kubeflow/testing/go-license-tools. However, I wrote that tool a while back, now I'd recommend a new license tool I just built, refer to https://github.com/Bobgy/go-licenses/tree/main/v2.
I'm not opinionated on the choice, you can decide. Feel free to ask me if anything is not clear.
@Bobgy Thanks for the update. Is it project owner's responsibility or WG's responsibility to make sure license files are included in all the images? Do you know if there's way we can scan licences just like vulnerability scanner for docker images?
Hi @Jeffwan, I'd say it's primarily project owners' responsibilities.
I am not aware of such tools like vulnerability scanning, because it's a little hard understanding how each binary was built.
So far existing tools are around integrating license compliance process in dev/release process.
Hi @Bobgy @Jeffwan , thanks for your clarification. We are working on it now.
Hi @Bobgy , I've add licenses file in the repo and copy it into image, cf pr https://github.com/PaddleFlow/paddle-operator/pull/73.
For the image issue, kubebuilder provide us with base image gcr.io/distroless which we are replacing with bitnami/minideb:stretch due to the accessibility.
Is it ok or maybe we should use registry.access.redhat.com/ubi8/ubi as base image just like pytorch-operator did ?
@kuizhiqing What's your plan on kubeflow/paddle-operator? Seems you still use PaddleFlow/paddle-operator as main dev repo?
Should we aim to migrate the paddle-operator code base to the unified operator instead of standalone new repo?
Hi all, we are aware of your advance in working on unified operator, we would like to making paddle work under unified operator, though, we are now working on repo paddle-operator and it's on the process of adoption. I'm considering that shall we possibly continue the process and plan to work with unified operator later ?
@kuizhiqing Please have a look at tf-operator code base and see if that's possible to make it easily. We heavily reuse codes from kubeflow/common
Hi @Jeffwan , I've read your recent code in tf-operator, it make sense for paddlepaddle to take a place in it but I am afraid maybe we can do it later since it is not straight forward to make it right away.
I'm not opinionated on whether @kuizhiqing wants to merge paddle operator into the unified operator directly. We can start with a standalone repo and consider the merge too.
For the PR that adds license info to the image, I just did a quick review.
For the image issue, kubebuilder provide us with base image gcr.io/distroless which we are replacing with bitnami/minideb:stretch due to the accessibility.
What do you mean by accessibility? Is it because gcr.io is not accessible from China?
@kubeflow/wg-training-leads as you may understand, Training WG should be responsible for security of paddle-operator. Can we take one of the options below?
This ensures that when something happens to paddle-operator images (just assuming extreme cases), Training WG leads also can take actions.
Hi @Bobgy , yes gcr.io is not always reachable from China.
So for base image, is bitnami/minideb:stretc just ok or we should use gcr.io/distroless or registry.access.redhat.com/ubi8/ubi instead ?
For container registry, we can push our images in the same registry as other images, please help with the permission.
Understood, I don't think we have a strong opinion on the base image. At the end, users will only care about whether there can be vulnerabilities in the base image, I think you can decide by yourself based on this.
Hi @Jeffwan @gaocegege, what's your opinion on https://github.com/kubeflow/community/issues/520#issuecomment-915209894? This is the remaining blocker.
I think we could give admin permission. The security issues can be guaranteed by code reviews. WDYT @Jeffwan @johnugeorge
BTW, prefer using the same base images with other training operators. It can be easy to maintain.
@gaocegege Just to double confirm. Do you mean you prefer second option and would like to grant permission to paddle maintainers? Our training operator image is hosted in AWS ECR and we use our own CI pipeline to build images.
We also own kubeflow dockerhub but that's a free account with rated limited issue
We also own kubeflow dockerhub but that's a free account with rated limited issue
I mean that we can grant docker hub permission.
@kuizhiqing WDYT
We also own kubeflow dockerhub but that's a free account with rated limited issue
I mean that we can grant docker hub permission.
@kuizhiqing WDYT
OK, it works for us.
Hi @@kubeflow/wg-training-leads @Bobgy @gaocegege @Jeffwan, as you may note that we have update our image register (pr) which is hosted on baidu cloud, and we will grant permission for you.
If that is OK with you, we may continue the process.
I don't have a strong opinion on the registry as long as Training WG has permission and can take actions when necessary. Any objections from others?
I have no objection to it.
@Bobgy @kubeflow/project-steering-group I believe we can start the transfer. What steps are involved for transferring the repo over to Kubeflow org? I believe @kuizhiqing has to give owner permission of https://github.com/PaddleFlow to @kubeflow/project-steering-group?
Hi @Bobgy Please let us know if we need to add owner permission for kubeflow project steering group mailing list, thanks!
Ping @kubeflow/project-steering-group. Let's move forward with this. Please provide instructions on the next steps.
Also cc @zijianjoy
Since @Bobgy is no longer with the team, @zijianjoy would you be able to take over from here?
Thanks for the reminder! @theadactyl and @kramachandran in @kubeflow/project-steering-group will be the main POCs for this.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/assign @theadactyl
/lifecycle frozen
Any update about the paddle operator?
Since we have a unified training operator, we don't need a separate repo
Because we accepted https://github.com/kubeflow/community/pull/502 paddle operator proposal.
Now we can start the process to adopt the external repo.