MaxymVlasov opened this issue 9 months ago.
/area localization /priority important-longterm /triage accepted /kind feature
For Transifex: if Transifex commits a change, is there any license or copyright asserted by Transifex?
(if not: I think we could make a tool that allows contributors to adopt those changes and confirm, through the tool, that their CLA applies to the contribution(s) in that commit).
Consider integrating Crowdin (https://support.crowdin.com/enterprise/authentication-settings/) along with Transifex for translation purposes. Crowdin offers a comprehensive set of authentication settings.
Additionally, Crowdin provides a versatile range of translation tools, which make contributors more efficient and more engaged in the translation process.
Encouraging the use of Computer Assisted Translation (CAT) tools like Transifex and Crowdin can significantly streamline the translation process and boost overall contributor engagement. It would be beneficial for CNCF to explore and embrace these tools for a more efficient and collaborative translation experience.
I like it. Translation memory is always an asset!
How about introducing ChatGPT, which can provide better results than traditional machine translation?
Let's have one issue per change we need to make; if we'd like to have a wider discussion, let's use the repo discussions.
@MaxymVlasov & colleagues: do you have a reply for https://github.com/kubernetes/website/issues/45175#issuecomment-1950248435 ?
Hello @sftim.
if Transifex commits a change, is there any license or copyright asserted by Transifex?
According to https://www.transifex.com/legal/terms/ the answer to the quoted question is:
Transifex does not assert any ownership rights over the content a person submits, including text published via git repositories for localization. The terms specify that while using Transifex services, you retain full ownership of your content ("your words"), and Transifex only requires limited rights to perform the services requested by the localizer/customer. This includes actions like hosting, sharing, or otherwise processing your content as directed by you, without claiming any copyright or license over it.
cc @MaxymVlasov @Andygol
@sftim a pretty similar conclusion applies to Crowdin, mentioned by @Andygol: https://support.crowdin.com/terms/
Clients are responsible for ensuring they have the necessary rights to their content. Crowdin asserts no ownership over client data, which includes text submitted for translation or any other purpose.
cc @MaxymVlasov
For discussion about the choice of tools, please go to: Translation tooling for Kubernetes localization
OK, so how about we document a workflow (I'm thinking `git rebase`) where people adopt the changes that Transifex has committed, and then we look at automating away the toil from the manual process.
Does that sound OK?
@sftim
a workflow (I'm thinking git rebase) where people adopt the changes that Transifex has committed
The main blocker for a reasonable flow with Transifex is that EasyCLA is unable to process commits that include co-authors. If a general, unproven workflow description is good enough for the moment, we will provide it.
Broadly, here's what I suggest:
`git push --force-with-lease`
If we make a script, it would be a script to do the equivalent steps.
Why is the question of signing the CLA for translation work inside Transifex/Crowdin/etc. off the radar?
That approach makes the work easier not only for translators, but also for reviewers and approvers, who then don't have to worry about the complexities of managing content using git commands.
@sftim when Transifex (or any other LMP) proposes a commit to be pulled into the main branch, that commit may already contain the result of teamwork rather than the work of a single author, already reviewed and approved by owners on the localization management platform side.
Managing work this way saves a lot of time and effort and dramatically increases translation quality.
So our plan is something like this:
UPD: This describes an approach where all authors sign the CLA via the EasyCLA bot, not in the localization management platform; it's closer to the current situation. If Legal and management agree that localizers could sign the CNCF CLA on the platform (Transifex allows this, for example), it would unblock the process even more.
Let me try to rephrase my question.
❓ Must we track the original contributors to translations on GitHub, or is it possible to accomplish the same task using a different platform, such as a localization platform?
When we localize manually, we don't list the original contributors as co-authors (and that's fine). No need to change when automating more of the process.
I'd say we take it as implied that localized text has the upstream (English) contributors as co-authors.
1. have a branch in /kubernetes-i18n-ukrainian/website
2. get Transifex to commit to that branch
3. pull a commit
4. use the Transifex CLI/API to get a list of all people related to the final translation in a commit
5. amend all related to the commit as co-author (as commit is already owned by Transifex bot, which should be approved by EasyCLA bot)
6. push the commit to /k/website
7. EasyCLA recognizes all authors and is good with it
Although you're welcome to work in kubernetes-i18n-ukrainian/website, we'd prefer to support you so that this work happens within the Kubernetes organizations on GitHub. If you (the Ukrainian team) feel that there are barriers to working within Kubernetes, please tell us about what feels difficult. We'd like to address those barriers rather than ignore them.
Beyond that, change one more detail, and it could work:
```diff
-amend all related to the commit as co-author (as commit is already owned by Transifex bot, which should be approved by EasyCLA bot)
+change that combined commit to have the pull request submitter as the primary author, and list all other human coauthors as co-authors
```
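The amended step (PR submitter as primary author, every other human as a co-author) could be scripted along these lines. This is a minimal sketch that assumes the list of human contributors has already been obtained somehow; fetching it from the Transifex CLI/API is not shown, and the `Contributor` type and `build_commit_message` helper are hypothetical names. It emits the `Co-authored-by:` trailers that GitHub uses to credit co-authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contributor:
    # Hypothetical record for one human who worked on the translation.
    name: str
    email: str


def build_commit_message(subject: str, submitter: Contributor,
                         contributors: list[Contributor]) -> str:
    """Return a commit message crediting every other human as a co-author.

    The submitter is expected to be the commit author itself, so they are
    not repeated as a co-author. Duplicate emails (case-insensitive) are
    dropped while preserving the original order.
    """
    seen = {submitter.email.lower()}
    trailers = []
    for person in contributors:
        key = person.email.lower()
        if key in seen:
            continue
        seen.add(key)
        trailers.append(f"Co-authored-by: {person.name} <{person.email}>")
    if not trailers:
        return subject + "\n"
    # Trailers must be separated from the subject/body by a blank line.
    return subject + "\n\n" + "\n".join(trailers) + "\n"
```

The message could then be applied with something like `git commit --amend --author="Name <email>" -F message.txt`, followed by `git push --force-with-lease`.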
If Legal and management agree that localizers could sign the CNCF CLA on the platform (Transifex allows this, for example), it would unblock the process even more.
I'm pretty sure we'd need people to sign with the Linux Foundation, or for their employers to sign (again, via LF). Using Transifex for that signing does not sound feasible; the CNCF CLA signing and tracking is something that applies across the Linux Foundation.
As another take on it: does Transifex provide a way for us to only accept translations from users who have signed the CLA at the Linux Foundation, and to reject work from anyone who doesn't have a current CLA?
We need to watch out for tainting: if a commit that adds or updates a localized document has any change that isn't covered by a CLA, we can't accept that work. Even if most of the work was done by other people.
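The tainting rule is an all-or-nothing check: a commit is acceptable only if every contributor to it is covered by a CLA. A minimal sketch, assuming a hypothetical `cla_covered` set of identities (GitHub logins or emails) that some earlier step has already verified; it does not call EasyCLA or any real API.

```python
def uncovered_contributors(contributors: list[str], cla_covered: set[str]) -> list[str]:
    """Return contributors with no CLA coverage (case-insensitive, deduplicated)."""
    covered = {c.lower() for c in cla_covered}
    seen: set[str] = set()
    missing = []
    for person in contributors:
        key = person.lower()
        if key in covered or key in seen:
            continue
        seen.add(key)
        missing.append(person)
    return missing


def can_accept(contributors: list[str], cla_covered: set[str]) -> bool:
    """A commit is acceptable only if nobody in it lacks CLA coverage."""
    # One uncovered change taints the whole commit, even if most of the
    # work came from people who did sign.
    return not uncovered_contributors(contributors, cla_covered)
```

Returning the list of uncovered people, rather than just a boolean, makes it possible to tell them early that they (or their employer) still need to sign.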
As another take on it: does Transifex provide a way for us to only accept translations from users who have signed the CLA at the Linux Foundation, and to reject work from anyone who doesn't have a current CLA?
It provides the ability to require contributors to sign a CLA provided by the organization owner before any work starts. If EasyCLA were integrated with the CLA in Transifex/etc., then the answer would be yes.
Then we could avoid any CLA verification and commit co-author manipulation on the git side, as it would be done during the sign-up procedure in the chosen TLMP.
The best we can do then is to ask people to confirm that they have signed the CLA and require this confirmation. We can't rely on the signature made within Transifex, and I don't expect that an EasyCLA integration is going to happen.
Folks are welcome to ask the CNCF for this, but let's build something that'll work even if we don't get it.
In other words, we don't ask people to use Transifex to sign the CLA. We ask people to use Transifex to formally confirm that, if we check their CLA status, it will show up as signed.
Then we could avoid any CLA verification and commit co-author manipulation on the git side, as it would be done during the sign-up procedure in the chosen TLMP.
Although it's a nice idea, I also think it's safe to assume that it won't happen; I can't picture a source of money or resources to make it work how we'd like.
What we can do is start with a manual process like I've outlined, and then automate that more.
Hi everyone – I wanted to update this discussion with a couple of things so that we can track progress on some of this work.
SIG Docs leadership is trying to have the LF/CNCF make and communicate a decision about the following: Are they willing to do a foundation policy change where external tools can do CLA signing, bypassing EasyCLA. This is an ask that the Ukrainian Localization Team has as part of the workflow they envision when adopting a translation and localization platform. Maksym has already raised an LF Service Desk ticket for this, and I have personally reached out to/tagged Robert Reeves from the Linux Foundation, and most recently @jeefy, Head of Projects for CNCF.
In parallel, I know that @sftim is looking to work with others on starting a PoC on what a workflow could look like for an adopted translation/localization platform, but one that specifically doesn't include an EasyCLA integration – this is because we have yet to receive any decision or confirmation on the above ask, and if the policy will not be changed, we can still look into improving the localization workflow for language teams.
Finally, I'll be looking to connect with the Steering Committee (via our liaison @justaugustus) so that whatever decision is recorded is done so for the whole project.
Thank you @natalisucks for doing the heavy lifting! I just have a couple of requests for the effort that @sftim plans to collaborate on:
:coffee: Okay, here are my thoughts/questions:
If the tooling is creating follow-up PRs to something already merged in, said-tooling does not need to do a CLA-check. We already do that check up front, no reason to duplicate it.
It's a follow up in that the English content has merged, but the (eg) Ukrainian content may well not have (or there may be a substantial update). We've so far treated localization work as copyrightable.
We've so far treated localization work as copyrightable.
Authors maintain the copyright since this repo is licensed under CC BY 4.0, which makes sense. And with Transifex, they clearly and definitively say that the author retains the rights.
To that end, I would move forward with getting a proof of concept in place (with Transifex or something else) that submits PRs and ensure actual-humans still review the content. But, it cannot be put into "production" or have any of said-PRs merged in yet (If I could color this red I would)
If a bot submits the PR, who owns the copyright?
The bot?
The owner of the bot?
The person who configured the bot?
Or the person who approves/merges the PR?
Before we merge any of the automated translations, those questions need to be answered by the legal committee. I've escalated it (see this reference), but fair warning: we're in the shadow of KubeCon, so the legal committee likely won't get to this until April. Just setting an expectation there.
Hopefully this unblocks y'all a bit.
@jeefy I would appreciate it if you take a look at this discussion as well — https://github.com/kubernetes/website/discussions/45209
In parallel, I know that @sftim is looking to work with others on starting a PoC on what a workflow could look like for an adopted translation/localization platform, but one that specifically doesn't include an EasyCLA integration
@natalisucks easy-peasy! Without EasyCLA compliance confirmation, it could take next to no time to start translating a source stored in git. Last year we got stuck on the EasyCLA bot, which still needs to be changed for us to move forward.
The only point that concerns me is that the trial period is limited in time (after the trial we need to switch to a paid plan or qualify for the platform's open source "agreement"), so it would be great to have all volunteers lined up to work on localization via Transifex (or any other TLMP) before the trial starts.
That way we could go through the test and collect opinions on pros and cons as quickly as possible.
Let's set a deadline and collect a list of volunteers from different localization teams.
PS
and I have personally reached out to/tagged Robert Reeves from the Linux Foundation
The Ukrainian Localization Team personally reached out to Robert Reeves before raising this question AGAIN with SIG Docs leadership, to make sure that this time we have some support on the CNCF side for changes to improve, modernize, and streamline the localization process.
@MaxymVlasov I think it could be a great moment to summarize what we've discussed, what we have on the table at the moment and etc.
@jeefy @sftim once again in this story we have three types of bots:
From day one of this discussion, those three have been mixed up, and that's a problem. It's a HUGE problem, because, from what I can see, the whole discussion gets mixed up again and again.
We, the Ukrainian Localization Team, started this topic as "let's integrate modern localization MANAGEMENT tools into the process". Most of them, if not all, have PR bots in their flow, which was a logical and understandable decision on the platform developers' side. These bots do not create text or code. They are only part of a CI-like process: they simplify SENDING translated and reviewed content into git storage (in our case). The texts they send are owned by contributors and reviewed by contributors on the TLMP side before the PR is created.
The easiest way to exclude those bots from the process is to create our own TLMP, which doesn't look reasonable.
The best we can do then is to ask people to confirm that they have signed the CLA and require this confirmation. We can't rely on the signature made within Transifex, and I don't expect that an EasyCLA integration is going to happen.
If the tooling is creating follow-up PRs to something already merged in, said-tooling does not need to do a CLA-check. We already do that check up front, no reason to duplicate it.
We have numerous bots/automation throughout our projects. Those bots all have CLAs signed by their creator-or-representative. Since this is a third-party-tool and not a community-built tool, there are some differences there.
We haven't found a platform that was designed with the EasyCLA bot in mind as a source of CLA verification. However, some of these platforms have their own CLA signing and verification mechanisms. Integrating them would save tons of time and effort in the localization process. Yes, in this case we would get more bots/mechanisms to take care of, software vendor partners to discuss needs and changes with, and so on. However, I don't see how we could do better at the moment, given that we don't own one of those platforms ourselves and the EasyCLA bot is too limited to allow any reasonable automation combining a TLMP with current processes, even with extra steps. Reusing existing, globally recognized TLMPs is definitely a pro, and it offsets the con of adding another CLA verification mechanism. We are not creating new mechanisms; we want to integrate existing ones that have been around for many years.
Localization automation bots could be a game changer for kickstarting communities of speakers who don't have a localization team yet, or for teams that are short-handed. However, that's not the main priority for us (at least for the Ukrainian localization team). We are looking for tools that will make it easier for new people to get on board and reduce the technical effort needed to start working on localization.
I am grateful for the patience of all who are going through these thoughts again, and hope that they will clarify the case if you are reading them for the first time.
@jeefy I would appreciate it if you take a look at this discussion as well — #45209
A bunch of those solutions look pretty cool IMO. I don't really have a say in the approach, but my two cents would be to try and do as much as possible without the need to install a tool locally (looking at the suggestion of some desktop app)
last year we stuck on EasyCLA bot, which still needs to be changed to move forward
I can poke at this internally but I have a feeling it will not get done any time soon. Please do not do anything that requires it.
The Ukrainian Localization Team personally reached out to Robert Reeves before raising this question AGAIN with SIG Docs leadership, to make sure that this time we have some support on the CNCF side for changes to improve, modernize, and streamline the localization process.
We (CNCF) fully support projects looking to do this sort of automation, but that's it. If there's some resource we can provide to streamline things, we're happy to. But we avoid prescribing any single solution. We prefer to leave it up to the projects to pick what works best for them. Robert works to build partnerships so once a solution is settled on, he'll hop on board to try and work with the vendor. He's not going to help y'all decide.
That way we could go through the test and collect opinions on pros and cons as quickly as possible. Let's set a deadline and collect a list of volunteers from different localization teams.
I might be out of place but that's probably best decided, prioritized, and left to the Chairs/TLs of this SIG :) They've got a lot going on (some of it because of me, sorry!) so patience would be welcome here.
I can poke at this internally but I have a feeling it will not get done any time soon. Please do not do anything that requires it.
From my perspective, the best thing we could do to avoid relying on the EasyCLA bot's CLA verification (the bot would need to be modified to handle verification with extra steps, i.e. the commit-with-co-authors scenario) is to move to a scenario where CLA verification is done on the TLMP side 🤷
We would get one extra CLA-related tool, but would not have to do extra steps. The only con I see is that this is a less agile scenario.
Here is a quick summary of the current situation ¯\_(ツ)_/¯
```mermaid
graph TD
subgraph C["CNCF/LF — SIG Docs"]
CNCF[CNCF/LF] -->
|"We fully support projects<br />looking to do this sort of<br />automation.<br />We avoid prescribing any<br />single solution"| SD[SIG Docs] -->
|"Are CNCF willing to do<br />a foundation policy change<br />where external tools can do<br />CLA signing, bypassing<br />EasyCLA?<br /><br /><br />We have yet to receive<br />any decision or<br />confirmation on<br />the above ask, and<br /><br />if the policy will<br />not be changed…"| CNCF
SD --> SD
end
subgraph LT["Localization Teams"]
M((("We need modern<br />Localization Platform")))
end
LT --> C
```
How about: we set up a trial to use Crowdin to localize some Ukrainian pages, and as part of that we make a process to get the content published. It's likely that people would need to sign the CLA once but that there would be more than one place where our systems verify this.
Once we have that prototyped, we can look at adapting the process so it also works with Transifex.
Does that work?
The Ukrainian Localization Team personally reached out to Robert Reeves before raising this question AGAIN with SIG Docs leadership, to make sure that this time we have some support on the CNCF side for changes to improve, modernize, and streamline the localization process.
If there's no communication with SIG Docs about being blocked on something, then we won't know 🙂 Our first try was to have the Ukrainian team raise a ticket with LF about the CLA bypass, which happened. To my knowledge, we've only heard about this not being resolved for the last ten months in these fresh conversations (in Slack and on this issue), hence our next attempt to escalate. Please work with us here, we are all on the same team 👍
Do any of these platforms allow webhook / HTTP callout authz?
If we find one, we can link that to CLA checking - that's engineering, but that's good because implementing software has lower barriers than changing copyright policy. The outcome would be to help contributors discover as early as possible that they (or their employer) must sign the CLA.
How about: we set up a trial to use Crowdin to localize some Ukrainian pages… Does that work?
It would be great to try it out. Count me in!
Here is https://github.com/Andygol/k8s-website/tree/main-uk-wip with 760+ docs already translated into Ukrainian, in case we need something to test on.
Do any of these platforms allow webhook / HTTP callout authz?
✅ https://support.crowdin.com/webhooks/ and ✅ https://developers.transifex.com/docs/webhooks
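Since both platforms can call out to an HTTP endpoint, the link to CLA checking could be a small service that receives the webhook and looks up the contributor. This is a minimal standard-library sketch of that decision logic only. It assumes a hypothetical JSON payload with a `user` field (the real Crowdin and Transifex payloads differ and are described in the pages linked above) and a hypothetical in-memory `SIGNED` set standing in for a real CLA status lookup. It does not verify webhook signatures, and because webhooks are typically notifications, whether either platform acts on the HTTP response is an assumption to check; the service might only be able to flag contributors, not block them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a real CLA status lookup.
SIGNED = {"andygol", "maxymvlasov"}


def decide(payload: dict, signed: set[str]) -> tuple[int, dict]:
    """Map a (hypothetical) webhook payload to an HTTP status and JSON body."""
    user = str(payload.get("user", "")).strip().lower()
    if not user:
        return 400, {"error": "payload has no 'user' field"}
    if user in signed:
        return 200, {"user": user, "cla": "signed"}
    # Tell the contributor as early as possible that they (or their
    # employer) must sign the CLA.
    return 403, {"user": user, "cla": "missing",
                 "hint": "sign the CLA before contributing"}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            status, body = 400, {"error": "invalid JSON"}
        else:
            status, body = decide(payload if isinstance(payload, dict) else {}, SIGNED)
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), WebhookHandler).serve_forever()
```

Keeping `decide` separate from the HTTP handler means the CLA rule can be tested, or reused by a different trigger, without running a server.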
If there's no communication with SIG Docs about being blocked on something, then we won't know
@natalisucks please stop referring to the "If there's no communication, then we don't know" line. It brings up... a lot of emotions from last year's communication around this topic, and most of those are not pleasant.
So let's just leave it alone for a while, until things really start rolling :handshake:
Do any of these platforms allow webhook / HTTP call out authz? If we find one, we can link that to CLA checking
@sftim could you please tell us more about what you are looking for specifically? The platforms can work with hooks, so we could at least try to go your way; however, I am not sure that we are looking for similar workarounds.
From what @Andygol, @MaxymVlasov and I have looked into previously, the EasyCLA bot's limitations are in the way. But maybe you have something in mind that we haven't looked into yet. Please share it with us.
@sftim could you please tell us more about what you are looking for specifically? The platforms can work with hooks, so we could at least try to go your way; however, I am not sure that we are looking for similar workarounds.
That's a specific detail of using a translation and localization management platform. How about tracking that part in its own issue?
@rolfedh proposed a testing plan on the discussion about which tools to consider.
I'll copy it here:
To evaluate the effectiveness of translation platforms like Crowdin and Transifex for Kubernetes localization, consider implementing a phased plan:
1. Define Evaluation Criteria: Establish key performance indicators (KPIs) for success, such as ease of use, integration capabilities with GitHub, support for managing contributions and CLAs, quality of translations, and efficiency in tracking changes between the original documents and translations.
2. Select Pilot Projects: Choose a small, manageable set of documents for initial testing. Ensure these documents represent various types of content in the Kubernetes documentation to fully assess the platforms' capabilities.
3. Set Up Platforms: Configure both Crowdin and Transifex with the selected documents. This includes setting up projects, integrating with GitHub (if possible), and inviting a small group of volunteer translators to participate in the process.
4. Conduct Training Sessions: Provide training for the volunteers on how to use each platform, focusing on the specific features and workflows relevant to the Kubernetes project.
5. Begin Translation Process: Allow volunteers to start translating documents using both platforms over a defined period, encouraging them to note their experiences, challenges, and feedback.
6. Gather Feedback and Assess Performance: Collect feedback from volunteers regarding their user experience, the quality of translations, and any issues encountered. Evaluate this feedback against your predefined criteria.
7. Review and Analyze Results: Analyze the performance of both platforms based on volunteer feedback and the KPIs. Consider factors like user satisfaction, efficiency improvements, and any reduction in translation errors or inconsistencies.
8. Make Recommendations: Based on the analysis, recommend whether to adopt one of the platforms (and which one), continue testing with a broader set of documents, or explore other options if neither platform meets the project's needs.
This plan focuses on a collaborative and data-driven approach, ensuring the chosen solution not only meets the technical requirements but also enhances the translation experience for volunteers.
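To keep the "Review and Analyze Results" step data-driven, volunteers' KPI ratings could be combined into one weighted score per platform. A minimal sketch; the criterion names, the weights, and the 1 to 5 rating scale are illustrative assumptions, not part of the proposed plan.

```python
from statistics import mean

# Illustrative weights only; real ones would be agreed in the
# "Define Evaluation Criteria" step.
WEIGHTS = {
    "ease_of_use": 0.3,
    "github_integration": 0.3,
    "cla_support": 0.2,
    "change_tracking": 0.2,
}


def platform_score(ratings: list[dict[str, int]], weights: dict[str, float]) -> float:
    """Average each criterion over volunteers, then take the weighted mean.

    Criteria that nobody rated are left out, and the remaining weights
    are renormalized so missing data doesn't drag the score down.
    """
    score = 0.0
    used_weight = 0.0
    for criterion, weight in weights.items():
        values = [r[criterion] for r in ratings if criterion in r]
        if values:
            score += weight * mean(values)
            used_weight += weight
    if used_weight == 0:
        raise ValueError("no ratings for any weighted criterion")
    return round(score / used_weight, 2)
```

Renormalizing keeps scores comparable when, for example, CLA support could not be tested on one platform, but that gap should still be reported alongside the number.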
Also:
Here's a proposed list of Kanban tasks for our testing plan:
Define Evaluation Criteria:
Select Pilot Projects:
Configure Platforms:
Recruit Volunteers:
Conduct Training Sessions:
Kickoff Translations:
Monitor Progress:
Collect Feedback:
Analyze Results:
Make Recommendations:
Presentation to Stakeholders:
Ensure each task is assigned to a team member with clear responsibilities and deadlines. This structured approach allows for continuous monitoring, adjustment, and evaluation of the platforms' effectiveness in streamlining the Kubernetes documentation translation process.
I've created an issue that covers testing for these tools, https://github.com/kubernetes/website/issues/45756. If you are interested in volunteering to help test the tools, please review it and reach out on the issue.
@MaxymVlasov Can you please give us an update on the progress of your Service Desk ticket? I'd like to make sure we're following up with the CNCF accordingly.
Furthermore, for folks still following this issue, we're still accepting people to help out with #45756 so that we can get some testing started, without CLA checking or integration, on some possible tooling.
@jeefy If you wouldn't mind letting us know if there's been an update from the Legal meeting with LF folks on the topic of bypassing the CLA bot (see my comment here) that would be great.
@natalisucks LF helpdesk escalation doesn't work...
@MaxymVlasov Thanks for the update. Since we're specifically looking for a decision about bypassing CLA without a specific tool in mind, I think it's worthwhile for us to pursue this question separately with @jeefy and the LF Legal team. As you know, tooling has not been selected yet, so we don't want this to apply to Transifex alone, if the decision is that a bypass can happen. Again, given that this is a change that affects the whole foundation, and is a legal question, it could take time to get an answer, but we are trying so that the project can move forward.
In the meantime, please do not let this stop yours, @OleksaBaida's, and @Andygol's desires to test tooling for translation workflows – regardless of the CLA answer, an integration could still be really useful.
We'll wait to hear back from Jeff about the Legal team's response.
Good news - EasyCLA now checks CLA for everyone in commit: https://github.com/kubernetes/website/pull/45174#issuecomment-2118025718
https://docs.linuxfoundation.org/lfx/easycla/v2-current/easycla-and-co-author-compliance-guide
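Because EasyCLA now checks everyone listed in a commit, it can help to see locally which identities a commit presents before opening the PR. A minimal sketch that collects the author email plus the emails in any `Co-authored-by:` trailers; the function name is hypothetical, and EasyCLA's exact matching rules are in the compliance guide linked above, not reproduced here.

```python
import re

# GitHub-style co-author trailer, e.g. "Co-authored-by: Name <name@example.com>".
TRAILER = re.compile(
    r"^co-authored-by:\s*(?P<name>.+?)\s*<(?P<email>[^<>\s]+)>\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def commit_identities(author_email: str, message: str) -> list[str]:
    """Return the author email followed by unique co-author emails (lowercased)."""
    identities = [author_email.lower()]
    for match in TRAILER.finditer(message):
        email = match.group("email").lower()
        if email not in identities:
            identities.append(email)
    return identities
```

The inputs can come from `git log -1 --format=%ae` (author email) and `git log -1 --format=%B` (raw commit message).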
By the way, https://hosted.weblate.org/ provides suggestions for similar text from other projects, which simplifies localization and its standardization across the whole language.
Usually it costs money, but for OSS projects they can provide a free license.
Just in case the other solutions don't work out.
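Translation memory and Weblate-style "similar text" suggestions both come down to finding previously translated segments that resemble a new source string. A minimal sketch of that idea using Python's `difflib`; real TLMPs use their own matching algorithms, and the in-memory `memory` dict and the 0.75 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher


def suggest(source: str, memory: dict[str, str],
            threshold: float = 0.75) -> list[tuple[float, str, str]]:
    """Return (similarity, stored source, stored translation), best match first.

    `memory` maps previously translated English segments to their
    translations; only matches at or above `threshold` are returned.
    """
    matches = []
    for stored_source, translation in memory.items():
        ratio = SequenceMatcher(None, source.lower(), stored_source.lower()).ratio()
        if ratio >= threshold:
            matches.append((round(ratio, 2), stored_source, translation))
    matches.sort(key=lambda m: m[0], reverse=True)
    return matches
```

For example, with `{"Create a Pod": "Створіть Pod"}` in memory, looking up "Create a new Pod" returns the stored pair as a close (but not exact) match that a translator can adapt.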
This is a Feature Request
Support for a Translation & Localization Management Platform (TLMP) for docs
To effectively make and maintain translations, we need to adopt tools that were created for it. There are a bunch of solutions, like Transifex, Crowdin, and others.
After some quick research at the start of 2023, we chose Transifex for a PoC as a free and reliable solution, but in the end it doesn't matter which tool SIG Docs chooses: it will be much better than the workflow we have now via git and GitHub.
What would you like to be added
There are 3 ways, from better to worse:
k/websites and move localization work fully to the chosen TLMP, including docs reviews and CLA signing. Add a TLMP GitHub integration and auto-merge changes sent by the TLMP integration user (provided by the TLMP, or you can set up your own).

Why is this needed
When you write docs or code in one language, you work only with the current state of the docs; everything you need to track is tracked by git. When you translate something, you deal with two sources of truth: the original and the existing translation, which can be partial, outdated, or placed in locations that have since been moved or removed from the original, plus other edge cases. There is no easy way in git to check which changes in the original should also be revisited in the translation, so almost every translated doc becomes unsupported from the moment it is merged; sometimes it is simpler to redo a translation from scratch than to figure out what changes are needed.
Long story short: git is just the wrong tool for translations. It is as bad for this as using .zips for VCS, or sending a letter by pigeon mail to another continent and hoping for an answer within 3 workdays.
Also, tech writers, students, or newcomers who'd like to contribute in most cases have little or no knowledge of how git and GitHub work, don't have a GitHub account, and so on. Those are just not intuitive tools for non-techie folks.
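Git does record when each file last changed, so a rough staleness check is possible even though git has no notion of "this translation corresponds to that revision of the original". A minimal sketch, assuming English and localized pages live at parallel paths (as `content/en/...` and `content/uk/...` do in k/website) and that last-commit Unix timestamps were collected beforehand, for example with `git log -1 --format=%ct -- <path>`. The function names are hypothetical. It only flags candidates: a newer English commit may be a typo fix that needs no re-translation, which is exactly the ambiguity described above.

```python
from pathlib import PurePosixPath


def localized_path(en_path: str, lang: str) -> str:
    """Map content/en/... to content/<lang>/... (parallel-path assumption)."""
    parts = PurePosixPath(en_path).parts
    if len(parts) < 2 or parts[0] != "content" or parts[1] != "en":
        raise ValueError(f"not an English content path: {en_path}")
    return str(PurePosixPath("content", lang, *parts[2:]))


def find_outdated(last_changed: dict[str, int], lang: str) -> dict[str, str]:
    """Classify localized pages as 'missing' or 'outdated'.

    `last_changed` maps repository paths to Unix commit timestamps.
    A page is 'outdated' if its English original changed after it did.
    """
    report = {}
    for path, en_time in last_changed.items():
        if not path.startswith("content/en/"):
            continue
        l10n = localized_path(path, lang)
        if l10n not in last_changed:
            report[l10n] = "missing"
        elif last_changed[l10n] < en_time:
            report[l10n] = "outdated"
    return report
```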
So, here is what a good enough TLMP will provide:
not translated, and not reviewed strings/files

Comments
@sftim asked to add it as an issue here, to be able to track work on it.

Related Linux Foundation issue
Screenshots of the LF issue in case you can't see it
Read messages from bottom to top:
![Screenshot from 2024-02-16 19-35-04](https://github.com/kubernetes/website/assets/11096782/7c57b71e-5153-4ca8-a6a3-8d0279b9d31b)
![Screenshot from 2024-02-16 19-34-48](https://github.com/kubernetes/website/assets/11096782/97f0cacb-c278-4d44-b493-2eb0882b8c8a)
![Screenshot from 2024-02-16 19-34-31](https://github.com/kubernetes/website/assets/11096782/ca0b2085-5263-4cf2-b1ba-c92adf2a4e98)

P.S. This started as a Ukrainian localization team initiative, but we were blocked, from a legal perspective, on merging that kind of change back from a TLMP.