Closed roadlittledawn closed 2 years ago
@rudouglas
Sorry for the churn on this. Thinking through this is tricky stuff!
TL;DR: Perhaps adding that info via frontmatter is the way to go. I can't think of other big hang-ups to not do it.
@roadlittledawn Not at all, tis a wild beast of an epic, we must be sure to tame it right. I'll detail what I was thinking here just for full context.
One other separate thing we have to determine actually is what we do in the following scenario:
@rudouglas yep, that flow makes sense to me. one [nit] though if i may. can we change the name of the frontmatter field to something like `translatedBy` or `translationType`?
Loving these collapsers, ye that makes it more obvious. I like `translationType`
That's exactly the scenario I was thinking of, yeah. It's probably rare enough to not need to worry about it right now and just have some kind of warning. The simplest solution would be to just have a comment in the `exclusions.yml` file to remind people to check, but is there an easy way for them to check? If we need to write code to implement a warning, I would argue that we might as well code the check into the automation. It should be very similar to the code adding it to the queue, and we can write it with that use case in mind.
Either way, translating a couple of files that should have been excluded isn't really a big deal. The bulk of the translations will be done in the initial run, so as long as the exclusions are correct at the start this might not be something we need to consider right now.
Summary
Similar to #2536 the interaction with the Smartling API is identical for both Machine/Human Translation. The deserializing method should also not change apart from the additional considerations.
In the current script called in the workflow, `check-job-progress.js`, we are picking up the `projectId` as an environment variable, so as we are creating a separate workflow for Machine Translation we can set the env variable there. This way we should just need to add a flag when running the script to tell it how to deserialize the data. Something like `--machine`.
This will need to be passed through to...
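A minimal sketch of how a script might read that flag, assuming it is passed as a plain `--machine` argument. The helper name `getTranslationType` is hypothetical and not taken from the existing scripts. `PROJECT_ID` is assumed to be set by whichever workflow runs the script.

```javascript
// Hypothetical helper: decide the translation type from CLI arguments.
// `--machine` is the flag proposed in this issue; the rest is illustrative.
const getTranslationType = (argv) =>
  argv.includes('--machine') ? 'machine' : 'human';

// In a Node script, user-supplied arguments start at index 2 of process.argv.
const translationType = getTranslationType(process.argv.slice(2));

// Assumed to be set as an env variable by the workflow that runs the script.
const projectId = process.env.PROJECT_ID;

console.log(
  `Project ${projectId}: deserializing as ${translationType} translation`
);
```

The Machine Translation workflow would then call the script with `--machine` appended, and the existing Human Translation workflow would call it unchanged.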
Fetching
We currently fetch translations using the `$PROJECT_ID`, so this shouldn't need to change in `fetch-and-deserialize.js`. We will just need to add logic for when we pass through the `--machine` flag from `check-job-progress.js` when...
Deserializing
When we fetch the completed translation doc, we will need to add `frontmatter` to each translated doc to signify that it has been Machine/Human Translated. Using the `--machine` flag we will be able to tell which frontmatter to add. (This will be needed to determine whether to show the Disclaimer as part of https://github.com/newrelic/docs-website/issues/2537)
E.g.
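A rough sketch of adding that frontmatter, assuming each translated doc starts with a YAML frontmatter block delimited by `---` lines. The function `addTranslationType` is hypothetical. The field name `translationType` follows the suggestion in the comments; the acceptance criteria use `translated`, so treat the name as an assumption.

```javascript
// Hypothetical helper: add a `translationType` field to a doc's frontmatter.
// Assumes the doc begins with a frontmatter block delimited by `---` lines.
const addTranslationType = (doc, isMachine) => {
  const match = doc.match(/^---\n([\s\S]*?)\n---\n/);
  if (!match) return doc; // no frontmatter found, leave the doc untouched

  const type = isMachine ? 'machine' : 'human';
  const frontmatter = `${match[1]}\ntranslationType: ${type}`;

  // Rebuild the doc: new frontmatter block followed by the original body.
  return `---\n${frontmatter}\n---\n${doc.slice(match[0].length)}`;
};

const example = '---\ntitle: Example page\n---\n\nBody text.\n';
console.log(addTranslationType(example, true));
```

A real implementation would more likely reuse whatever frontmatter parsing the deserialize step already does, rather than string matching.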
Currently, once we download a translated doc we then strip the `translate` key:value out of the frontmatter, so the method for editing the frontmatter would be largely the same (see here).
Accounting for `project_id`
The functionality for updating the tables with completed job status should not change: the `project_id` should already be in the tables and we are just updating the status column. But it would be good to double/triple check that.
For Downloading and Deserializing, the relevant scripts are:
Testing Scripts
You can use the MT Project ID for this, but only test with a 1 word change, [see here](https://github.com/newrelic/docs-website/blob/develop/scripts/actions/translation_workflow/testing/README.md#make-a-change-to-translate). The reason for this is we have a `2 million` word limit per year specifically for Machine Translation:
> Average is about 850 words per document. total is about 1.6 million. For MT it's a total of 2 million words can be translated over a year. we have approximately 1800 pages x 850 (avg word count per page) = 1.6 million

Acceptance criteria
- Add `translated: machine/human` to all pages after fetching to signify how it has been translated
- Account for the `project_id` column in tables
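
The word budget quoted under Testing Scripts can be sanity-checked with quick arithmetic. This sketch uses only the approximate figures from that quote: 1800 pages times 850 words comes to about 1.53 million, a little under the rounded 1.6 million, which leaves roughly 0.4 to 0.5 million words of headroom per year.

```javascript
// Approximate figures quoted in the testing notes.
const pages = 1800;
const avgWordsPerPage = 850;
const yearlyLimit = 2000000; // Machine Translation word limit per year

// Words consumed by one full translation run of every page.
const estimatedWords = pages * avgWordsPerPage;

// What is left of the yearly limit for testing and later updates.
const remaining = yearlyLimit - estimatedWords;

console.log(`Estimated words for a full run: ${estimatedWords}`);
console.log(`Headroom under the yearly limit: ${remaining}`);
```

That headroom is why tests against the MT project should stick to one-word changes.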