Closed johnml1135 closed 2 weeks ago
machine/jobs/build_word_alignment_model.py
line 7 at r1 (raw file):
This should be a relative import.
Done.
machine/jobs/engine_build_job.py
line 13 at r1 (raw file):
I don't know how much this class buys us, since it doesn't really have any logic in it. It just provides a skeleton. I am also not a big fan of passing variables between methods using class fields. It makes the dependencies between the methods less explicit and more error-prone. If we do use this class, we should provide different implementations for MT and word alignment.
Renamed to TranslationEngineBuildJob - and removed it from WordAlignmentBuildJob.
machine/jobs/engine_build_job.py
line 47 at r1 (raw file):
All of these "protected" methods should be prefixed with an underscore.
Done.
machine/jobs/nmt_engine_build_job.py
line 113 at r1 (raw file):
Why are you ignoring the type checking here?
Changed the typing check for write - to just look for an object.
machine/jobs/settings.yaml
line 30 at r1 (raw file):
We will probably want different settings for the SMT engine and the word alignment models. We should add a new section, maybe `thot_align`.
Done.
machine/jobs/shared_file_service.py
line 41 at r1 (raw file):
I would define separate classes for MT and word alignment. You can still define a base class if there is some shared code.
Ugh - took a while.
machine/jobs/word_alignment_build_job.py
line 84 at r1 (raw file):
What is type checking being ignored here?
There was a typo - I found it and it resolved the core issue.
machine/jobs/smt_model_factory.py
line 14 at r1 (raw file):
The `SmtModelFactory` class is intended to be decoupled from any specific SMT implementation, such as Thot. Also, you should define a separate word alignment model factory.
Done.
Attention: Patch coverage is 80.13356%
with 119 lines
in your changes missing coverage. Please review.
Project coverage is 88.00%. Comparing base (
3912c6a
) to head (9567830
).
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
machine/jobs/build_nmt_engine.py
line 61 at r3 (raw file):
It would be better to use the enum value for the type instead of the string.
Done.
machine/jobs/build_word_alignment_model.py
line 58 at r3 (raw file):
It would be better to use the enum value for the type instead of the string.
Done.
machine/jobs/translation_engine_build_job.py
line 51 at r3 (raw file):
This is hacky. As I stated before, there are issues with passing parameters between methods using class fields. You can create these objects at the beginning of the `run` method and pass them as parameters to the appropriate methods.
I moved them to properties. That appears more pythonic and saves having to move the source, target and parallel corpuses around.
machine/jobs/translation_file_service.py
line 7 at r2 (raw file):
You missed a couple of relative imports.
Found all and fixed them all.
machine/jobs/translation_file_service.py
line 25 at r3 (raw file):
Do we need to support a string value? Just the enum should be sufficient.
Changed to constants.
machine/jobs/word_alignment_build_job.py
line 55 at r3 (raw file):
See my comment in the `TranslationEngineBuildJob` class.
Also updated.
machine/jobs/thot/thot_word_alignment_model_factory.py
line 18 at r2 (raw file):
We will probably want a different new model template for word alignment models. The SMT template contains many files that are specific to SMT models. Should I provide you with a new alignment model template?
Yes, please. I wouldn't know what was needed (unless I spent many hours).
machine/jobs/thot/thot_word_alignment_model_factory.py
line 42 at r2 (raw file):
You should use the model type from the `thot_align` section.
Done - and changed it over for the tokenizer as well.
I believe the last thing is the new word alignment template.
machine/tokenization/tokenizer_factory.py
line 28 at r2 (raw file):
These functions should be exported from the `tokenization` package.
Done.
machine/jobs/build_word_alignment_model.py
line 88 at r4 (raw file):
This return value doesn't seem to be used anywhere. Are you returning this intentionally?
It's still a work in progress - I'll make a few more changes.
machine/jobs/translation_file_service.py
line 25 at r3 (raw file):
The constants are a good change. I was actually referring to the `type` parameter. I don't think we need to accept a `str`.
I believe that we need to based upon them being defined in SETTINGS.
machine/jobs/translation_engine_build_job.py
line 51 at r3 (raw file):
The `create_source_corpus` and `create_target_corpus` methods actually download files from the S3 bucket. It would be unexpected behavior and a code smell for a property to perform long-running processing. Also, you shouldn't reference fields by their name, since this can easily result in a bug if the field name is ever changed. The easiest way forward is to remove this class and create the corpora in the `run` method of the `SmtEngineBuildJob` and `NmtEngineBuildJob` classes. This class adds complexity without adding much value. The other option is to create the corpora at the beginning of the `run` method and pass the variables to the appropriate methods.
Makes sense - updated.
machine/jobs/build_word_alignment_model.py
line 88 at r4 (raw file):
It's still a work in progress - I'll make a few more changes.
I'm slimming it down a bit.
machine/jobs/settings.yaml
line 31 at r5 (raw file):
I know I told you to create a `thot_align` section, but I'm still not happy with the way that the settings are structured. I don't think the section names are clear enough. What about if we change the `thot` and `huggingface` sections to `thot_mt` and `huggingface_mt`. That would clearly differentiate the MT configs from the alignment configs. Or we could have separate `mt` and `align` sections under the `thot` section.
Done.
machine/jobs/translation_file_service.py
line 25 at r3 (raw file):
It looks like all calls to the `TranslationFileService` constructor pass a `SharedFileServiceType` enum value. A `str` value is never passed.
Yes - you are right. It has to do with what calls it, not the underlying settings. Updated for both.
machine/jobs/thot/thot_word_alignment_model_factory.py
line 18 at r2 (raw file):
Now that I think about it, I don't think we need a new model template. You can just remove the `init` method altogether.
Done.
Add an SMT alignment job This lays the groundwork for https://github.com/sillsdev/serval/issues/410.
This change is