sillsdev / machine

Machine is a natural language processing library for .NET that is focused on providing tools for processing resource-poor languages.
MIT License
26 stars 15 forks source link

Smt on clearml #200

Closed johnml1135 closed 3 months ago

johnml1135 commented 4 months ago

There are a still a bug or two to work out - but it should be good enough to review. There will likely be more churn in review at this point than from bugs.


This change is Reviewable

codecov-commenter commented 3 months ago

Codecov Report

Attention: Patch coverage is 69.68026% with 275 lines in your changes missing coverage. Please review.

Project coverage is 67.28%. Comparing base (bf9bb16) to head (960ee4b).

Files Patch % Lines
...chine.AspNetCore/Services/ClearMLMonitorService.cs 15.62% 54 Missing :warning:
...Machine.AspNetCore/Services/ThotSmtModelFactory.cs 0.00% 38 Missing :warning:
...ne.AspNetCore/Services/SmtTransferTrainBuildJob.cs 77.91% 32 Missing and 4 partials :warning:
...hine.AspNetCore/Utils/DictionaryStringConverter.cs 0.00% 36 Missing :warning:
...NetCore/Configuration/IMachineBuilderExtensions.cs 0.00% 34 Missing :warning:
....Machine.AspNetCore/Services/PreprocessBuildJob.cs 92.16% 19 Missing and 7 partials :warning:
....Machine.AspNetCore/Services/ClearMLHealthCheck.cs 0.00% 13 Missing :warning:
...ine.AspNetCore/Services/UnigramTruecaserFactory.cs 0.00% 7 Missing :warning:
...c/SIL.Machine.AspNetCore/Services/S3WriteStream.cs 0.00% 6 Missing :warning:
...Core/Services/SmtTransferClearMLBuildJobFactory.cs 83.33% 3 Missing and 2 partials :warning:
... and 9 more
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #200 +/- ## ========================================== + Coverage 67.22% 67.28% +0.05% ========================================== Files 441 445 +4 Lines 35281 35415 +134 Branches 4719 4736 +17 ========================================== + Hits 23718 23828 +110 - Misses 10471 10496 +25 + Partials 1092 1091 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Configuration/ClearMLBuildJobOptions.cs line 3 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
I would rename this. The `Options` suffix usually indicates that it is a config model that can be injected into a service. Maybe `ClearMLBuildQueue`.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Configuration/ClearMLBuildJobOptions.cs line 5 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
I don't think this key is necessary.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Configuration/ClearMLEngineTypeOptions.cs line 3 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
Is this class used anywhere?

Nice catch - was made by accident.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs line 361 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
This method should support with and without the `Configuration` property being set, just like it did before.

I believe this is the only change needed.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Models/Build.cs line 11 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
I would rename this to `BuildJobRunnerType` to make it consistent with the other enums.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Models/TranslationEngine.cs line 8 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
A default value isn't necessary.

Ok - especially if we will delete all Machine repo entries.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/BuildJobService.cs line 81 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
You should update the preceding `ExistsAsync` call to `GetAsync`, so you don't have to hit the database twice.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ClearMLHealthCheck.cs line 11 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
It would probably be easier just to make this a hash set.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ClearMLHealthCheck.cs line 28 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
I would return a `IReadOnlySet` and check the `Count`. LINQ functions can often be inefficient if the underlying type is a collection.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs line 34 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
I would add a separate interface to expose the queue sizes. Maybe something like `IClearMLQueueService`. The dictionary could have issues with race conditions. I would use `ConcurrentDictionary` and just expose the queue size using a method `GetQueueSize`.

I think this is what you are looking for.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs line 66 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
The `ToList()` isn't necessary.

It did need it - but I refactored to be a little clearer. What do you think?

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs line 74 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
The `ToList()` isn't necessary.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/PostprocessBuildJob.cs line 14 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
I would just make this a protected property. You could also pass in the value for this to the constructor, so it doesn't have to be overridden.

Ok - removed the primary constructors.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/PreprocessBuildJob.cs line 13 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
I would make these protected properties. You could also pass in the values for this to the constructor, so they don't have to be overridden.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/PreprocessBuildJob.cs line 85 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
I think it would be clearer to pass `JobRunnerType.ClearML`.

This can be overridden to do hangfire for SMT for testing.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/SmtTransferPostprocessBuildJob.cs line 31 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
All of the following should happen in single lock: - The model updated on disk. - The model trained on new segments. - The local state updated. - The Serval state updated.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/SmtTransferPostprocessBuildJob.cs line 80 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
The `SmtTransferEngineStateService` was designed for the live inferencing in the engine server. I wouldn't use it here. You can just load the model directly and train the segments.

How do I load it directly?

johnml1135 commented 3 months ago

tests/SIL.Machine.AspNetCore.Tests/Services/SmtTransferEngineServiceTests.cs line 126 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
What is the delay for? This usually indicates that the test might not be deterministic and therefore flaky.

It was on windows...

johnml1135 commented 3 months ago

tests/SIL.Machine.AspNetCore.Tests/Services/SmtTransferEngineServiceTests.cs line 274 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
It is best to avoid hitting the file system in a unit test. What is this for?

I believe we could not use a memory stream - I could be wrong...

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs line 361 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
This should be what we want: ``` public static IMachineBuilder AddBuildJobService(this IMachineBuilder builder, string? smtTransferEngineDir = null) { builder.Services.AddScoped(); builder.Services.AddScoped(); builder.Services.AddScoped(); builder.Services.AddScoped(); builder.Services.AddSingleton(); builder.Services.AddHostedService(p => p.GetRequiredService()); builder.Services.AddScoped(); builder.Services.AddScoped(); builder.Services.AddScoped(); if (smtTransferEngineDir is null) { var smtTransferEngineOptions = new SmtTransferEngineOptions(); builder.Configuration?.GetSection(SmtTransferEngineOptions.Key).Bind(smtTransferEngineOptions); smtTransferEngineDir = smtTransferEngineOptions.EnginesDir; } string? driveLetter = Path.GetPathRoot(smtTransferEngineDir)?[..1]; if (driveLetter is null) throw new InvalidOperationException("SMT Engine directory is required"); // add health check for disk storage capacity builder .Services.AddHealthChecks() .AddDiskStorageHealthCheck( x => x.AddDrive(driveLetter, 1_000), // 1GB "SMT Engine Storage Capacity", HealthStatus.Degraded ); return builder; } ```

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ClearMLHealthCheck.cs line 82 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
If you change the type to `HashSet`, then you don't need to convert to an `ImmutableHashSet` before returning. `HashSet` implements the `IReadOnlySet` interface.

explicit cast...

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs line 34 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
I suggested exposing `GetQueueSize` instead of the dictionary, because we don't want to expose a mutable dictionary. `GetQueueSize` is also a simpler API.

Makes sense - that would be a more standard interface.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs line 66 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
It isn't safe to iterate over a collection while mutating it. I think the following would work: ``` foreach (KeyValuePair kvp in tasksPerEngineType) tasks.TryAdd(kvp.Key, kvp.Value); ```

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ClearMLHealthCheck.cs line 15 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
The `Distinct` call is unnecessary.

Hashset already makes it distinct...

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ModelCleanupService.cs line 29 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
It would probably be good to rename these variables to indicate that these are for NMT models. Maybe `validNmtModelFilenames`.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ModelCleanupService.cs line 38 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
I don't think this is being used anywhere.

Fixed.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/NmtEngineService.cs line 172 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
This method doesn't need to be async anymore.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/PreprocessBuildJob.cs line 7 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
If this is used only for unit testing, then it should be an internal property.

I think these changes will work.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/PreprocessBuildJob.cs line 13 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
These properties aren't used outside of this class and the classes that inherit from it, so the properties don't need to be public.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/PreprocessBuildJob.cs line 181 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
We should skip over this loop if pretranslations aren't enabled.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/SmtTransferPostprocessBuildJob.cs line 54 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
File system access should be abstracted behind an interface to make it easier to write unit tests. In this case, the `ISmtModelFactory` interface would probably be appropriate. It is the interface that is currently used to abstract file access to SMT models.

I don't fully understand how I would do this.

johnml1135 commented 3 months ago

tests/SIL.Machine.AspNetCore.Tests/Services/NmtEngineServiceTests.cs line 198 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
You forgot to rename this property.

Done.

johnml1135 commented 3 months ago

tests/SIL.Machine.AspNetCore.Tests/Services/SmtTransferEngineServiceTests.cs line 356 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
You forgot to rename this property.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/SmtTransferPostprocessBuildJob.cs line 80 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
You can look at the original `SmtTransferBuildJob` class to get an idea.

I think this is what you are looking for.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/SmtTransferPostprocessBuildJob.cs line 54 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…
I don't fully understand how I would do this.

Done

johnml1135 commented 3 months ago

tests/SIL.Machine.AspNetCore.Tests/Services/SmtTransferEngineServiceTests.cs line 126 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
You should be able to figure out how to get the test to work without the delay.

Done.

johnml1135 commented 3 months ago

tests/SIL.Machine.AspNetCore.Tests/Services/SmtTransferEngineServiceTests.cs line 274 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
See my [comment](https://reviewable.io/reviews/sillsdev/machine/200#-NyR7w1V7_SQtpBVow8w) above.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ClearMLHealthCheck.cs line 82 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
You should avoid explicit casting as much as possible. The cast isn't necessary if you change the type to `HashSet`.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ModelCleanupService.cs line 39 at r4 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
The variable should be `validSmtModels`. For acronyms longer than 2 characters, only the first character is capitalized.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/PreprocessBuildJob.cs line 7 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
The standard practice when you expose the implementation of a class for testing purposes is to make the property/method/field `internal`. This clearly indicates that it isn't part of the public API and should not be used by normal callers.

Ah, the things I don't know about C#... how can I count they ways, as long as the count property is public because I am not part of the assembly.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/PreprocessBuildJob.cs line 202 at r5 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
You don't need to flush the writer. It will be flushed when it is disposed.

It was flushed with Write, not WriteAsync. With the now implemented Write, it should be ok. Removed.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/SmtTransferPreprocessBuildJob.cs line 18 at r6 (raw file):

        EngineType = TranslationEngineType.SmtTransfer;
        PretranslationEnabled = false;
        TrainJobRunnerType = BuildJobRunnerType.Hangfire;

This is hard coded - it should be configurable through either build options (probably not) or environment variables. Right now, with the python SMT so slow, the E2E tests will time out. We need to either speed up the python code or run the SMT in hangfire for E2E testing.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ClearMLBuildJobRunner.cs line 13 at r6 (raw file):

        buildJobFactories.ToDictionary(f => f.EngineType);

    private readonly Dictionary<TranslationEngineType, ClearMLBuildQueue> _options =

An error when running E2E tests on docker-compose:

machine-engine-cntr  |       System.ArgumentException: An item with the same key has already been added. Key: SmtTransfer
machine-engine-cntr  |          at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
machine-engine-cntr  |          at System.Linq.Enumerable.ToDictionary[TSource,TKey](List`1 source, Func`2 keySelector, IEqualityComparer`1 comparer)
machine-engine-cntr  |          at SIL.Machine.AspNetCore.Services.ClearMLBuildJobRunner..ctor(IClearMLService clearMLService, IEnumerable`1 buildJobFactories, IOptionsMonitor`1 options) in /home/johnml1135/repos/machine/src/SIL.Machine.AspNetCore/Services/ClearMLBuildJobRunner.cs:line 13
machine-engine-cntr  |          at ResolveService(ILEmitResolverBuilderRuntimeContext, ServiceProviderEngineScope)
machine-engine-cntr  |          at ResolveService(ILEmitResolverBuilderRuntimeContext, ServiceProviderEngineScope)
machine-engine-cntr  |          at ResolveService(ILEmitResolverBuilderRuntimeContext, ServiceProviderEngineScope)
machine-engine-cntr  |          at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService(IServiceProvider provider, Type serviceType)
machine-engine-cntr  |          at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService[T](IServiceProvider provider)
johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Models/TranslationEngine.cs line 8 at r6 (raw file):

    public int Revision { get; set; } = 1;
    public required string EngineId { get; init; }
    public required TranslationEngineType Type { get; init; }

This may break because of database migration. Current plan:

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Models/ClearMLTask.cs line 37 at r9 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
This should be moved to the `SIL.Machine.AspNetCore.Utils` namespace.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs line 164 at r9 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
We need to test that the train corpus size and confidence are correctly set in the Serval build.

Added to E2E test to verify corpus size and confidence.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/S3WriteStream.cs line 1 at r9 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
This should be moved to the global usings.

Done.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/SmtTransferPostprocessBuildJob.cs line 25 at r9 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
We need to figure out where to save the model on the S3 bucket. Right now, I believe that Machine.py is saving the model to the `models` directory. I don't know if this is the best place to save the model. The `models` directory is currently being used for downloading models. This is a different use case. Here we simply need to transfer the model from ClearML to the Machine engine. I think it would probably be better to save the model to the `builds` directory, which is what this code currently expects.

I already moved it to the builds directory for both machine and machine.py. I agree - I want the SMT model to auto-delete the same way the other build files auto-delete at the end of the build job.

johnml1135 commented 3 months ago

src/SIL.Machine.AspNetCore/Services/ThotSmtModelFactory.cs line 72 at r9 (raw file):

Previously, ddaspit (Damien Daspit) wrote…
Machine.py is now saving the model as `.tar.gz`. We need to update this code accordingly.

I changed it back to .zip. Dotnet handles .zip easier than tar.gz and python doesn't care.