Closed johnml1135 closed 4 months ago
Attention: Patch coverage is 69.68026%
with 275 lines
in your changes missing coverage. Please review.
Project coverage is 67.28%. Comparing base (
bf9bb16
) to head (960ee4b
).
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
src/SIL.Machine.AspNetCore/Configuration/ClearMLBuildJobOptions.cs
line 3 at r1 (raw file):
I would rename this. The `Options` suffix usually indicates that it is a config model that can be injected into a service. Maybe `ClearMLBuildQueue`.
Done.
src/SIL.Machine.AspNetCore/Configuration/ClearMLBuildJobOptions.cs
line 5 at r1 (raw file):
I don't think this key is necessary.
Done.
src/SIL.Machine.AspNetCore/Configuration/ClearMLEngineTypeOptions.cs
line 3 at r1 (raw file):
Is this class used anywhere?
Nice catch - was made by accident.
src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs
line 361 at r1 (raw file):
This method should support with and without the `Configuration` property being set, just like it did before.
I believe this is the only change needed.
src/SIL.Machine.AspNetCore/Models/Build.cs
line 11 at r1 (raw file):
I would rename this to `BuildJobRunnerType` to make it consistent with the other enums.
Done.
src/SIL.Machine.AspNetCore/Models/TranslationEngine.cs
line 8 at r1 (raw file):
A default value isn't necessary.
Ok - especially if we will delete all Machine repo entries.
src/SIL.Machine.AspNetCore/Services/BuildJobService.cs
line 81 at r1 (raw file):
You should update the preceding `ExistsAsync` call to `GetAsync`, so you don't have to hit the database twice.
Done.
src/SIL.Machine.AspNetCore/Services/ClearMLHealthCheck.cs
line 11 at r1 (raw file):
It would probably be easier just to make this a hash set.
Done.
src/SIL.Machine.AspNetCore/Services/ClearMLHealthCheck.cs
line 28 at r1 (raw file):
I would return a `IReadOnlySet` and check the `Count`. LINQ functions can often be inefficient if the underlying type is a collection.
Done.
src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs
line 34 at r1 (raw file):
I would add a separate interface to expose the queue sizes. Maybe something like `IClearMLQueueService`. The dictionary could have issues with race conditions. I would use `ConcurrentDictionary` and just expose the queue size using a method `GetQueueSize`.
I think this is what you are looking for.
src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs
line 66 at r1 (raw file):
The `ToList()` isn't necessary.
It did need it - but I refactored to be a little clearer. What do you think?
src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs
line 74 at r1 (raw file):
The `ToList()` isn't necessary.
Done.
src/SIL.Machine.AspNetCore/Services/PostprocessBuildJob.cs
line 14 at r1 (raw file):
I would just make this a protected property. You could also pass in the value for this to the constructor, so it doesn't have to be overridden.
Ok - removed the primary constructors.
src/SIL.Machine.AspNetCore/Services/PreprocessBuildJob.cs
line 13 at r1 (raw file):
I would make these protected properties. You could also pass in the values for this to the constructor, so they don't have to be overridden.
Done.
src/SIL.Machine.AspNetCore/Services/PreprocessBuildJob.cs
line 85 at r1 (raw file):
I think it would be clearer to pass `JobRunnerType.ClearML`.
This can be overridden to do hangfire for SMT for testing.
src/SIL.Machine.AspNetCore/Services/SmtTransferPostprocessBuildJob.cs
line 31 at r1 (raw file):
All of the following should happen in single lock: - The model updated on disk. - The model trained on new segments. - The local state updated. - The Serval state updated.
Done.
src/SIL.Machine.AspNetCore/Services/SmtTransferPostprocessBuildJob.cs
line 80 at r1 (raw file):
The `SmtTransferEngineStateService` was designed for the live inferencing in the engine server. I wouldn't use it here. You can just load the model directly and train the segments.
How do I load it directly?
tests/SIL.Machine.AspNetCore.Tests/Services/SmtTransferEngineServiceTests.cs
line 126 at r1 (raw file):
What is the delay for? This usually indicates that the test might not be deterministic and therefore flaky.
It was on windows...
tests/SIL.Machine.AspNetCore.Tests/Services/SmtTransferEngineServiceTests.cs
line 274 at r1 (raw file):
It is best to avoid hitting the file system in a unit test. What is this for?
I believe we could not use a memory stream - I could be wrong...
src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs
line 361 at r1 (raw file):
This should be what we want: ``` public static IMachineBuilder AddBuildJobService(this IMachineBuilder builder, string? smtTransferEngineDir = null) { builder.Services.AddScoped(); builder.Services.AddScoped (); builder.Services.AddScoped (); builder.Services.AddScoped (); builder.Services.AddSingleton (); builder.Services.AddHostedService(p => p.GetRequiredService ()); builder.Services.AddScoped (); builder.Services.AddScoped (); builder.Services.AddScoped (); if (smtTransferEngineDir is null) { var smtTransferEngineOptions = new SmtTransferEngineOptions(); builder.Configuration?.GetSection(SmtTransferEngineOptions.Key).Bind(smtTransferEngineOptions); smtTransferEngineDir = smtTransferEngineOptions.EnginesDir; } string? driveLetter = Path.GetPathRoot(smtTransferEngineDir)?[..1]; if (driveLetter is null) throw new InvalidOperationException("SMT Engine directory is required"); // add health check for disk storage capacity builder .Services.AddHealthChecks() .AddDiskStorageHealthCheck( x => x.AddDrive(driveLetter, 1_000), // 1GB "SMT Engine Storage Capacity", HealthStatus.Degraded ); return builder; } ```
Done.
src/SIL.Machine.AspNetCore/Services/ClearMLHealthCheck.cs
line 82 at r3 (raw file):
If you change the type to `HashSet`, then you don't need to convert to an `ImmutableHashSet` before returning. `HashSet` implements the `IReadOnlySet` interface.
explicit cast...
src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs
line 34 at r1 (raw file):
I suggested exposing `GetQueueSize` instead of the dictionary, because we don't want to expose a mutable dictionary. `GetQueueSize` is also a simpler API.
Makes sense - that would be a more standard interface.
src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs
line 66 at r1 (raw file):
It isn't safe to iterate over a collection while mutating it. I think the following would work: ``` foreach (KeyValuePairkvp in tasksPerEngineType) tasks.TryAdd(kvp.Key, kvp.Value); ```
Done.
src/SIL.Machine.AspNetCore/Services/ClearMLHealthCheck.cs
line 15 at r3 (raw file):
The `Distinct` call is unnecessary.
Hashset already makes it distinct...
src/SIL.Machine.AspNetCore/Services/ModelCleanupService.cs
line 29 at r3 (raw file):
It would probably be good to rename these variables to indicate that these are for NMT models. Maybe `validNmtModelFilenames`.
Done.
src/SIL.Machine.AspNetCore/Services/ModelCleanupService.cs
line 38 at r3 (raw file):
I don't think this is being used anywhere.
Fixed.
src/SIL.Machine.AspNetCore/Services/NmtEngineService.cs
line 172 at r3 (raw file):
This method doesn't need to be async anymore.
Done.
src/SIL.Machine.AspNetCore/Services/PreprocessBuildJob.cs
line 7 at r3 (raw file):
If this is used only for unit testing, then it should be an internal property.
I think these changes will work.
src/SIL.Machine.AspNetCore/Services/PreprocessBuildJob.cs
line 13 at r1 (raw file):
These properties aren't used outside of this class and the classes that inherit from it, so the properties don't need to be public.
Done.
src/SIL.Machine.AspNetCore/Services/PreprocessBuildJob.cs
line 181 at r3 (raw file):
We should skip over this loop if pretranslations aren't enabled.
Done.
src/SIL.Machine.AspNetCore/Services/SmtTransferPostprocessBuildJob.cs
line 54 at r3 (raw file):
File system access should be abstracted behind an interface to make it easier to write unit tests. In this case, the `ISmtModelFactory` interface would probably be appropriate. It is the interface that is currently used to abstract file access to SMT models.
I don't fully understand how I would do this.
tests/SIL.Machine.AspNetCore.Tests/Services/NmtEngineServiceTests.cs
line 198 at r3 (raw file):
You forgot to rename this property.
Done.
tests/SIL.Machine.AspNetCore.Tests/Services/SmtTransferEngineServiceTests.cs
line 356 at r3 (raw file):
You forgot to rename this property.
Done.
src/SIL.Machine.AspNetCore/Services/SmtTransferPostprocessBuildJob.cs
line 80 at r1 (raw file):
You can look at the original `SmtTransferBuildJob` class to get an idea.
I think this is what you are looking for.
src/SIL.Machine.AspNetCore/Services/SmtTransferPostprocessBuildJob.cs
line 54 at r3 (raw file):
I don't fully understand how I would do this.
Done
tests/SIL.Machine.AspNetCore.Tests/Services/SmtTransferEngineServiceTests.cs
line 126 at r1 (raw file):
You should be able to figure out how to get the test to work without the delay.
Done.
tests/SIL.Machine.AspNetCore.Tests/Services/SmtTransferEngineServiceTests.cs
line 274 at r1 (raw file):
See my [comment](https://reviewable.io/reviews/sillsdev/machine/200#-NyR7w1V7_SQtpBVow8w) above.
Done.
src/SIL.Machine.AspNetCore/Services/ClearMLHealthCheck.cs
line 82 at r3 (raw file):
You should avoid explicit casting as much as possible. The cast isn't necessary if you change the type to `HashSet`.
Done.
src/SIL.Machine.AspNetCore/Services/ModelCleanupService.cs
line 39 at r4 (raw file):
The variable should be `validSmtModels`. For acronyms longer than 2 characters, only the first character is capitalized.
Done.
src/SIL.Machine.AspNetCore/Services/PreprocessBuildJob.cs
line 7 at r3 (raw file):
The standard practice when you expose the implementation of a class for testing purposes is to make the property/method/field `internal`. This clearly indicates that it isn't part of the public API and should not be used by normal callers.
Ah, the things I don't know about C#... how can I count they ways, as long as the count property is public because I am not part of the assembly.
src/SIL.Machine.AspNetCore/Services/PreprocessBuildJob.cs
line 202 at r5 (raw file):
You don't need to flush the writer. It will be flushed when it is disposed.
It was flushed with Write, not WriteAsync. With the now implemented Write, it should be ok. Removed.
src/SIL.Machine.AspNetCore/Services/SmtTransferPreprocessBuildJob.cs
line 18 at r6 (raw file):
EngineType = TranslationEngineType.SmtTransfer; PretranslationEnabled = false; TrainJobRunnerType = BuildJobRunnerType.Hangfire;
This is hard coded - it should be configurable through either build options (probably not) or environment variables. Right now, with the python SMT so slow, the E2E tests will time out. We need to either speed up the python code or run the SMT in hangfire for E2E testing.
src/SIL.Machine.AspNetCore/Services/ClearMLBuildJobRunner.cs
line 13 at r6 (raw file):
buildJobFactories.ToDictionary(f => f.EngineType); private readonly Dictionary<TranslationEngineType, ClearMLBuildQueue> _options =
An error when running E2E tests on docker-compose:
machine-engine-cntr | System.ArgumentException: An item with the same key has already been added. Key: SmtTransfer
machine-engine-cntr | at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
machine-engine-cntr | at System.Linq.Enumerable.ToDictionary[TSource,TKey](List`1 source, Func`2 keySelector, IEqualityComparer`1 comparer)
machine-engine-cntr | at SIL.Machine.AspNetCore.Services.ClearMLBuildJobRunner..ctor(IClearMLService clearMLService, IEnumerable`1 buildJobFactories, IOptionsMonitor`1 options) in /home/johnml1135/repos/machine/src/SIL.Machine.AspNetCore/Services/ClearMLBuildJobRunner.cs:line 13
machine-engine-cntr | at ResolveService(ILEmitResolverBuilderRuntimeContext, ServiceProviderEngineScope)
machine-engine-cntr | at ResolveService(ILEmitResolverBuilderRuntimeContext, ServiceProviderEngineScope)
machine-engine-cntr | at ResolveService(ILEmitResolverBuilderRuntimeContext, ServiceProviderEngineScope)
machine-engine-cntr | at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService(IServiceProvider provider, Type serviceType)
machine-engine-cntr | at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService[T](IServiceProvider provider)
src/SIL.Machine.AspNetCore/Models/TranslationEngine.cs
line 8 at r6 (raw file):
public int Revision { get; set; } = 1; public required string EngineId { get; init; } public required TranslationEngineType Type { get; init; }
This may break because of database migration. Current plan:
src/SIL.Machine.AspNetCore/Models/ClearMLTask.cs
line 37 at r9 (raw file):
This should be moved to the `SIL.Machine.AspNetCore.Utils` namespace.
Done.
src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs
line 164 at r9 (raw file):
We need to test that the train corpus size and confidence are correctly set in the Serval build.
Added to E2E test to verify corpus size and confidence.
src/SIL.Machine.AspNetCore/Services/S3WriteStream.cs
line 1 at r9 (raw file):
This should be moved to the global usings.
Done.
src/SIL.Machine.AspNetCore/Services/SmtTransferPostprocessBuildJob.cs
line 25 at r9 (raw file):
We need to figure out where to save the model on the S3 bucket. Right now, I believe that Machine.py is saving the model to the `models` directory. I don't know if this is the best place to save the model. The `models` directory is currently being used for downloading models. This is a different use case. Here we simply need to transfer the model from ClearML to the Machine engine. I think it would probably be better to save the model to the `builds` directory, which is what this code currently expects.
I already moved it to the builds directory for both machine and machine.py. I agree - I want the SMT model to auto-delete the same way the other build files auto-delete at the end of the build job.
src/SIL.Machine.AspNetCore/Services/ThotSmtModelFactory.cs
line 72 at r9 (raw file):
Machine.py is now saving the model as `.tar.gz`. We need to update this code accordingly.
I changed it back to .zip. Dotnet handles .zip easier than tar.gz and python doesn't care.
There are a still a bug or two to work out - but it should be good enough to review. There will likely be more churn in review at this point than from bugs.
This change is