Open johnml1135 opened 1 year ago
@mshannon-sil - what is the status of this issue?
Texts that it is trained on (number of parallel lines)
Duration of the build
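A minimal sketch of what that summary might contain, as a plain Python dict; the field names and counts are assumptions for illustration, not an existing schema in machine.py or Serval:

```python
# Hypothetical shape of the requested build summary; none of these field
# names exist yet -- they only illustrate the stats listed above
# (texts trained on, parallel line counts, build duration).
build_summary = {
    "trained_texts": [
        {"text_id": "MAT", "parallel_line_count": 1071},
        {"text_id": "MRK", "parallel_line_count": 678},
    ],
    "total_parallel_lines": 1749,
    "build_duration_seconds": 5423,
}
```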
Status is still ready, but not in progress yet. This is my next machine.py issue to look at when I get some time, while I'm implementing multi-GPU training/inference for SILNLP.
@mshannon-sil - what is the status of this issue?
Not in progress yet. Based on the last standup meeting where we discussed this, I had put this issue in my backlog while I tackled other issues that needed attention. Now that I've wrapped up the preprocessing stats work I needed to do for SILNLP, I should be able to take a look at this.
Also, add a Paratext source summary from machine dotnet. Combine with the machine.py summary.
Before this is implemented, get the requirements from Bethany (wherever they are stored) and link them here.
We should also include the versions of the files used in Corpora, to have a full definition of what is there.
First implementation:
Add a "summary" endpoint to the build that can take arbitrary JSON, just like "options" can.
This should find its way all the way back to the Serval build status "message" (or, alternatively, should it be a separate field?).
The initial impetus is to capture the actual language codes used and whether or not they are in NLLB-200. Timing or other issues could also be reported where helpful; see the sketch below.
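A rough sketch of how this first implementation could hang together on the machine.py side, assuming the job builds up a dict and passes it along as arbitrary JSON the way "options" is passed. The NLLB_200_LANG_CODES set, the report_summary helper, and the build id are hypothetical names for illustration, not existing machine.py or Serval APIs:

```python
# Sketch only: NLLB_200_LANG_CODES and report_summary are hypothetical names
# used for illustration; they are not existing machine.py or Serval APIs.
import json

# A small illustrative subset; the real check would consult the full
# NLLB-200 language list.
NLLB_200_LANG_CODES = {"eng_Latn", "fra_Latn", "spa_Latn"}


def build_language_summary(source_lang: str, target_lang: str) -> dict:
    """Record the resolved language codes and whether NLLB-200 covers them."""
    return {
        "source_language": source_lang,
        "source_in_nllb_200": source_lang in NLLB_200_LANG_CODES,
        "target_language": target_lang,
        "target_in_nllb_200": target_lang in NLLB_200_LANG_CODES,
    }


def report_summary(build_id: str, summary: dict) -> None:
    """Serialize the summary as arbitrary JSON so it can be surfaced in the
    Serval build status, e.g. in "message" or a separate summary field."""
    payload = json.dumps(summary)
    print(f"build {build_id} summary: {payload}")  # stand-in for the real status update


summary = build_language_summary("eng_Latn", "zzz_Latn")  # "zzz_Latn" stands in for a code NLLB-200 does not cover
report_summary("build-123", summary)
```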