Open johnml1135 opened 1 year ago
@mshannon-sil - what is the status of this issue?
Texts that it is trained on (number of parallel lines)
Duration of the build
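A minimal sketch of what that summary might contain, as a plain Python dict; the field names and counts are assumptions for illustration, not an existing schema in machine.py or Serval:

```python
# Hypothetical shape of the requested build summary; none of these field
# names exist yet -- they only illustrate the stats listed above
# (texts trained on, parallel line counts, build duration).
build_summary = {
    "trained_texts": [
        {"text_id": "MAT", "parallel_line_count": 1071},
        {"text_id": "MRK", "parallel_line_count": 678},
    ],
    "total_parallel_lines": 1749,
    "build_duration_seconds": 5423,
}
```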
Status is still ready, but not in progress yet. This is my next machine.py issue to look at when I get some time, while I'm implementing multi-GPU training/inference for SILNLP.
@mshannon-sil - what is the status of this issue?
Not in progress yet. Based on the last standup meeting where we discussed this, I had put this issue in my backlog while I tackled other issues that needed attention. Now that I've wrapped up the preprocessing stats work I needed to do for SILNLP, I should be able to take a look at this.
Also, add a Paratext source summary from machine dotnet. Combine with the machine.py summary.
Before this is implemented, get the requirements from Bethany (wherever they are stored) and link them here.
We should also include the versions of the files used in Corpora, to have a full definition of what is there.
First implementation:
Add a "summary" endpoint to the build that can take arbitrary JSON, just like "options" can.
This should find its way all the way back to the Serval build status "message" (or, alternatively, should it be a separate field?).
The initial impetus is to capture the actual language codes used and whether or not they are in NLLB-200. Timing or other issues could also be reported where helpful; see the sketch below.
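A rough sketch of how this first implementation could hang together on the machine.py side, assuming the job builds up a dict and passes it along as arbitrary JSON the way "options" is passed. The NLLB_200_LANG_CODES set, the report_summary helper, and the build id are hypothetical names for illustration, not existing machine.py or Serval APIs:

```python
# Sketch only: NLLB_200_LANG_CODES and report_summary are hypothetical names
# used for illustration; they are not existing machine.py or Serval APIs.
import json

# A small illustrative subset; the real check would consult the full
# NLLB-200 language list.
NLLB_200_LANG_CODES = {"eng_Latn", "fra_Latn", "spa_Latn"}


def build_language_summary(source_lang: str, target_lang: str) -> dict:
    """Record the resolved language codes and whether NLLB-200 covers them."""
    return {
        "source_language": source_lang,
        "source_in_nllb_200": source_lang in NLLB_200_LANG_CODES,
        "target_language": target_lang,
        "target_in_nllb_200": target_lang in NLLB_200_LANG_CODES,
    }


def report_summary(build_id: str, summary: dict) -> None:
    """Serialize the summary as arbitrary JSON so it can be surfaced in the
    Serval build status, e.g. in "message" or a separate summary field."""
    payload = json.dumps(summary)
    print(f"build {build_id} summary: {payload}")  # stand-in for the real status update


summary = build_language_summary("eng_Latn", "zzz_Latn")  # "zzz_Latn" stands in for a code NLLB-200 does not cover
report_summary("build-123", summary)
```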