sillsdev / serval

A REST API for natural language processing services
MIT License

Recommend Paratext checks and inventories to teams #280

Open davidbaines opened 9 months ago

davidbaines commented 9 months ago

Paratext includes many basic checks for consistency and USFM structure. The report can be overwhelming, though, when all checks are run on all books. However, some issues in our drafting process may be avoided, and model performance may improve, if those checks are run and the issues fixed.

Often a whole class of issues can be fixed in one go by correctly editing the relevant inventories, such as Valid Characters, Valid Punctuation, and the Markers Inventory. The Markers Inventory in particular may help find rare markers that the project does not need, or markers in the wrong place.

It would be good to find ways to encourage teams to make use of these, especially where there is a clear benefit in terms of model performance.
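
As a rough illustration of what a markers inventory reports, here is a minimal Python sketch that counts USFM markers in a piece of text and lists the ones that occur only rarely. This is not Paratext's implementation: the function names `markers_inventory` and `rare_markers`, the `max_count` threshold, and the simplified marker pattern (a backslash, an optional `+`, lowercase letters, optional digits, optional closing `*`) are all assumptions made for the example, and the pattern ignores milestone markers and attributes.

```python
import re
from collections import Counter

# Simplified, assumed pattern for a USFM marker: backslash, optional "+"
# (nested character marker), letters, optional digits, optional closing "*".
MARKER_RE = re.compile(r"\\(\+?[a-z]+[0-9]*\*?)")


def markers_inventory(usfm_text):
    """Count how often each marker appears in the given USFM text."""
    return Counter(MARKER_RE.findall(usfm_text))


def rare_markers(inventory, max_count=1):
    """Return markers seen at most max_count times, sorted by name.

    Rare markers are only candidates for review; a marker such as "id"
    legitimately appears once per book.
    """
    return sorted(m for m, n in inventory.items() if n <= max_count)


if __name__ == "__main__":
    sample = (
        "\\id GEN\n"
        "\\c 1\n"
        "\\p\n"
        "\\v 1 In the beginning \\nd God\\nd* created\n"
        "\\v 2 And the earth was without form\n"
    )
    inventory = markers_inventory(sample)
    for marker, count in sorted(inventory.items()):
        print(f"\\{marker}: {count}")
    print("Rare:", rare_markers(inventory))
```

A real project would run something like this over every book and compare the result against the markers the team actually intends to use, which is the review step the Markers Inventory in Paratext supports.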

johnml1135 commented 9 months ago

Could this be part of the onboarding process? We may be able to detect that certain checks have or have not been run and then encourage the team to run them.

davidbaines commented 9 months ago

Yes, we could mention this during the onboarding process. I don't know whether we can run the checks ourselves. We probably don't want to rewrite them, though that wouldn't be terrible. We should focus on the issues that affect the quality of the models and inferences.