sillsdev / machine.py

Machine is a natural language processing library for Python that is focused on providing tools for processing resource-poor languages.
MIT License

Update machine.py to reflect USFM changes in Machine #103

Closed: mshannon-sil closed this 7 months ago

mshannon-sil commented 7 months ago

Based on the changes in Machine from https://github.com/sillsdev/machine/pull/160, I've updated machine.py's handling of USFM accordingly.

Main changes:

  1. Files in machine/corpora were either created or modified to align with new files/changes in Machine
  2. Versification.load() in machine/scripture/verse_ref.py was updated to accept a stream as input, as it already can in Machine
  3. Test files were created or modified to reflect new test files/changes in Machine
  4. I added a new test case, test_utf_16_encoding_stream(), in tests/scripture/test_versification.py to cover the code I added that handles Unicode decoding errors when loading a versification from a stream.
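Items 2 and 4 describe loading a versification from a stream and recovering from Unicode decoding errors. The sketch below illustrates that idea under stated assumptions; it is not machine.py's implementation. The helper names `read_versification_text` and `decode_with_fallback` are hypothetical, and the fallback policy (try UTF-8, then UTF-16) is an assumption. UTF-16 detection here relies on the input carrying a byte order mark, which Python's `"utf-16"` codec writes on encode and uses on decode.

```python
import io
from pathlib import Path
from typing import BinaryIO, TextIO, Union

# Hypothetical input type: a file path or an already-open stream.
VersificationSource = Union[str, Path, BinaryIO, TextIO]


def decode_with_fallback(data: bytes) -> str:
    """Decode bytes as UTF-8, falling back to UTF-16 on a decode error.

    Assumed policy (not taken from machine.py): "utf-8-sig" also strips a
    UTF-8 byte order mark; "utf-16" relies on a BOM to pick the byte order.
    If both decodes fail, the UnicodeDecodeError propagates to the caller.
    """
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return data.decode("utf-16")


def read_versification_text(source: VersificationSource) -> str:
    """Return the text of a versification file from a path or a stream."""
    if isinstance(source, (str, Path)):
        return decode_with_fallback(Path(source).read_bytes())
    data = source.read()
    if isinstance(data, str):
        # Text streams have already been decoded by whoever opened them.
        return data
    return decode_with_fallback(data)


if __name__ == "__main__":
    # Sample content in the .vrs style: a comment line and one book line.
    text = "# Versification \"Test\"\nGEN 1:31 2:25\n"
    utf16_stream = io.BytesIO(text.encode("utf-16"))
    print(read_versification_text(utf16_stream) == text)  # True
```

Accepting both paths and streams in one entry point keeps call sites simple; the trade-off is that a UTF-16 stream without a BOM whose bytes happen to be valid UTF-8 would be decoded as UTF-8, which is why the sketch assumes a BOM.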


mshannon-sil commented 7 months ago

This PR is porting over the changes from https://github.com/sillsdev/machine/pull/163 as well now.

codecov-commenter commented 7 months ago

Codecov Report

Attention: Patch coverage is 92.22462% with 36 lines in your changes missing coverage. Please review.

Project coverage is 88.33%. Comparing base (61d49d8) to head (4e6ccde).

| File | Patch % | Lines |
|---|---|---|
| machine/corpora/usfm_parser.py | 50.00% | 11 Missing :warning: |
| ...rpora/zip_paratext_project_settings_parser_base.py | 66.66% | 6 Missing :warning: |
| ...e/corpora/paratext_project_settings_parser_base.py | 91.80% | 5 Missing :warning: |
| ...ne/corpora/zip_paratext_project_settings_parser.py | 72.22% | 5 Missing :warning: |
| machine/corpora/usfm_verse_text_updater.py | 96.39% | 4 Missing :warning: |
| machine/corpora/corpora_utils.py | 66.66% | 1 Missing :warning: |
| machine/corpora/dbl_bundle_text_corpus.py | 66.66% | 1 Missing :warning: |
| ...e/corpora/file_paratext_project_settings_parser.py | 94.11% | 1 Missing :warning: |
| machine/corpora/paratext_project_settings.py | 97.29% | 1 Missing :warning: |
| machine/corpora/usfm_parser_state.py | 85.71% | 1 Missing :warning: |
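As a consistency check on the table, the per-file counts sum to the 36 missing lines in the report header. Assuming patch coverage is covered lines divided by covered plus missing lines (the table lists no partially covered lines), the 92.22462% figure implies a patch of about 463 measurable lines, 427 of them covered. The snippet below reproduces that arithmetic; the dictionary only restates the table, with paths truncated exactly as Codecov shows them.

```python
# Missing-line counts restated from the Codecov table (paths truncated as shown there).
missing_by_file = {
    "machine/corpora/usfm_parser.py": 11,
    "...rpora/zip_paratext_project_settings_parser_base.py": 6,
    "...e/corpora/paratext_project_settings_parser_base.py": 5,
    "...ne/corpora/zip_paratext_project_settings_parser.py": 5,
    "machine/corpora/usfm_verse_text_updater.py": 4,
    "machine/corpora/corpora_utils.py": 1,
    "machine/corpora/dbl_bundle_text_corpus.py": 1,
    "...e/corpora/file_paratext_project_settings_parser.py": 1,
    "machine/corpora/paratext_project_settings.py": 1,
    "machine/corpora/usfm_parser_state.py": 1,
}

PATCH_COVERAGE = 0.9222462  # 92.22462% from the report header

total_missing = sum(missing_by_file.values())
# Assumption: coverage = covered / (covered + missing), with no partial lines,
# so total patch lines = missing / (1 - coverage).
patch_lines = round(total_missing / (1 - PATCH_COVERAGE))
covered_lines = patch_lines - total_missing

print(total_missing, patch_lines, covered_lines)  # 36 463 427
```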
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #103      +/-   ##
==========================================
+ Coverage   87.62%   88.33%   +0.71%
==========================================
  Files         226      234       +8
  Lines       13520    13816     +296
==========================================
+ Hits        11847    12205     +358
+ Misses       1673     1611      -62
```
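The totals in the coverage diff are internally consistent: hits plus misses equals lines for both base and head. Recomputing with exact fractions, the head coverage is about 88.3396%, so the reported 88.33% indicates truncation to two decimals rather than rounding to nearest. The `round_down` helper below is a hypothetical reimplementation that assumes truncation, which appears consistent with Codecov's default `round: down` setting; it is not Codecov's own code.

```python
import math
from fractions import Fraction

# Totals restated from the Codecov "Coverage Diff" above.
BASE = {"hits": 11847, "misses": 1673, "lines": 13520}
HEAD = {"hits": 12205, "misses": 1611, "lines": 13816}


def coverage(totals: dict) -> Fraction:
    """Exact coverage percentage: 100 * hits / lines."""
    # No partial lines are listed, so every tracked line is a hit or a miss.
    if totals["hits"] + totals["misses"] != totals["lines"]:
        raise ValueError("hits + misses does not equal lines")
    return Fraction(100 * totals["hits"], totals["lines"])


def round_down(value: Fraction, places: int = 2) -> Fraction:
    """Truncate toward negative infinity at the given number of decimal places."""
    scale = 10 ** places
    return Fraction(math.floor(value * scale), scale)


base_pct = coverage(BASE)
head_pct = coverage(HEAD)
print(float(round_down(base_pct)))             # 87.62
print(float(round_down(head_pct)))             # 88.33
print(float(round_down(head_pct - base_pct)))  # 0.71
```

Using `Fraction` keeps the comparison exact, so the truncation step is not affected by binary floating-point error near a rounding boundary.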
