Convert multiple consecutive spaces to a single space
Correct inconsistent spacing around punctuation
The main parts of the logic are stubbed out so when you run the script it doesn't do much:
$ python -m silnlp.common.normalize_extracts /tmp/xri --output hackery/normalized
2024-09-19 17:08:17,470 - silnlp.common.normalize_extracts - INFO - Starting script
2024-09-19 17:08:17,470 - silnlp.common.normalize_extracts - INFO - Output dir set to hackery/normalized
2024-09-19 17:08:17,470 - silnlp.common.normalize_extracts - INFO - Found 0 files to normalize
2024-09-19 17:08:17,470 - silnlp.common.normalize_extracts - INFO - Completed script
The aim of the PR is to get verification from Damien and Michael that the script interface and general approach makes sense as there hasn't been any discussions on those specific details. Once that is sorted, I'll fill in the other parts in follow up PR's, hence the stubbing.
This PR adds an initial stubbed out script to normalize extract files.
It is the first step to implement the second half of this comment: https://github.com/sillsdev/silnlp/issues/494#issuecomment-2345328525
The main parts of the logic are stubbed out so when you run the script it doesn't do much:
The aim of the PR is to get verification from Damien and Michael that the script interface and general approach makes sense as there hasn't been any discussions on those specific details. Once that is sorted, I'll fill in the other parts in follow up PR's, hence the stubbing.
This change is