roedoejet / g2p

Grapheme-to-Phoneme transductions that preserve input and output indices, and support cross-lingual g2p!
https://g2p-studio.herokuapp.com

perf: make g2p convert line-oriented when processing a file #358

Closed joanise closed 6 months ago

joanise commented 6 months ago

Significant speed-up when using g2p convert to process long files.

Fixes: #350

codecov[bot] commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.42%. Comparing base (f50768e) to head (68f89e0).

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #358      +/-   ##
==========================================
+ Coverage   93.29%   93.42%   +0.13%
==========================================
  Files          18       18
  Lines        2340     2342       +2
  Branches      519      520       +1
==========================================
+ Hits         2183     2188       +5
+ Misses         90       88       -2
+ Partials       67       66       -1
```


github-actions[bot] commented 6 months ago

CLI load time: 0:00.05
PR head 68f89e01fa10509f0eafb1dc67679ea55b440c43
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package

joanise commented 6 months ago

> Looks like an easy win - you could even do `lines = open(input_text, encoding="utf8")` instead of reading all the lines at once, although then you have to close the file, so it's probably not worth it.

Yeah, I wanted my context manager to close the file in all exceptional cases, and I didn't want to wrap the whole rest of the function in a try/finally block, so I went with this solution. We can revisit it if someone complains about memory at some point...
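
For readers following the trade-off discussed above, here is a minimal sketch of the two options. It is not the code from this PR: `convert_eager`, `convert_lazy`, and the `convert_line` callback are hypothetical names used only for illustration. The eager version reads all lines inside a short `with` block, so the file is closed before any conversion starts. The lazy version keeps memory flat, but the file stays open for as long as the caller is still iterating.

```python
import os
import tempfile
from typing import Callable, Iterator, List


def convert_eager(path: str, convert_line: Callable[[str], str]) -> List[str]:
    """Read every line up front, then convert line by line.

    The context manager closes the file as soon as reading finishes,
    even if reading raises, so the rest of the function needs no
    try/finally. The cost is holding the whole file in memory.
    """
    with open(path, encoding="utf8") as f:
        lines = f.readlines()
    return [convert_line(line) for line in lines]


def convert_lazy(path: str, convert_line: Callable[[str], str]) -> Iterator[str]:
    """Convert lines as they are read, one at a time.

    Memory use stays small, but because this is a generator the file
    remains open until the generator is exhausted or closed.
    """
    with open(path, encoding="utf8") as f:
        for line in f:
            yield convert_line(line)


if __name__ == "__main__":
    # Hypothetical demo: str.upper stands in for a real per-line conversion.
    fd, demo_path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf8") as out:
        out.write("first line\nsecond line\n")
    print(convert_eager(demo_path, str.upper))
    print(list(convert_lazy(demo_path, str.upper)))
    os.remove(demo_path)
```

Both versions process the input one line at a time; they differ only in when the file is read and closed, which is the memory-versus-simplicity choice mentioned in the reply.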