Open PureTryOut opened 1 year ago
I'd suggest looking at piper instead of Mimic 3. It's where I'm spending my effort these days, working for Nabu Casa.
I don't know when I'll have time to come back to gruut, unfortunately.
Strategy on how to tackle test_en.py test_times
test_en.py -- test_times
text = "4:01am and 4:01 p.m."
In text_processor.TextProcessor.process, inline function, in_inline_lexicon
    # Do multiple passes over the graph
    num_passes_left = max_passes
    while num_passes_left > 0:
        ...
        if detect_times:
            if pipeline_transform_window(
                self._collapse_time, graph, root, window_size=2
            ):
                was_changed = True
            if pipeline_transform(self._transform_time, graph, root):
                was_changed = True
        ...
lang.py
    EN_TIME_PATTERN = re.compile(
        r"""^((0?[0-9])|(1[0-1])|(1[2-9])|(2[0-3]))  # hours
            (?::
            ([0-5][0-9]))?                            # minutes
            \s*(a\.m\.|am|pm|p\.m\.|a\.m|p\.m)?       # am/pm
            $""",
        re.IGNORECASE | re.X,
    )
During the while loop:
Node4 "and "
Node5 "4:01 p.m." <-- parent node. NOT identified, because it's not visited in the while loop
Node6 "4:01 p.m" <-- leaf node. Identified correctly as Time. False positive
The issue is that Node5 and Node6 are both valid Times according to the regex. Changing the regex does not solve the issue, because Node5 is being ignored and Node6 isn't. Rather than fighting it, let's just go with the flow and work with the Node we've got, Node6.
So to repeat: within the while loop, Node4 and Node6 are accessible. Node5 isn't! This is really frustrating. Makes ya wanna shed a tear. So sad.
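The claim that both node texts pass the regex can be checked outside gruut. This is a minimal sketch: the pattern is copied from the lang.py excerpt above, while `is_time` and the `candidates` list are just illustrative names, not part of gruut.

```python
import re

# Pattern copied from the lang.py excerpt; re.X ignores the layout whitespace
# and treats "# ..." as comments inside the pattern.
EN_TIME_PATTERN = re.compile(
    r"""^((0?[0-9])|(1[0-1])|(1[2-9])|(2[0-3]))  # hours
        (?::
        ([0-5][0-9]))?                            # minutes
        \s*(a\.m\.|am|pm|p\.m\.|a\.m|p\.m)?       # am/pm
        $""",
    re.IGNORECASE | re.X,
)


def is_time(text: str) -> bool:
    """Hypothetical helper: True if the whole text matches the time pattern."""
    return EN_TIME_PATTERN.match(text) is not None


# Node texts from the test sentence "4:01am and 4:01 p.m."
candidates = ["4:01am", "and ", "4:01 p.m.", "4:01 p.m"]
results = {text: is_time(text) for text in candidates}

for text, matched in results.items():
    print(f"{text!r}: {matched}")
```

Both "4:01 p.m." (Node5) and "4:01 p.m" (Node6) match, because the am/pm group accepts `p\.m\.` as well as `p\.m`. So the regex alone can't tell the parent from the leaf.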
Suggestion
When the false positive Node6 is identified (correctly) as a Time, have code look at the parent Node (Node5). If the parent is itself a valid Time, mark the parent, not the leaf. Then during the next iteration of the while loop, hopefully(TM), Node6 will be ignored.
If Node6 doesn't get ignored during subsequent while loop iterations, the fix will simply run every iteration: it finds that the parent Node (Node5) is already marked as a Time and does not mark the leaf Node (Node6).
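Here is a rough sketch of that suggestion on a toy tree. Everything in it is an assumption for illustration: `Node`, `is_time`, and `mark_time` are hypothetical and are not gruut's classes or graph API; only the regex is taken from lang.py.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern copied from the lang.py excerpt.
EN_TIME_PATTERN = re.compile(
    r"""^((0?[0-9])|(1[0-1])|(1[2-9])|(2[0-3]))  # hours
        (?::
        ([0-5][0-9]))?                            # minutes
        \s*(a\.m\.|am|pm|p\.m\.|a\.m|p\.m)?       # am/pm
        $""",
    re.IGNORECASE | re.X,
)


@dataclass
class Node:
    """Toy stand-in for a graph node (hypothetical, not gruut's node type)."""
    text: str
    parent: Optional["Node"] = None
    is_time: bool = False


def is_time(text: str) -> bool:
    return EN_TIME_PATTERN.match(text) is not None


def mark_time(leaf: Node) -> bool:
    """Mark the parent instead of the leaf when the parent is a valid time.

    Returns True if any node was newly marked (the 'was_changed' signal).
    """
    parent = leaf.parent
    if parent is not None and is_time(parent.text):
        if parent.is_time:
            return False  # parent already marked on an earlier pass; skip leaf
        parent.is_time = True
        return True
    if is_time(leaf.text) and not leaf.is_time:
        leaf.is_time = True  # no time-like parent, so the leaf is the time
        return True
    return False


node5 = Node("4:01 p.m.")
node6 = Node("4:01 p.m", parent=node5)

first_pass = mark_time(node6)   # marks Node5, leaves Node6 alone
second_pass = mark_time(node6)  # nothing left to do
```

Because the second call reports no change, a loop that stops when nothing changed would still terminate even if Node6 keeps getting visited.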
Note
Navigating the Node tree in text_processor is not for the faint of heart. It's using itertools recipes, so it's like a puzzle with pieces missing, because you can't inspect an Iterator without affecting the Iterator. Most coders, myself included, are not familiar with itertools, so tracking down the cause is a daunting, time-consuming task.
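One way around the "can't look without consuming" problem while debugging is `itertools.tee` from the standard library. A minimal sketch, where `peek_all` is a hypothetical helper and the sample strings just mimic the node texts above:

```python
import itertools
from typing import Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def peek_all(iterator: Iterator[T]) -> Tuple[List[T], Iterator[T]]:
    """Return a snapshot of the remaining items plus a replacement iterator.

    The iterator passed in must not be used again afterwards; keep using
    the returned one instead.
    """
    replacement, copy = itertools.tee(iterator)
    # Consuming `copy` buffers the items inside tee for `replacement`.
    return list(copy), replacement


nodes = iter(["4:01am", "and ", "4:01 p.m."])
snapshot, nodes = peek_all(nodes)

print(snapshot)           # inspect freely, e.g. in a debugger
remaining = list(nodes)   # the replacement still yields every item
```

The trade-off is that `tee` buffers every item it has handed out, so this is for debugging sessions, not something to leave in a hot path.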
As part of packaging this for Alpine Linux to run mimic3, I'm trying to run the test suite of gruut. While some tests succeed, about half of them fail, all of them basically the same way.
More tests fail like this, but it becomes an awfully big post if I paste them all :see_no_evil: