Closed SimonTeixidor closed 2 years ago
Knowing that 3375fe69ba968f13ddca594475de3a8fc01b2c79 is where things took a turn is a huge help, thanks @SimonPersson!
I’ll see if I can piece it together, but maybe @nickwynja, who authored that commit, can help?
Looks like both this and #56 are caused by my changes. I hope to take a look over the next week or two.
Appreciate that @nickwynja! I haven’t had a chance to dig in, so any help is appreciated.
Managed to settle on a more direct, though less elegant, solution much more quickly than I expected.
I’m validating that https://github.com/websterParser/WebsterParser/pull/81 did the trick, and if so, I’ll cut a new release for y’all!
Fixed!
I believe that all words following "Roller bearing" from the `CIDE.R` source file are missing from the resulting dictionary. See "Rut", "Ruta-baga", etc.

I ran a git bisect, and it appears that the breaking change was introduced in 3375fe69ba968f13ddca594475de3a8fc01b2c79. I tried to read the changes introduced there, but I haven't been able to figure out the issue yet.
There's a similar issue for words following "Stooge", such as "Sweet". Here the issue seems to be that the source data is missing a closing `</p>` tag for the "Stooge" entry. I guess this should be reported upstream to GCIDE, but perhaps we could make the parser more robust against things like that?

Given that I just stumbled upon some examples, I suspect that there are quite a few words missing. I wonder if we could come up with an automated way to verify that the resulting dictionary contains all words from the source files?
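
To make the idea of an automated check a bit more concrete, here is a rough sketch in Python. It is not based on WebsterParser's code. It assumes that the GCIDE source files mark each headword as `<ent>word</ent>`, and that the headwords of the generated dictionary can be exported as a plain list of strings; the function names `source_headwords` and `missing_headwords` are made up for illustration.

```python
import re
from typing import Iterable, List, Set

# Assumption: every headword in a GCIDE source file appears as <ent>word</ent>.
ENT_RE = re.compile(r"<ent>(.*?)</ent>", re.DOTALL)


def source_headwords(source_text: str) -> Set[str]:
    """Collect every non-empty headword found in the raw source text."""
    return {m.strip() for m in ENT_RE.findall(source_text) if m.strip()}


def missing_headwords(source_text: str, dictionary_words: Iterable[str]) -> List[str]:
    """Return source headwords absent from the generated dictionary.

    The comparison ignores case and surrounding whitespace.
    """
    present = {w.strip().casefold() for w in dictionary_words}
    return sorted(
        w for w in source_headwords(source_text) if w.casefold() not in present
    )


if __name__ == "__main__":
    # Tiny hand-written sample; a real run would read CIDE.R and friends.
    sample = (
        "<p><ent>Roller bearing</ent><br/>A bearing with rollers.</p>\n"
        "<p><ent>Rut</ent><br/>A track worn by a wheel.</p>\n"
        "<p><ent>Ruta-baga</ent><br/>A kind of turnip.</p>\n"
    )
    print(missing_headwords(sample, ["Roller bearing"]))
```

Because the check only looks at `<ent>` tags in the raw text, it does not depend on the `<p>` tags being balanced, so it should still flag entries dropped after something like the unclosed "Stooge" paragraph. Running it per source file in CI and failing on a non-empty result would be one way to catch regressions like this one.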