Closed benjamin-hodgson closed 4 months ago
Thanks for the detailed issue (and also for the coffee :))
I'm looking into this further now – the purpose of that regex was to help with mixed numbers that use unicode e.g. "1½", which I think would be broken by your proposed solution. It definitely needs to be made more specific though, working on that atm!
Turns out NFKD normalisation is already converting numbers fully to ASCII – my previous incorrect assumption was that it converted them to superscripts and so it was possible to still disambiguate e.g. 1 1/2 from 11/2 after normalisation. Now, instead, I've removed deUnicodeFractions
entirely and will simply require callers to pre-normalise and for there to be a space in mixed numbers. Going to write some tests and check everything still works.
A possible future fix if I really desperately want mixed numbers without spaces to work (unlikely) would be to add a space around unicode fractions if it doesn't exist prior to unicode normalisation (but then there are a bunch more characters that need to be matched because the normalisation hasn't happened yet)
Some of my recipes feature quantities which look something like this:
These quantities get scaled incorrectly. For example, when scaling by 3:
Seems the
1.5
is getting tokenised as two quantities. As far as I can tell the bug appears to be here;deUnicodeFractions
is returning1 .5 g (1/4 tsp) vanilla paste
. The regex splits the string into1
and.5 g (1⁄4 tsp) vanilla paste
.I think
.+
at the start of the second group is probably too permissive, since we don't expect to see inputs with anything other than numbers on either side of the fraction slash post-normalisation. I tried replacing it with\d+
(to read/(\d+)(\d+\u2044.+)\b/ig
) and it seemed to give me good results for all the inputs I could think of, but would defer to you on the desired behaviour.Example with the proposed fix (after scaling by 3):
Thanks!