Status: open. jackmcdade opened this issue 6 years ago.
Could you provide a bit more information here? How are you using this library? Could you share some more examples of things that do and don't work? I haven't touched this for a long time, so I don't have much context.
It does look like your third example won't work (we're only matching a single digit on either side of the slash), but I'd be surprised if "1/3 cup flour" wasn't working, since 1/3 appears so frequently in our training data.
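To make the single-digit limitation concrete, here is a minimal Python sketch. The two patterns are hypothetical stand-ins written from the description above ("a single digit on either side of the slash"); they are not copied from the library's source.

```python
import re

# Hypothetical patterns, written from the description above rather than
# taken from the library's source.
SINGLE_DIGIT = re.compile(r"^(\d)/(\d)$")   # exactly one digit on each side
MULTI_DIGIT = re.compile(r"^(\d+)/(\d+)$")  # one or more digits on each side


def matches(pattern: re.Pattern, token: str) -> bool:
    """Return True if the whole token matches the fraction pattern."""
    return pattern.match(token) is not None


for token in ["1/3", "1/2", "14/15"]:
    print(token, matches(SINGLE_DIGIT, token), matches(MULTI_DIGIT, token))
```

Under these assumptions, `1/3` and `1/2` match both patterns while `14/15` matches only the multi-digit one. That is consistent with the third example failing, but on its own it would not explain a problem with `1/3`.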
We're using it inside a PHP application as an API, but even with just the included `nyt-ingredients-snapshot-2015.csv` data and the basic CLI instructions from the README, we get the same behavior. `14/15` is obviously not something you'd ever encounter in a recipe; I'm just trying to push the edges of what's actually happening under the hood here.
For example, here's the tagged result of `1/3 cup milk` given the basic training model:
```
# 0.951035
1/3 I1 L4 NoCAP NoPAREN OTHER/0.998681
cup I2 L4 NoCAP NoPAREN B-UNIT/0.956263
milk I3 L4 NoCAP NoPAREN B-NAME/0.994245
```
`1/3` is being tagged as `OTHER`, while `1/2` and `1/4` work just fine.
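When comparing runs like this across different training subsets, it can help to pull the predicted labels out of the output programmatically. This sketch assumes only the layout visible in the output above: a `#` line with the sequence score, then one whitespace-separated line per token, with the token first and `LABEL/probability` last. The function and type names are illustrative, not part of the library.

```python
from typing import List, NamedTuple


class TaggedToken(NamedTuple):
    token: str
    label: str
    prob: float


def parse_tagger_output(text: str) -> List[TaggedToken]:
    """Parse per-token lines like '1/3 I1 L4 NoCAP NoPAREN OTHER/0.998681'.

    Assumes whitespace-separated columns with the token first and
    'LABEL/probability' last; '#' lines (sequence scores) and blank
    lines are skipped.
    """
    tokens: List[TaggedToken] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split()
        # Split the last column on its final '/' to separate label and probability.
        label, _, prob = cols[-1].rpartition("/")
        tokens.append(TaggedToken(cols[0], label, float(prob)))
    return tokens


sample = """# 0.951035
1/3 I1 L4 NoCAP NoPAREN OTHER/0.998681
cup I2 L4 NoCAP NoPAREN B-UNIT/0.956263
milk I3 L4 NoCAP NoPAREN B-NAME/0.994245"""

tagged = parse_tagger_output(sample)
# Tokens labeled OTHER; for this sample that is only the fraction.
print([t.token for t in tagged if t.label == "OTHER"])
```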
I, too, would have assumed the problem wasn't on the library's side, and I spent a large amount of time ruling out every other possibility, retraining the model on many different subsets of our user-submitted data with no luck. I finally decided to start from the ground up here and noticed that your dataset behaves exactly the same way.
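One quick sanity check on the training data itself is to count how often each fraction appears in the snapshot CSV, to see whether `1/3` is as well represented as `1/2` and `1/4`. This is a minimal sketch that assumes the raw ingredient text lives in a column named `input` (an assumption; adjust it if the file differs), and it counts only textual occurrences, not how those tokens were labeled.

```python
import csv
import re
from collections import Counter
from typing import Iterable, Mapping

# Matches fraction tokens such as 1/3 or 14/15 anywhere in a line.
FRACTION = re.compile(r"\b\d+/\d+\b")


def count_fractions(rows: Iterable[Mapping[str, str]], column: str = "input") -> Counter:
    """Count fraction tokens in one text column across all rows."""
    counts: Counter = Counter()
    for row in rows:
        counts.update(FRACTION.findall(row.get(column) or ""))
    return counts


def count_fractions_in_csv(path: str, column: str = "input") -> Counter:
    """Read a CSV with a header row and count fractions in the given column."""
    with open(path, newline="", encoding="utf-8") as f:
        return count_fractions(csv.DictReader(f), column)
```

Usage, under the same column-name assumption: `count_fractions_in_csv("nyt-ingredients-snapshot-2015.csv").most_common(10)`.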
Definitely surprised. I'm hoping you have even the slightest idea of what's going on. 🙏
Have you guys run into issues trying to parse strings with fractions that correspond to repeating decimals (such as 1/3 = 0.333...)? For example, the following will all return `null` for `qty`.
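For reference, the arithmetic distinction behind this question is easy to check: a reduced fraction has a terminating decimal expansion exactly when its denominator has no prime factors other than 2 and 5. The sketch below uses Python's standard `fractions` module; the helper names are illustrative, and nothing here establishes that the tagger's behavior actually depends on this property.

```python
from fractions import Fraction


def parse_fraction(text: str) -> Fraction:
    """Parse strings like '1/3' or '14/15' into an exact, reduced Fraction."""
    return Fraction(text.strip())


def has_terminating_decimal(value: Fraction) -> bool:
    """True iff the reduced denominator has only 2 and 5 as prime factors."""
    d = value.denominator  # Fraction is always stored in lowest terms
    for p in (2, 5):
        while d % p == 0:
            d //= p
    return d == 1


for s in ["1/2", "1/4", "1/3", "14/15"]:
    print(s, has_terminating_decimal(parse_fraction(s)))
```

Here `1/2` and `1/4` terminate while `1/3` and `14/15` repeat. Keeping a parsed quantity as an exact `Fraction` rather than a float also avoids rounding surprises if it is converted later.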