scrapinghub / price-parser

Extract price amount and currency symbol from a raw text string
BSD 3-Clause "New" or "Revised" License

Don't parse dates as prices #4

Open kmike opened 5 years ago

kmike commented 5 years ago

Dates like July, 2004 or 15.08.2017 should not be parsed as prices; we should detect them and return amount=None, currency=None.
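Below is a minimal sketch of the detection this issue asks for, written in Python because that is the library's language. It covers only the two example shapes: a month name followed by a year, and a dot-separated day.month.year. It also flags a string only when the whole string is such a date. The function name `looks_like_date` and the patterns are hypothetical illustrations. They are not part of price-parser.

```python
import re

# Hypothetical pre-filter for the two date shapes mentioned in this issue.
# Deliberately narrow: it only flags strings that consist entirely of a date.

_MONTHS = (
    "january|february|march|april|may|june|july|"
    "august|september|october|november|december"
)

# "July, 2004" / "July 2004": month name, optional comma, four-digit year.
_MONTH_YEAR = re.compile(rf"^\s*(?:{_MONTHS}),?\s+\d{{4}}\s*$", re.IGNORECASE)

# "15.08.2017": day.month.year separated by dots.
_DMY = re.compile(r"^\s*(\d{1,2})\.(\d{1,2})\.(\d{4})\s*$")


def looks_like_date(text: str) -> bool:
    """Return True if `text` is only a date in one of the two shapes above."""
    if _MONTH_YEAR.match(text):
        return True
    m = _DMY.match(text)
    if m:
        day, month = int(m.group(1)), int(m.group(2))
        # Range check so strings like "99.99.2017" are not flagged as dates.
        return 1 <= day <= 31 and 1 <= month <= 12
    return False
```

The range check prevents strings like `99.99.2017` from being flagged. Other formats, such as `2004-07-15`, are not detected by this sketch.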

GodSaveTheDucks commented 4 years ago

Can we find a universal date-parsing library and use it to filter out the matches? If that's a good approach, I would like to work on it.

kmike commented 4 years ago

@GodSaveTheDucks there is https://github.com/scrapinghub/dateparser, but I think it is better not to follow this approach, for performance and simplicity reasons. It is not the job of price-parser to classify prices vs. dates with the highest possible quality; the idea is to add a pre-filter that is fast and reliable, even if it is likely not complete.
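One way to wire in such a pre-filter is to run the cheap check before price extraction. When the check fires, the code returns no amount and no currency. This is a sketch under assumptions. `parse_with_prefilter` is a hypothetical helper. `parse` stands for whatever extraction step is used, for example a wrapper around price-parser's `Price.fromstring` that returns `(price.amount, price.currency)`. `is_date` is any fast date-detection predicate.

```python
from decimal import Decimal
from typing import Callable, Optional, Tuple

# (amount, currency); both None means "no price found".
ParseResult = Tuple[Optional[Decimal], Optional[str]]


def parse_with_prefilter(
    text: str,
    parse: Callable[[str], ParseResult],
    is_date: Callable[[str], bool],
) -> ParseResult:
    """Hypothetical helper: reject date-like strings before price parsing."""
    # Cheap check first; a date means "no price" (amount=None, currency=None).
    if is_date(text):
        return None, None
    # Anything the pre-filter does not recognise goes through normal parsing.
    return parse(text)
```

The predicate is a plain callable, so it can stay narrow and fast. Strings it misses fall through to normal parsing. This matches the goal of a pre-filter that is fast and reliable but not necessarily complete.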

bulatbulat48 commented 4 years ago

@kmike Could you review https://github.com/scrapinghub/price-parser/pull/19, please? Thanks!

bulatbulat48 commented 4 years ago

@kmike Tests are fixed. Please take a look.