scrapinghub / dateparser

Python parser for human-readable dates
BSD 3-Clause "New" or "Revised" License
2.51k stars, 466 forks

Parse Taiwan dates #837

Open: manycoding opened this issue 3 years ago

manycoding commented 3 years ago

Hey hey, Taiwan uses a special date format that is not currently supported in dateparser 0.7.6, for example:

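```python
import dateparser

# Taiwanese (R.O.C. era) dates as printed on receipts
dateparser.parse("101/05/25")
dateparser.parse("101/05/25 08:36 92694")
dateparser.parse("101/05/25 08:36 92694", languages=["twq"])  # "twq" is the code for Tasawaq, not Taiwanese
dateparser.parse("101/05/25 08:36 92694", languages=["zh-Hant"])
```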

It would be nice to have support for this.

noviluni commented 3 years ago

Hi @manycoding, it would help if you could share the results of those examples, or tell us where these dates were found.

I looked into it a little, and I think this is called the "Taiwanese Calendar R.O.C. Era" (the Republic of China, or Minguo, calendar). The year can be calculated as: R.O.C. year = Western (Gregorian) year - 1911, so year 101 means 2012.
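A minimal sketch of that conversion, assuming only the year differs (months and days are the same as in the Gregorian calendar). The helper names `roc_to_gregorian_year` and `gregorian_to_roc_year` are hypothetical and not part of dateparser:

```python
from datetime import date

# Offset between the R.O.C. (Minguo) era and the Gregorian calendar:
# R.O.C. year 1 corresponds to 1912, so Gregorian = R.O.C. + 1911.
ROC_OFFSET = 1911


def roc_to_gregorian_year(roc_year: int) -> int:
    """Convert an R.O.C. era year (e.g. 101) to a Gregorian year (e.g. 2012)."""
    if roc_year < 1:
        raise ValueError("R.O.C. years start at 1 (1912)")
    return roc_year + ROC_OFFSET


def gregorian_to_roc_year(year: int) -> int:
    """Inverse conversion: Gregorian year -> R.O.C. era year."""
    if year < 1912:
        raise ValueError("the R.O.C. era starts in 1912")
    return year - ROC_OFFSET


print(roc_to_gregorian_year(101))               # 2012
print(gregorian_to_roc_year(2012))              # 101
print(date(roc_to_gregorian_year(101), 5, 25))  # 2012-05-25
```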

We could try to implement this, but I'm not sure how we could tell whether 101/05/25 refers to year 101 or to year 2012. Maybe it should be handled by creating another "Calendar", as we do with the Jalali and Hijri calendars.
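To illustrate the "separate calendar" idea, here is a standalone sketch of an opt-in parser: because the caller explicitly chooses it, there is no ambiguity between year 101 and 2012. The class `ROCCalendar`, its `get_date()` method and the accepted `YYY/MM/DD [HH:MM]` layout are all assumptions for illustration; it does not subclass or call anything in dateparser.

```python
import re
from datetime import datetime
from typing import Optional

ROC_OFFSET = 1911  # Gregorian year = R.O.C. year + 1911


class ROCCalendar:
    """Hypothetical opt-in parser: the caller states that the text uses R.O.C.
    era years, so "101/05/25" is always read as 2012-05-25, never as year 101."""

    # year/month/day with an optional HH:MM time; anything after is ignored
    _PATTERN = re.compile(
        r"^\s*(?P<year>\d{1,3})/(?P<month>\d{1,2})/(?P<day>\d{1,2})"
        r"(?:\s+(?P<hour>\d{1,2}):(?P<minute>\d{2}))?"
    )

    def __init__(self, text: str):
        self.text = text

    def get_date(self) -> Optional[datetime]:
        match = self._PATTERN.match(self.text)
        if match is None:
            return None
        parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
        if parts["year"] < 1:  # there is no R.O.C. year 0
            return None
        try:
            return datetime(
                parts["year"] + ROC_OFFSET,
                parts["month"],
                parts["day"],
                parts.get("hour", 0),
                parts.get("minute", 0),
            )
        except ValueError:  # e.g. month 13 or day 32
            return None


print(ROCCalendar("101/05/25").get_date())              # 2012-05-25 00:00:00
print(ROCCalendar("101/05/25 08:36 92694").get_date())  # 2012-05-25 08:36:00
```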

I'll think about it some more; if you have any other ideas or examples, they're welcome. :)

manycoding commented 3 years ago

These dates come from some Taiwanese receipts. We know for sure that the year has 3 digits, at least for roughly the next 900 years, so I think that should be enough to recognise it. Years below 100 (i.e. up to 2010) are probably not that important.
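The three-digit heuristic could be sketched like this, assuming a `YYY/MM/DD` layout with an optional `HH:MM` time as in the receipt examples. The function name `parse_roc_receipt_date` is hypothetical, and the trailing token (e.g. `92694`) is simply ignored since its meaning isn't stated here. Only years 100 to 999 (2011 to 2910) are accepted, so four-digit Gregorian dates such as `2012/05/25` are left alone.

```python
import re
from datetime import datetime
from typing import Optional

ROC_OFFSET = 1911  # Gregorian year = R.O.C. year + 1911

# Heuristic from the thread: a date whose year has exactly three digits is
# treated as R.O.C. era. \b keeps it from matching inside "2012/05/25".
_ROC_3DIGIT = re.compile(
    r"\b(?P<year>[1-9]\d{2})/(?P<month>\d{1,2})/(?P<day>\d{1,2})\b"
    r"(?:\s+(?P<hour>\d{1,2}):(?P<minute>\d{2}))?"
)


def parse_roc_receipt_date(text: str) -> Optional[datetime]:
    """Return a datetime if `text` contains a 3-digit-year (R.O.C.) date, else None."""
    match = _ROC_3DIGIT.search(text)
    if match is None:
        return None
    hour = int(match["hour"]) if match["hour"] else 0
    minute = int(match["minute"]) if match["minute"] else 0
    try:
        return datetime(int(match["year"]) + ROC_OFFSET, int(match["month"]),
                        int(match["day"]), hour, minute)
    except ValueError:  # invalid month/day, e.g. "101/02/30"
        return None


print(parse_roc_receipt_date("101/05/25 08:36 92694"))  # 2012-05-25 08:36:00
print(parse_roc_receipt_date("2012/05/25"))             # None
```

A caller could try this first and fall back to `dateparser.parse()` when it returns `None`, so ordinary Gregorian dates keep their current behaviour.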