scrapinghub / dateparser

python parser for human readable dates
BSD 3-Clause "New" or "Revised" License

Update CLDR data #826

Open noviluni opened 3 years ago

noviluni commented 3 years ago

Once we merge this: https://github.com/scrapinghub/dateparser/pull/825

we will be able to upgrade the CLDR data easily. Doing so will improve this library: we will be able to add support for more locales, fix old language bugs, etc.

I will try to explain how to do it.

  1. Increase the cldr_version number (in dateparser_scripts.utils.get_raw_data). We should go version by version to avoid having too many files to check at once.
  2. Run python dateparser_scripts/get_cldr_data.py. This will download the new JSON files.
  3. Check the new files. If new locales are supported (new files or new regions), we should add them to the docs and probably add tests. We can also add languages to the avoid_languages list, or remove languages from it if they are now fully supported.
  4. Run python dateparser_scripts/write_complete_data.py. This will create the new .py files by merging the JSON files with the YAML files.
  5. Run python dateparser_scripts/order_languages.py to sort the languages and update languages_info.py.
  6. Run the tests (tox).
  7. If the tests fail, it could be because some previously valid words have been removed in the new files. In that case, we can change or remove the test, or add the old words to the YAML files.
  8. Run python dateparser_scripts/update_supported_languages_and_locales.py to update the supported locales in the docs.
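
For reference, the scripted steps above could be chained with a small helper like the one below. This is only a sketch, not something that exists in the repository: the run_stage function and the "download"/"build" stage names are hypothetical, and only the script paths and the tox call come from the steps themselves. It assumes it is run from the repository root, after cldr_version has already been bumped by hand (step 1). The split into two stages leaves room for the manual review in step 3.

```python
import subprocess
import sys

# Hypothetical grouping of the scripted steps. Only the script paths and
# "tox" come from the list above; the stage names are made up for this sketch.
STAGES = {
    # Step 2: download the new JSON files.
    "download": [
        [sys.executable, "dateparser_scripts/get_cldr_data.py"],
    ],
    # Steps 4, 5, 6 and 8: run these after reviewing the new files (step 3).
    "build": [
        [sys.executable, "dateparser_scripts/write_complete_data.py"],
        [sys.executable, "dateparser_scripts/order_languages.py"],
        ["tox"],
        [sys.executable, "dateparser_scripts/update_supported_languages_and_locales.py"],
    ],
}


def run_stage(stage, dry_run=False, runner=subprocess.run):
    """Run every command of a stage in order and return the list of commands.

    With check=True a failing command (for example tox in step 6) raises
    subprocess.CalledProcessError, so the later commands are not run.
    """
    commands = STAGES[stage]
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            runner(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Usage: python this_script.py download|build [--dry-run]
    run_stage(sys.argv[1], dry_run="--dry-run" in sys.argv[2:])
```

Because the "build" stage stops as soon as tox fails, step 7 (fixing the tests or the YAML files) can be done before running the stage again and letting it reach the docs update.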

Next versions to update:

noviluni commented 3 years ago

I can handle the first update to give an example of how to do it. :+1: