unicode-org / unilex

Lexical data at Unicode

Run process again to include missing files #12

Open hugolpz opened 3 years ago

hugolpz commented 3 years ago

There is a crawl_ca-valencia.py within the google/corpuscrawler project, which produces a file visible in their readme.md. Surprisingly, this frequency file didn't make it into UNILEX. As renowned Twitter expert on the Catalan language Unjoanqualsevol puts it:

[Image: Screenshot_2021-02-26_07-45-56]

Great! But I can't find Catalan (ca) language data 😭

There is indeed no ca, cat, or ca-valencia document within UNILEX. A quick search (Ctrl+F) for .txt returns the following results:

Project        Files
corpuscrawler  1001
UNILEX         999
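
A page search only gives totals; to see which files are actually missing, the two file lists can be compared directly. Here is a minimal sketch, assuming both repositories are checked out locally and that matching files share the same base name (neither is confirmed in this thread). The helper names `txt_stems` and `missing_files` are hypothetical, not part of either project.

```python
from pathlib import Path
import tempfile


def txt_stems(directory):
    """Return the set of base names (without .txt) found anywhere under directory."""
    return {p.stem for p in Path(directory).rglob("*.txt")}


def missing_files(source_dir, target_dir):
    """Return sorted base names present in source_dir but absent from target_dir."""
    return sorted(txt_stems(source_dir) - txt_stems(target_dir))


if __name__ == "__main__":
    # Toy directories standing in for local checkouts (hypothetical layout).
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for name in ("ca-valencia", "fr", "de"):
            (Path(src) / f"{name}.txt").write_text("sample\n", encoding="utf-8")
        for name in ("fr", "de"):
            (Path(dst) / f"{name}.txt").write_text("sample\n", encoding="utf-8")
        print(missing_files(src, dst))
```

If the two projects name their files differently (for example with a prefix or suffix), the stems would need to be normalized before the comparison.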

Q: Is there any plan to rerun the whole chain, either once or periodically?

hugolpz commented 3 years ago

I proposed PR #13.

For maintenance reasons I plan to remove this PR branch in a week.