tatuylonen / wiktextract

Wiktionary dump file parser and multilingual data extractor

[en] move some en edition code to "extractor/en" folder #829

Closed xxyzz closed 1 week ago

xxyzz commented 2 weeks ago

Mostly boring import changes; non-en code now loads much faster (about 3.5s less).

xxyzz commented 1 week ago

I have moved "tags.py" and "topics.py" back to the "src/wiktextract" folder.

IMO translating topics is trickier than translating tags: some information will be lost in the translation, and it may not be used in other editions.

kristian-clausal commented 1 week ago

We can move stuff around later. :+1:

kristian-clausal commented 1 week ago

Tatu said that valid_tags and similar data should be kept for validating tags across editions. All editions should, as far as we can make it possible, have the same standard tags. tags.py has some stuff that belongs at the top level (valid_tags) and some that doesn't (all the maps of English strings -> tags).
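
To illustrate the split described above, here is a minimal sketch of a shared tag set being used to validate edition-specific maps. Everything in it is an assumption made for illustration: `VALID_TAGS`, `EN_LABEL_TO_TAGS`, `FR_LABEL_TO_TAGS` and `find_invalid_tags` are hypothetical names, and the data is invented rather than taken from tags.py.

```python
# Hypothetical stand-ins; not the actual contents of wiktextract's tags.py.

# Shared, top-level data: the standard tags every edition is allowed to emit.
VALID_TAGS = frozenset({"plural", "singular", "masculine", "feminine"})

# Edition-specific data: maps from label strings in that edition's
# language to standard tags.
EN_LABEL_TO_TAGS = {
    "plural": ["plural"],
    "masculine singular": ["masculine", "singular"],
}
FR_LABEL_TO_TAGS = {
    "pluriel": ["plural"],
    "masculin": ["masculine"],
    "familier": ["familiar"],  # deliberately missing from VALID_TAGS
}


def find_invalid_tags(label_map):
    """Return {label: [unknown tags]} for labels that map to non-standard tags."""
    invalid = {}
    for label, tags in label_map.items():
        unknown = [tag for tag in tags if tag not in VALID_TAGS]
        if unknown:
            invalid[label] = unknown
    return invalid


print(find_invalid_tags(EN_LABEL_TO_TAGS))
print(find_invalid_tags(FR_LABEL_TO_TAGS))
```

With this layout, only the label maps would need to live in a per-edition folder, while the shared set stays importable by every edition for the same check.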