juditacs / wikt2dict

Wiktionary parser tool for many language editions.
GNU Lesser General Public License v3.0

Unable to download ta (Tamil) wiktionary #5

Open AshokR opened 7 years ago

AshokR commented 7 years ago

Thanks for sharing this! I installed it successfully and ran:

w2d.py download en ta

It downloaded the English bz2 file and also created the enwiktionary.txt file. However, the Tamil Wiktionary was not downloaded. It looks like Tamil is not a supported language. Can you give me some tips on how to add it?

AshokR commented 7 years ago

I edited these 4 files:

/res/wikicodes
/res/langnames/english
/res/wiktionaries-full.tsv
/wikt2dict/config.py

And added one file: /res/langnames/tamil

It works for Tamil. Thanks again!

juditacs commented 7 years ago

Hi, sorry for the late answer. If it's not too much trouble, could you please create a pull request with your modifications? I would like to add them to the main repo.

AshokR commented 7 years ago

Thanks for your response. I created a pull request. I am only looking for en -> ta and ta -> en translations. However, the generated translation_pairs files also contain many other languages. I tried removing every language except en and ta from the english and tamil files in the /res/langnames folder, but that didn't seem to help. Any tips would be much appreciated. Thanks again!

juditacs commented 7 years ago

Thanks for the PR.

That feature is missing, but you can easily extract only the English-Tamil pairs on any Unix-like system with awk:

awk 'BEGIN{FS="\t"}{if($1=="en" && $3=="ta")print}' translation_pairs > filtered_pairs
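The awk command above keeps only rows whose first column is en and whose third column is ta. Since the question asked for both en -> ta and ta -> en, here is a minimal Python sketch that keeps both directions. It assumes, based only on that awk command, that translation_pairs is tab-separated with language codes in columns 1 and 3; the function names filter_pairs and filter_file are made up for illustration and are not part of wikt2dict.

```python
def filter_pairs(lines, langs=("en", "ta")):
    """Yield only the lines whose two language codes are both in `langs`.

    Assumption (taken from the awk command in this thread): fields are
    tab-separated, with language codes in columns 1 and 3.
    """
    wanted = set(langs)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip rows too short to hold both language codes
        src, tgt = fields[0], fields[2]
        # Keep either direction, but not pairs within the same language.
        if src in wanted and tgt in wanted and src != tgt:
            yield line


def filter_file(src_path, dst_path, langs=("en", "ta")):
    """Copy the matching rows from src_path to dst_path (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(filter_pairs(src, langs))
```

For example, filter_file("translation_pairs", "filtered_pairs") would write the en/ta rows in both directions to filtered_pairs, provided the column layout matches the assumption above.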