open-dict-data / ipa-dict

Monolingual wordlists with pronunciation information in IPA
https://open-dict-data.github.io/ipa-lookup/
MIT License

Create pt_BR.txt #19

Closed: carmo-evan closed 2 years ago

dohliam commented 3 years ago

@carmo-evan Wow, this is fantastic -- thanks so much! :tada:

Would it be possible to convert the entry headwords from uppercase to regular case? In other words, they should all be lowercase except for proper nouns and other words that would normally be capitalized. One way to do this would be to compare the list of headwords against an existing dictionary file. The Aspell dictionary for Brazilian Portuguese, for example, has about 310K entries, so it might be sufficient for this purpose. But maybe there is an easier or more accurate way to do this on your end?
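A rough sketch of the dictionary-comparison idea in Python, in case it helps. Nothing here comes from the PR itself: it assumes each line of the wordlist is `HEADWORD<TAB>/ipa/`, that the reference words have already been loaded into a plain list (for example from a dump of an Aspell dictionary), and the function names `build_case_map`, `restore_case`, and `fix_lines` are made up for illustration.

```python
def build_case_map(reference_words):
    """Map each lowercased word to the set of casings seen in the reference list."""
    case_map = {}
    for word in reference_words:
        case_map.setdefault(word.lower(), set()).add(word)
    return case_map


def restore_case(headword, case_map):
    """Return the headword in its most plausible casing."""
    lowered = headword.lower()
    forms = case_map.get(lowered)
    # Unknown word, or the reference lists a lowercase form: use lowercase.
    if not forms or lowered in forms:
        return lowered
    # Only capitalized forms are known (likely a proper noun):
    # pick one deterministically.
    return sorted(forms)[0]


def fix_lines(lines, case_map):
    """Rewrite 'HEADWORD<TAB>/ipa/' lines, leaving the IPA column untouched.

    The tab-separated layout is an assumption about the wordlist format.
    """
    fixed = []
    for line in lines:
        headword, sep, rest = line.partition("\t")
        fixed.append(restore_case(headword, case_map) + sep + rest)
    return fixed
```

Words missing from the reference list fall back to lowercase, so proper nouns that the reference dictionary does not cover would still need a manual pass.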

carmo-evan commented 3 years ago

Hi @dohliam !

Let me try to come up with a script and I'll let you know.

Thank you,

Evan

dohliam commented 3 years ago

@carmo-evan Awesome! Thanks again. Let me know if there's anything I can help with :smile:

dohliam commented 2 years ago

@carmo-evan I'm going to merge this for now as others may be interested in using this data in the meantime. If you (or anyone else) are able to figure out a script to correct the capitalization of proper nouns, please feel free to open a separate PR.

Thanks again for providing this! :+1: