pdrhlik / sweary

R package that collects swear words from different languages.
MIT License

Rename language files to {LANG_CODE}-{LANG_NAME} #21

Closed: pdrhlik closed this issue 5 years ago

pdrhlik commented 6 years ago

Now:

cs
en
pl

Proposal:

cs-Czech
en-English
pl-Polish

We could then parse both the language code and the language name from the file name. That would help automate building the README files. Right now a language only appears in the README after we manually add it to a data frame there.
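A minimal sketch of that parsing step, assuming the language files follow the proposed `{LANG_CODE}-{LANG_NAME}` pattern. The helper name `parse_language_files()` is hypothetical and not part of sweary. It only shows how the README's data frame could be derived from file names instead of being edited by hand.

```r
# Hypothetical helper: turn file names such as "cs-Czech" into a
# data frame of language codes and names for the README.
parse_language_files <- function(files) {
  # Drop any directory part and any file extension.
  stems <- tools::file_path_sans_ext(basename(files))
  # Everything before the first dash is the code; the rest is the name.
  data.frame(
    code = sub("-.*$", "", stems),
    name = sub("^[^-]*-", "", stems),
    stringsAsFactors = FALSE
  )
}

langs <- parse_language_files(c("cs-Czech", "en-English", "pl-Polish"))
langs
```

With this in place, adding a language would only mean adding its file; the README table would follow from the directory listing.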

@MarcinKosinski Do you agree, or do you see other options? There is one more: creating a file that lists the code-language pairs. But storing a separate file just for that seems a bit silly to me.

maciejkasinski commented 6 years ago

In case it escalates quickly, there is one more option in between: the ISOcodes package.

library(ISOcodes)  # provides the ISO_639_2 data frame
library(dplyr)     # provides select(), filter() and %>%

ISO_639_2 %>%
  select(name = Name, code = Alpha_2) %>%
  filter(code %in% c("en", "cs", "pl", "ro"))
#>                            name code
#> 1                         Czech   cs
#> 2                       English   en
#> 3                        Polish   pl
#> 4 Romanian; Moldavian; Moldovan   ro

pdrhlik commented 6 years ago

@maciejkasinski I would like to keep the package's dependencies to a minimum. Using ISOcodes seems like overkill right now. I'd reconsider it if the list starts getting out of hand.

mczyzj commented 6 years ago

I think using an underscore instead of a dash would save us problems in the long run.

pdrhlik commented 6 years ago

We should probably use just one language name per code. I'd say each code has one main language.

For example, the wiki page on the ISO 639-1 code ro lists Romanian as the preferred name, and the codes for Moldovan are deprecated. I guess the same holds for other language groups?

pdrhlik commented 5 years ago

Because of #30 (language dialects), language files should look like this: en_English, cs_Czech, fr-CA_French (Canada).

Template: {LANG_CODE}_{LANG_NAME}

LANG_NAME may contain spaces, as in fr-CA_French (Canada). An underscore separates the code from the name. If spaces or parentheses in language names cause any problems, we'll adjust the scheme.
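Under this template the code ends at the first underscore, so a dash inside a regional code such as fr-CA, and spaces or parentheses in the name, are all preserved. A minimal sketch, assuming bare file names without an extension; `split_language_file()` is a hypothetical helper name, not part of sweary.

```r
# Hypothetical helper for the {LANG_CODE}_{LANG_NAME} scheme.
# Splits only at the FIRST underscore, so "fr-CA" stays one code and
# names may contain spaces or parentheses.
split_language_file <- function(stems) {
  # Position of the first underscore in each name (-1 if there is none).
  pos <- as.integer(regexpr("_", stems, fixed = TRUE))
  if (any(pos < 0)) {
    stop("file names must contain an underscore: ",
         paste(stems[pos < 0], collapse = ", "))
  }
  data.frame(
    code = substr(stems, 1, pos - 1),
    name = substr(stems, pos + 1, nchar(stems)),
    stringsAsFactors = FALSE
  )
}

langs <- split_language_file(c("en_English", "cs_Czech", "fr-CA_French (Canada)"))
langs
```

Splitting at the first underscore, rather than at every underscore, keeps the parser correct even if a language name ever contains one.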