lau / calendar

date-time and time zone handling in Elixir
MIT License

I can contribute some Asian languages Translations in strftime, but is it really the best way to put them in a library? #19

Closed c0b closed 8 years ago

c0b commented 8 years ago

Hi Lau, I've just watched your datetime video, and I agree with most of your opinions on maintaining date/time handling in a library. But about your advocacy of locale names: think again about the translation_module for locale-specific month names / weekday names. Is the library really the best place to maintain them? These names won't change as often as, say, the leap seconds, right?

ElixirConf 2015 - Mastering date/time handling with Elixir by Lau Taarnskov https://youtu.be/keUbVvMJeKY?list=PLE7tQUdRKcyZb7L66A9JvYWu_ItURk8qJ via @YouTube

For example, I know Japanese and live in Singapore. With this date command, it reads the locale-specific names for Japanese, using some of the Linux system's tzdata package data:

➸ env -i LANG=ja_JP.utf8 TZ=Asia/Singapore date --date='2 days ago'
2016年  1月  7日 木曜日 17:26:46 SGT

and for Simplified or Traditional Chinese, like these:

➸ env -i LANG=zh_CN.utf8 TZ=Asia/Singapore date --date='2 days ago'
2016年 01月 07日 星期四 17:47:49 SGT
➸ env -i LANG=zh_TW.utf8 TZ=Asia/Singapore date --date='2 days ago'
四  1月  7 17:47:54 SGT 2016

For sure I can make a PR with additional locale names for %A %b %B %c, ... but think about it again: is writing them into the library really the best way? How can we leverage the system's tzdata?

https://github.com/lau/calendar/blob/master/lib/calendar/strftime.ex#L268-L279

➸ date --help |grep -i locale
  %a   locale's abbreviated weekday name (e.g., Sun)
  %A   locale's full weekday name (e.g., Sunday)
  %b   locale's abbreviated month name (e.g., Jan)
  %B   locale's full month name (e.g., January)
  %c   locale's date and time (e.g., Thu Mar  3 23:05:25 2005)
  %p   locale's equivalent of either AM or PM; blank if not known
  %r   locale's 12-hour clock time (e.g., 11:11:04 PM)
  %x   locale's date representation (e.g., 12/31/99)
  %X   locale's time representation (e.g., 23:13:48)
E to use the locale's alternate representations if available, or
O to use the locale's alternate numeric symbols if available.

Like:

➸ env -i LANG=ja_JP.utf8 TZ=Asia/Singapore date --date='2 days ago' \
        '+%A %B %c %p %r %x %X'
木曜日 1月 2016年01月07日 17時58分48秒 午後 午後05時58分48秒 2016年01月07日 17時58分48秒
lau commented 8 years ago

Hi. Thanks for offering help. After that talk I made it possible to use an outside library, e.g. https://github.com/padde/calendar_translations . You just have to specify the library in config.exs, e.g. config :calendar, :translation_module, CalendarTranslations.Translations . I am soon going to add deprecation warnings for using any of the few languages "built in" to Calendar itself, except English.
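Spelled out as a config file, the setting described above is just:

```elixir
# config/config.exs
use Mix.Config

# Tell Calendar to use the calendar_translations package for locale names
config :calendar, :translation_module, CalendarTranslations.Translations
```

(This assumes calendar_translations is already listed as a dependency of the project.)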

With that package you can do e.g.:

iex> Calendar.DateTime.now_utc |> Calendar.Strftime.strftime!("%A %B %c %p %r", :ja)     
"土曜日 1月 土 1月  9 18:25:35 2016 PM 06:25:35 PM"
iex> Calendar.DateTime.now_utc |> Calendar.Strftime.strftime!("%A %B %c %p %r", :"zh-TW")
"星期六 一月 周六 1月  9 18:25:53 2016 PM 06:25:53 PM"

I just tried running a command you wrote and got

$ env -i LANG=zh_CN.utf8 TZ=Asia/Singapore date --date='2 days ago'
Fri Jan  8 01:38:02 SGT 2016

Maybe one needs to have zh_CN installed? A reason to have a package separate from the OS is that Elixir runs on different OSes. Relying on the date command won't work on e.g. Windows. And I didn't get the same result on the Linux machine I tested on.

I guess it would be possible to make a library that just calls directly to the date command. It would just work on compatible *nix systems if the specified languages were installed.

But what I think is better is to make Elixir packages that contain the data, like e.g. :calendar_translations. That was made quickly and is waiting for some PRs to be merged. But it would be good to have some more features/data, e.g. the different locale time representations such as %x and %X.

c0b commented 8 years ago

I guess it would be possible to make a library that just calls directly to the date command. It would just work on compatible *nix systems if the specified languages were installed.

That's true, the specified languages' data has to be installed before it can function. But I don't mean to make a library based on the date command; forking/exec'ing an external command may be too heavy for a library. My point is that the date command itself doesn't contain the locale names: it just parses that data from the tzdata package and the installed language-pack data. On a Linux-based system, tzdata is an essential package that almost every Linux system has installed; language translations are also essential, and at least one system language pack is installed.

Since these base locale names are also in open source somewhere, what I am thinking about is how to leverage these available locale translation resources. We may need to write some Elixir code to populate a library for all languages, not by calling the date command and relying on installed language packages, similar to what you did for tzdata; and also without relying on some of our developers knowing a language before we can support it. The base Linux system has been translated into 180 languages, probably almost all known languages in the world:

➸ dpkg -l |grep tzdata
ii  tzdata        2015g-1        all          time zone and daylight-saving time data
➸ dpkg -L tzdata
/.
/usr
/usr/share
/usr/share/zoneinfo
/usr/share/zoneinfo/MST7MDT
/usr/share/zoneinfo/PRC
/usr/share/zoneinfo/Canada
/usr/share/zoneinfo/Canada/East-Saskatchewan
/usr/share/zoneinfo/Canada/Newfoundland
/usr/share/zoneinfo/Canada/Central
/usr/share/zoneinfo/Canada/Pacific
/usr/share/zoneinfo/Canada/Atlantic
/usr/share/zoneinfo/Canada/Mountain
/usr/share/zoneinfo/Canada/Yukon
/usr/share/zoneinfo/Canada/Eastern
/usr/share/zoneinfo/leap-seconds.list
/usr/share/zoneinfo/EET
/usr/share/zoneinfo/US
/usr/share/zoneinfo/US/Alaska
/usr/share/zoneinfo/US/Indiana-Starke
/usr/share/zoneinfo/US/Arizona
/usr/share/zoneinfo/US/Central
/usr/share/zoneinfo/US/Samoa
/usr/share/zoneinfo/US/Michigan
/usr/share/zoneinfo/US/Hawaii
/usr/share/zoneinfo/US/Pacific
/usr/share/zoneinfo/US/Aleutian
/usr/share/zoneinfo/US/East-Indiana
[...]

➸ dpkg -l |grep language-pack
ii  language-pack-en                      1:15.10+20151016                         all          translation updates for language English
ii  language-pack-en-base                 1:15.10+20151016                         all          translations for language English
ii  language-pack-es                      1:15.10+20151016                         all          translation updates for language Spanish; Castilian
ii  language-pack-es-base                 1:15.10+20151016                         all          translations for language Spanish; Castilian
ii  language-pack-gnome-en                1:15.10+20151016                         all          GNOME translation updates for language English
ii  language-pack-gnome-en-base           1:15.10+20151016                         all          GNOME translations for language English
ii  language-pack-gnome-es                1:15.10+20151016                         all          GNOME translation updates for language Spanish; Castilian
ii  language-pack-gnome-es-base           1:15.10+20151016                         all          GNOME translations for language Spanish; Castilian
ii  language-pack-gnome-ja                1:15.10+20151016                         all          GNOME translation updates for language Japanese
ii  language-pack-gnome-ja-base           1:15.10+20151016                         all          GNOME translations for language Japanese
ii  language-pack-gnome-zh-hans           1:15.10+20151016                         all          GNOME translation updates for language Simplified Chinese
ii  language-pack-gnome-zh-hans-base      1:15.10+20151016                         all          GNOME translations for language Simplified Chinese
ii  language-pack-gnome-zh-hant           1:15.10+20151016                         all          GNOME translation updates for language Traditional Chinese
ii  language-pack-gnome-zh-hant-base      1:15.10+20151016                         all          GNOME translations for language Traditional Chinese
ii  language-pack-ja                      1:15.10+20151016                         all          translation updates for language Japanese
ii  language-pack-ja-base                 1:15.10+20151016                         all          translations for language Japanese
ii  language-pack-zh-hans                 1:15.10+20151016                         all          translation updates for language Simplified Chinese
ii  language-pack-zh-hans-base            1:15.10+20151016                         all          translations for language Simplified Chinese
ii  language-pack-zh-hant                 1:15.10+20151016                         all          translation updates for language Traditional Chinese
ii  language-pack-zh-hant-base            1:15.10+20151016                         all          translations for language Traditional Chinese

➸ ls /usr/share/locale/
aa  ace  af  am  an  ast  az  be  be@latin  bem  br  bs  byn  ca  ca@valencia
cs  csb  cv  cy  da  el  en  en@boldquot  en_AU  en_CA  eo  es  es_CL  et  eu
[...]
➸ ls /usr/share/locale/ |wc -l
180

➸ strace -e open env -i LANG=ja_JP.utf8 TZ=Asia/Singapore date --date='2 days ago'
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/usr/share/zoneinfo/Asia/Singapore", O_RDONLY|O_CLOEXEC) = 3
2016年  1月  8日 金曜日 03:32:29 SGT
+++ exited with 0 +++
lau commented 8 years ago

Yes, it is a good idea to get the data from somewhere else. Preferably the data should be public domain for licensing reasons.

The calendar_translations library is generated from the rails i18n as far as I understand.

It would be nice if we could get the data in a format that could be easily parsed. Then an option is to use macros to compile functions that make the data available in Elixir.

Can you find out if there is a place where this kind of data is available in a format that is easy to parse?

c0b commented 8 years ago

I believe it's somewhere in the base Linux system, although I haven't figured out which base package: http://linuxfromscratch.org/lfs/view/development/

c0b commented 8 years ago

Hi @lau, I just started writing some Elixir code; this module seems to be working:

https://github.com/c0b/calendar_translations/blob/master/lib/translations.ex

The metaprogramming looks powerful.

Upstream data is available from the glibc source code. There are 327 files under the localedata/locales/ directory, which should cover almost all known languages of the world; the time format translation data is in each file's LC_TIME section: https://sourceware.org/git/?p=glibc.git;a=tree;f=localedata/locales https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=localedata/locales/en_US https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=localedata/locales/ja_JP
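As a minimal sketch of what such parsing involves (module name and helper are illustrative, and it assumes the locale file uses plain UTF-8 quoted strings; older glibc files use `<Uxxxx>` escapes, which this sketch does not decode):

```elixir
defmodule LcTime do
  # Extract the quoted names for `keyword` (e.g. "day", "abday", "mon")
  # from the text of a glibc LC_TIME section. In glibc locale files a
  # list entry continues onto the next line when the line ends with "/".
  def names(text, keyword) do
    text
    |> String.split("\n")
    |> Enum.drop_while(fn l ->
      not String.starts_with?(String.trim(l), keyword <> " ")
    end)
    |> collect([])
  end

  defp collect([line | rest], acc) do
    # Gather every double-quoted string on this line
    found = Regex.scan(~r/"([^"]*)"/, line, capture: :all_but_first)
    acc = acc ++ List.flatten(found)

    if String.ends_with?(String.trim(line), "/") do
      collect(rest, acc)
    else
      acc
    end
  end

  defp collect([], acc), do: acc
end

sample = """
LC_TIME
abday  "Sun";"Mon";"Tue";"Wed";"Thu";"Fri";"Sat"
day    "Sunday";/
       "Monday";/
       "Tuesday";/
       "Wednesday";/
       "Thursday";/
       "Friday";/
       "Saturday"
END LC_TIME
"""

IO.inspect(LcTime.names(sample, "day"))
# => ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
```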

lau commented 8 years ago

@c0b this is cool.

I have a suggestion: instead of relying on dependencies to download the files, you could make a shell script to download the files from sourceware.org (see https://github.com/lau/tzdata/blob/v0.1.8/dl_latest_data.sh for an example). These files can then be added to git, and updated once in a while by running the script if necessary. You can continue to do the metaprogramming that just reads the files. That way you do not need internet access to sourceware.org during compilation, and you avoid some dependencies.

It would be cool if we could have every language available. In the beginning we don't necessarily need aliases/shortcuts for the dialects e.g. :fr -> :"fr_FR". :"fr_FR" is fine. There might be people unhappy about choosing e.g. fr_FR over fr_CA as the one for :fr.

@padde look at this. Maybe the two of you could collaborate if you feel like it and be two maintainers of the same package.

padde commented 8 years ago

Hey, nice work! However, I am already secretly working on a package that leverages the Unicode CLDR data. Currently I am exploring the data set to see what can be done before I tie down APIs to something that cannot be extended properly in the future.

The data is provided in XML format, covers a large number of localizations and languages (709 languages and dialects!), and is published under a MIT-ish license which is nice. My current plan is as follows: I already started a CLDR package that just provides the raw data (like tzdata), then a package that will do all sorts of localization like translating currencies, country names, number formatting, concatenating sentences etc. using the data from CLDR. A third package will provide integration with the calendar library, using just the date/time localizations from CLDR. Packages for integrating with other libraries might follow ;-)

For the CLDR package, I have a mix task that downloads the data to priv/. During compile time I parse the XML files and generate some functions that provide access to the raw data, although I think this approach will not scale very well. Fallbacks are also implemented here, e.g. the de_AT data for Austria has just some minor differences to the generic de data, and if a key is missing from de_AT you need to look up the data for the "parent" locale. I am not yet sure how to design the API, there are just so many localizations and variants thereof and I don't just want to put 500 macro-generated functions into a single module. Not sure how publishing this package will work either, because the data weighs in at 12 MB and my first attempt at publishing a package including the data to hex.pm resulted in a "Request entity too large" error. I probably need to find a way to download during compilation time on the user side.

The calendar integration package should not be too much work, I just want to make sure I get the CLDR API right before building things on top of it. The main work is needed for parsing the date/time placeholders from CLDR and converting them into ones that Calendar understands. Maybe the CLDR package can parse the date/time placeholders into a normalized format as well (e.g. a list of strings and atoms, where the atoms are placeholders) and then the calendar integration just needs to use this "AST" to spit out the correct format strings.

The generic localization package is not in the works yet, but I think it can be extremely useful to anyone building a multilingual application.

If you have ideas, feedback or suggestions: all of those are very welcome ;-)

c0b commented 8 years ago

I just changed it to utilize a git clone of glibc. But since glibc has a long development history and even a shallow clone is pretty large, I use the git archive command below to check out only its localedata/locales subdirectory; all the plain text files there are just 6.6 MB in total: https://github.com/c0b/calendar_translations/blob/master/lib/translations.ex#L76-L92

➸ git archive --remote git://sourceware.org/git/glibc.git \
       master localedata/locales | tar -xvv
➸ \du -xsh localedata/locales/
6.6M    localedata/locales/

Then I changed to File.stream!, so httpotion is no longer needed.

There might be people unhappy about choosing e.g. fr_FR over fr_CA as the one for :fr.

For French, I'm not sure whether the datetime formats are common to both French and Canadian French speakers. The same question goes for these 3 lines: is that :fr an alias for :"fr-FR"? https://github.com/padde/calendar_translations/blob/master/lib/calendar_translations/translations.ex#L50-L52 The same problem applies to zh: although I personally would like to alias :zh to :zh_TW, by population :zh_CN (mainland China) has many more users than any other :zh variant, so I think setting the :zh alias to :zh_CN is more realistic.

I am already secretly working on a package that leverages the Unicode CLDR data. Currently I am exploring the data set to see what can be done before I tie down APIs to something that cannot be extended properly in the future.

I hope you can share something on a public git repo even if it may still be premature; as long as you don't register a hex.pm entry so early, we are not promising a stable API, right?

I didn't know about this Unicode CLDR data before, but the MIT license seems more commercially friendly than the LGPL required by glibc.

Do you have more details on where (the specific link) you retrieved those XML files? http://cldr.unicode.org/

I have added a few more languages to my translations.ex file, to test that the same parser works for more languages. So far 12 languages generate a 12 KB Elixir.Translations.beam file. I'm not sure whether that's going to get bloated once all languages are added; if that is a concern, maybe we can add a compile-time flag, e.g. to categorize languages by tier of speaker population, or by region, ...

  1. if the user wants a slim beam file, compile only some 20 popular languages;
  2. if the user doesn't care about beam file size, it's OK to inline all languages into the beam file.
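That choice could be expressed as a compile-time config flag, for example (the key names here are hypothetical, not an existing API):

```elixir
# config/config.exs (hypothetical key names, sketch only)
use Mix.Config

# 1. slim build: compile only a hand-picked set of popular locales
config :calendar_translations, locales: [:en, :es, :ja, :zh_CN]

# 2. full build: inline every locale found in the glibc data
# config :calendar_translations, locales: :all
```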
c0b commented 8 years ago

Another question: is there a common standard for language abbreviation codes? Like :en for all English speakers, :en_US or :"en-US" for the US only, and :en_GB or :"en-GB" for British English?

I feel GNU software tends to use en_US; where did you get :"en-US" from?

lau commented 8 years ago

The dash is from the Rails data I think. Let's keep the GNU way of underscores.

When it comes to file sizes, I would only include the data needed for compilation. So if there is data about currencies etc. that takes up a lot of space, that is just wasted space. If necessary/feasible, maybe the download script could have another part that trims off unneeded data.

Hex packages are distributed as source files, not beam files. So the size of the data files matters too.

@padde all those other things sound interesting. But for the calendar_translations package it might make sense to just distribute the data needed for that package, and have a separate package do something else. If I just want translation of the time stuff, maybe I don't want to download and compile all kinds of other things.

c0b commented 8 years ago

When I add some more test cases and compare, this line for the csb language fails. Does anyone of you know this language? Which one is more correct, or closer to up to date?

lhs is from the GNU version, rhs from ruby-i18n:

  1) test test some the generated code from glibc localedata (TranslationsTest)
     test/translations_test.exs:6
     Assertion with == failed
     code: Translations.month_names(:csb) == {:ok, ["stëcznik", "gromicznik", "strëmiannik", "łżëkwiôt", "môj", "czerwińc", "lëpińc", "zélnik", "séwnik", "rujan", "lëstopadnik", "gòdnik"]}
     lhs:  {:ok,
            ["stëcznik", "gromicznik", "strumiannik", "łżëkwiôt", "môj", "czerwińc", "lëpinc", "zélnik", "séwnik", "rujan", "lëstopadnik",
             "gòdnik"]}
     rhs:  {:ok,
            ["stëcznik", "gromicznik", "strëmiannik", "łżëkwiôt", "môj", "czerwińc", "lëpińc", "zélnik", "séwnik", "rujan", "lëstopadnik",
             "gòdnik"]}
     stacktrace:
       test/translations_test.exs:26
c0b commented 8 years ago

Another problem I noticed is that both databases start the weekday names with Sunday:

https://github.com/svenfuchs/rails-i18n/blob/master/rails/locale/en.yml https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/en_US

but the generated code starts with Monday, because the code generator did that .rotate. Is that because the Calendar library expects it? Can we change to starting at Sunday? @lau https://github.com/padde/calendar_translations/blob/master/lib/calendar_translations/translations.ex#L236 https://github.com/padde/calendar_translations/blob/master/extract.rb#L24-L25

c0b commented 8 years ago

A similar discrepancy happens with month_name_abbr(:csb):

     lhs:  {:ok, ["stë", "gro", "stm", "łżë", "môj", "cze", "lëp", "zél", "séw", "ruj", "lës", "gòd"]}
     rhs:  {:ok, ["stë", "gro", "str", "łżë", "môj", "cze", "lëp", "zél", "séw", "ruj", "lës", "gòd"]}
c0b commented 8 years ago

The good side is that GNU localedata mentions Michal Ostrowski as the contact, with his email; I have sent him an email to confirm this. Hopefully Michal can respond.

A search engine suggests this G+ profile could be him (a web developer (Ruby, Ruby on Rails, JavaScript, HTML5...)), or @mostrowski on GitHub.

lau commented 8 years ago

another problem I noticed, is both databases starting weekday names with Sunday

Just move the first element to the end of the list when compiling. Internally in this case Calendar follows the ISO standard where Monday is the first day of the week.
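The rotation described above is a one-liner in Elixir:

```elixir
# Glibc/rails-i18n/CLDR order starts on Sunday; Calendar wants the ISO
# order with Monday first, so move the head of the list to the end.
rotate = fn [sunday | rest] -> rest ++ [sunday] end

IO.inspect(rotate.(~w(Sun Mon Tue Wed Thu Fri Sat)))
# => ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
```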

c0b commented 8 years ago

follows the ISO standard where Monday is the first day of the week.

Rotating is easy, but where do you see that this is a standard? Why do all the above databases start on Sunday?

I think I have found the Unicode CLDR database mentioned above by @padde; it also starts the week on Sunday. Why don't we change the calendar code to better use the available databases? http://unicode.org/repos/cldr/trunk/common/main/en.xml (search for type="sun" in this document) http://unicode.org/repos/cldr/trunk/common/main/ja.xml

c0b commented 8 years ago

How about if we change the program code here? If we used the raw database starting on Sunday, there would be no need for the day_of_the_week_off_by_one. Would you like me to make a pull request?

https://github.com/lau/calendar/blob/master/lib/calendar/strftime.ex#L194-L199

espresse commented 8 years ago

Hi, I'm responsible for both the glibc and rails-i18n Kashubian locales. Both spelling variants (strëmiannik / strumiannik) are correct. As for the abbreviations, I'm not sure why there's "stm" instead of "str" - I think it's a bug/typo in the glibc locale.

lau commented 8 years ago

How about if we change the program code here? If we used the raw database starting on Sunday, there would be no need for the day_of_the_week_off_by_one. Would you like me to make a pull request?

If you get the ISO weekday number (%u), Sunday is 7. So you would still need to do something for some cases. Calendar can handle weeks starting on both Monday and Sunday for users of the library. But internally you have to choose one or the other. It is just a decision to keep it standard internally.

The ISO week starts on Monday: https://en.wikipedia.org/wiki/ISO_week_date
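The two numbering schemes convert with simple arithmetic, e.g. (a sketch, using a 0-based Sunday-first index on one side and ISO numbering on the other):

```elixir
# Sunday-first index used by the raw databases: 0 = Sunday .. 6 = Saturday
# ISO 8601 / %u numbering used internally:      1 = Monday .. 7 = Sunday
to_iso   = fn d -> if d == 0, do: 7, else: d end
from_iso = fn u -> rem(u, 7) end

to_iso.(0)    # Sunday   => 7
to_iso.(3)    # Wednesday stays 3
from_iso.(7)  # ISO Sunday back to 0
```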

@espresse That's interesting.

c0b commented 8 years ago

The ISO week starts on Monday: https://en.wikipedia.org/wiki/ISO_week_date

Thanks, something interesting learned.

What I am saying is: now that we have figured out there are 3 databases, all starting with Sunday,

how about if I change the calendar code to read the databases as-is, with a call like |> rem(7)? Then there would be no need to call rotate in the Ruby extractor code: https://github.com/padde/calendar_translations/blob/master/extract.rb#L24-L25

I am comparing other implementations, from glibc and from Python:

  1. http://man7.org/linux/man-pages/man3/strftime.3.html
  2. https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
  3. https://docs.python.org/2/library/datetime.html#datetime.date.isocalendar
  4. http://erlang.org/doc/man/calendar.html#iso_week_number-1

some preliminary conclusions:

  1. the glibc version (called by the date command) has the richest set of strftime flags;
  2. Python's strftime doesn't have %u, while it does have %U;
  3. this Elixir version of strftime doesn't have %U or %W (week of the year, starting on Sunday or Monday), while it does have %V for the ISO week number, which comes from Erlang's :calendar.iso_week_number.

Python has a convenient isocalendar function:

>>> datetime.date(2016,1,2).isocalendar()
(2015, 53, 6)
>>> datetime.date(2016,1,3).isocalendar()
(2015, 53, 7)
>>> datetime.date(2016,1,4).isocalendar()
(2016, 1, 1)

Doing this in Elixir, via Erlang's :calendar module, looks like:

iex(21)> isocalendar = fn (date) ->
...(21)>   {year,week} = :calendar.iso_week_number date
...(21)>   day = :calendar.day_of_the_week date
...(21)>   {year, week, day}
...(21)> end
#Function<6.54118792/1 in :erl_eval.expr/5>
iex(36)> isocalendar.({2015,12,31})
{2015, 53, 4}
iex(37)> isocalendar.({2016,1,1})  
{2015, 53, 5}
iex(38)> isocalendar.({2016,1,2})
{2015, 53, 6}
iex(39)> isocalendar.({2016,1,3})
{2015, 53, 7}
iex(40)> isocalendar.({2016,1,4})
{2016, 1, 1}
➸ ncal -3 -w
    December 2015     January 2016      February 2016     
Su     6 13 20 27        3 10 17 24 31     7 14 21 28   
Mo     7 14 21 28        4 11 18 25     1  8 15 22 29   
Tu  1  8 15 22 29        5 12 19 26     2  9 16 23      
We  2  9 16 23 30        6 13 20 27     3 10 17 24      
Th  3 10 17 24 31        7 14 21 28     4 11 18 25      
Fr  4 11 18 25        1  8 15 22 29     5 12 19 26      
Sa  5 12 19 26        2  9 16 23 30     6 13 20 27      
   49 50 51 52  1     1  2  3  4  5  6  6  7  8  9 10   

These padding flags are supported by glibc strftime (and GNU coreutils like date); maybe something interesting to add?

By default, date pads numeric fields with zeroes.
The following optional flags may follow '%':

  -  (hyphen) do not pad the field
  _  (underscore) pad with spaces
  0  (zero) pad with zeros
  ^  use upper case if possible
  #  use opposite case if possible
➸ for sec in {1452153600..1451203200..86400}
do date \
  +'"%x" : "isocalendar: (%G,%V,%u), calendar: (%Y,%-W,%w), iso: (%Y,%_U,%u)"' \
  --date @$sec
done
"01/07/2016" : "isocalendar: (2016,01,4), calendar: (2016,1,4), iso: (2016, 1,4)"
"01/06/2016" : "isocalendar: (2016,01,3), calendar: (2016,1,3), iso: (2016, 1,3)"
"01/05/2016" : "isocalendar: (2016,01,2), calendar: (2016,1,2), iso: (2016, 1,2)"
"01/04/2016" : "isocalendar: (2016,01,1), calendar: (2016,1,1), iso: (2016, 1,1)"
"01/03/2016" : "isocalendar: (2015,53,7), calendar: (2016,0,0), iso: (2016, 1,7)"
"01/02/2016" : "isocalendar: (2015,53,6), calendar: (2016,0,6), iso: (2016, 0,6)"
"01/01/2016" : "isocalendar: (2015,53,5), calendar: (2016,0,5), iso: (2016, 0,5)"
"12/31/2015" : "isocalendar: (2015,53,4), calendar: (2015,52,4), iso: (2015,52,4)"
"12/30/2015" : "isocalendar: (2015,53,3), calendar: (2015,52,3), iso: (2015,52,3)"
"12/29/2015" : "isocalendar: (2015,53,2), calendar: (2015,52,2), iso: (2015,52,2)"
"12/28/2015" : "isocalendar: (2015,53,1), calendar: (2015,52,1), iso: (2015,52,1)"
"12/27/2015" : "isocalendar: (2015,52,7), calendar: (2015,51,0), iso: (2015,52,7)"

Something else interesting about date is its ability to parse a datetime from a string, including relative times. With a command like the one below I can get the local time of 9:00 next Friday in Los Angeles; this would be useful if, say, there were an important sports event in Los Angeles that I wanted to watch live on TV in Singapore local time.

Do we have a strptime interface? I see the parse_util module, but haven't looked into the details.

Show the local time for 9AM next Friday on the west coast of the US
  $ date --date='TZ="America/Los_Angeles" 09:00 next Fri'

➸ env -i TZ=Asia/Singapore date -R --date='TZ="America/Los_Angeles" 09:00 next Fri'
Sat, 23 Jan 2016 01:00:00 +0800
lau commented 8 years ago

how about if I change the calendar code to read the databases AS IS

Calendar is not supposed to read those databases; the translation module is supposed to do that. The translation module needs to provide the data to Calendar with Monday as the first element, because that is the ISO standard. I do not want to change Calendar to accommodate some non-standard version when it is very easy to simply move Sunday to the end of the list.

c0b commented 8 years ago

Calendar is not supposed to read those databases

That's how most of the databases are designed, and their interpreting code expects the databases that way. I mean that the Calendar code has a design flaw, and it could be changed to read the databases as-is, in the same way, with no user impact. I am still learning elixir-lang and trying to find something worth contributing in a PR; however, if you don't appreciate such a change, that's totally fine, this is your place. My options are either to maintain a fork or not to care. I don't see that @padde has published any of the new code he was talking about 10 days back, so I have nowhere to pick up. I'm going to close this issue soon.

@espresse I don't see any changes after its last change in 2013; will that be fixed anytime soon? https://sourceware.org/git/?p=glibc.git;a=history;f=localedata/locales/csb_PL

lau commented 8 years ago

Hi c0b

I mean Calendar code has a design flaw and it can be changed to read database AS IS in the same way and there will be no user impact

If I change the internal interface away from the ISO standard, it would break compatibility with the current version of the calendar-translations package.

I mean Calendar code has a design flaw

You mean it is a design flaw to follow the ISO standard?

This is how I see it. Keep the current interface for translation modules: Pros:

Cons:

We both agree that the con is trivial to solve in code, right? It is trivial to move an element from one end to the other.

I am still learning elixir-lang, trying to find something worth to contribute such change in a PR, however if you don't appreciate such change, that's totally fine this is your place, options for me are either to maintain a fork or won't care.

I would appreciate and I am sure many people would appreciate a translation package based on the Linux data.

padde commented 8 years ago

Having Sunday as the first day of the week is a very US-centric approach; I'm on @lau's side here and think following standards is better practice.

Regarding the locale string format, I prefer the dashed version because it seems to be widely adopted and is also recommended by the W3C, e.g. for use in the Accept-Language HTTP header. I propose switching from atoms to strings as well, because the locale might be based on user input, and I don't want to encourage users of any library to build atoms from user-supplied strings. Such a practice can lead to memory leaks, as atoms are never GC'd, and users need to be aware that they must use String.to_existing_atom in order to prevent such vulnerabilities.
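The atom-safety point can be illustrated like this (the locale atoms listed are only examples):

```elixir
# Mentioning these atoms in source creates them at compile time, so the
# lookup below can succeed for them.
_known = [:en, :ja, :"fr-FR"]

# Unsafe: String.to_atom/1 on user input mints new atoms, which are never
# garbage collected, so attacker-controlled input can exhaust the atom table.
# Safer: String.to_existing_atom/1 raises ArgumentError for unknown atoms.
safe_locale = fn input ->
  try do
    {:ok, String.to_existing_atom(input)}
  rescue
    ArgumentError -> :error
  end
end

safe_locale.("ja")                      # => {:ok, :ja}
safe_locale.("no_such_locale_abcxyz")   # => :error
```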

I am still working on a CLDR based version, but it is hard to get it right. There will be something visible soon, but I can't make any promises right now.

lau commented 8 years ago

@padde I have changed my mind and agree that the dashes are good, because they are used on the web.

Using strings is probably also a good idea; I have thought about that too. There should probably be a function to validate whether a language code is present/valid, and/or simply a list of valid language codes.

padde commented 8 years ago

@lau most of this is already baked into my upcoming CLDR client ;-)