mikeedwards / po2json

Pure Javascript implementation of Uniforum message translation. Based on a great gist.
https://gist.github.com/1739769

Different output in some languages when there is semicolon in header #87

Open marusak opened 4 years ago

marusak commented 4 years ago

There seems to be an inconsistency in how a trailing semicolon at the end of Plural-Forms in .po headers is handled. Its presence can break the format that po2json produces. Let me explain with examples:

Take this file (tmp.po):

  msgid ""                                                                        
  msgstr ""                                                                       
  "Project-Id-Version: PACKAGE VERSION\n"                                         
  "Language: ko\n"                                                                
  "MIME-Version: 1.0\n"                                                           
  "Content-Type: text/plain; charset=UTF-8\n"                                     
  "Content-Transfer-Encoding: 8bit\n"                                             
  "Plural-Forms: nplurals=1; plural=0\n"                                          
  "X-Generator: Weblate 3.10.1\n"                                                 

  msgid "Combined usage of $0 CPU core"                                           
  msgid_plural "Combined usage of $0 CPU cores"                                   
  msgstr[0] "$0 CPU 코어의 총 사용량"

When I run ./node_modules/po2json/bin/po2json -p tmp.po tmp, the resulting tmp file looks like this:

{                                                                               
   "": {                                                                        
      "project-id-version": "PACKAGE VERSION",                                  
      "language": "ko",                                                         
      "mime-version": "1.0",                                                    
      "content-type": "text/plain; charset=UTF-8",                              
      "content-transfer-encoding": "8bit",                                      
      "plural-forms": "nplurals=1; plural=0",                                   
      "x-generator": "Weblate 3.10.1"                                           
   },                                                                           
   "Combined usage of $0 CPU core": [                                           
      "Combined usage of $0 CPU cores",                                         
      "$0 CPU 코어의 총 사용량"                                                 
   ]                                                                            
}  

This is correct. But let's now add a semicolon to the end of the "Plural-Forms: nplurals=1; plural=0\n" line. Now the tmp.po file looks like this:

  msgid ""                                                                        
  msgstr ""                                                                       
  "Project-Id-Version: PACKAGE VERSION\n"                                         
  "Language: ko\n"                                                                
  "MIME-Version: 1.0\n"                                                           
  "Content-Type: text/plain; charset=UTF-8\n"                                     
  "Content-Transfer-Encoding: 8bit\n"                                             
  "Plural-Forms: nplurals=1; plural=0;\n"                                         
  "X-Generator: Weblate 3.10.1\n"                                                 

  msgid "Combined usage of $0 CPU core"                                           
  msgid_plural "Combined usage of $0 CPU cores"                                   
  msgstr[0] "$0 CPU 코어의 총 사용량" 

When I run the same command, the tmp output is:

{                                                                               
   "": {                                                                        
      "project-id-version": "PACKAGE VERSION",                                  
      "language": "ko",                                                         
      "mime-version": "1.0",                                                    
      "content-type": "text/plain; charset=UTF-8",                              
      "content-transfer-encoding": "8bit",                                      
      "plural-forms": "nplurals=1; plural=0;",                                  
      "x-generator": "Weblate 3.10.1"                                           
   },                                                                           
   "Combined usage of $0 CPU core": [                                           
      "Combined usage of $0 CPU cores",                                         
      [                                                                         
         "$0 CPU 코어의 총 사용량"                                              
      ]                                                                         
   ]                                                                            
} 

So the translation for the string is no longer a flat array of strings, but an array containing one string and one nested array.
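
Until the root cause is found, one possible workaround is to post-process the generated JSON and flatten each translation entry by one level. The following is only a sketch: `flattenTranslations` is a hypothetical helper, not part of po2json, and it assumes the output has the shape shown above (a header object under the `""` key and arrays everywhere else).

```javascript
// Hypothetical helper (not part of po2json): flatten each translation entry
// by one level, so [plural, [t0]] becomes [plural, t0].
function flattenTranslations(data) {
  const result = {};
  for (const [key, value] of Object.entries(data)) {
    // The "" key holds the header object; non-array values are left untouched.
    result[key] = Array.isArray(value) ? value.flat() : value;
  }
  return result;
}

// Usage with the nested output from the second example:
const nested = {
  "": { "plural-forms": "nplurals=1; plural=0;" },
  "Combined usage of $0 CPU core": [
    "Combined usage of $0 CPU cores",
    ["$0 CPU 코어의 총 사용량"]
  ]
};
console.log(JSON.stringify(flattenTranslations(nested), null, 3));
```

Entries that are already flat pass through unchanged, so the same function can be applied to output from files with and without the semicolon.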

Interestingly enough, if I have a different file, like this:

  msgid ""                                                                        
  msgstr ""                                                                       
  "Project-Id-Version: PACKAGE VERSION\n"                                         
  "Language: cs\n"                                                                
  "MIME-Version: 1.0\n"                                                           
  "Content-Type: text/plain; charset=UTF-8\n"                                     
  "Content-Transfer-Encoding: 8bit\n"                                             
  "Plural-Forms: nplurals=3; plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2\n"        
  "X-Generator: Weblate 3.10.1\n"                                                 

  msgid "Combined usage of $0 CPU core"                                           
  msgid_plural "Combined usage of $0 CPU cores"                                   
  msgstr[0] "Kombinované využití $0 jádra procesoru"                              
  msgstr[1] "Kombinované využití $0 jader procesoru"                              
  msgstr[2] "Kombinované využití $0 jader procesoru"

The output is the same whether or not there is a semicolon on the Plural-Forms line. From the docs it seems there should always be a semicolon (1, 2). This is likely a problem in some library that po2json uses, but I was not sure where it really comes from, so I am reporting it here.

(Side note: for years we had a mix of files, some with this semicolon and some without, and it seemed to work just fine. We were using Zanata to generate these files for us; now we have migrated to Weblate, and it adds this semicolon for some more languages (still not all). So maybe this is a known bug or documented somewhere.)
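
Another possible stopgap, given that output was correct without the semicolon, is to normalize the header before the file reaches po2json. The sketch below is an assumption-laden illustration: `stripPluralFormsSemicolon` is a hypothetical helper, it only handles a Plural-Forms header that sits on a single quoted line (as in the examples above), and stripping the semicolon goes against what the docs seem to recommend, so it is a workaround rather than a fix.

```javascript
// Hypothetical helper (not part of po2json): remove a trailing semicolon
// from a single-line Plural-Forms header in raw .po text.
function stripPluralFormsSemicolon(poText) {
  // Matches: "Plural-Forms: ...;\n"  (the \n here is the literal
  // backslash-n escape inside the .po string, not a real newline).
  return poText.replace(/^(\s*"Plural-Forms:[^"\n]*?);\s*(\\n")/mi, '$1$2');
}

// Usage: the header line from the second example.
const header = '"Plural-Forms: nplurals=1; plural=0;\\n"';
console.log(stripPluralFormsSemicolon(header));
```

Lines that have no trailing semicolon do not match the pattern and are returned unchanged.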

hthetiot commented 2 years ago

Most likely related to https://github.com/smhg/gettext-parser.