jooking / closure-library

Automatically exported from code.google.com/p/closure-library
Apache License 2.0

ClosureBuilder fails with UnicodeDecodeError #603

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
- What steps will reproduce the problem?

1. Install the environment (see "What version of the product are you using? On 
what operating system?" below).
2. Follow tutorial "Using ClosureBuilder" at 
https://developers.google.com/closure/library/docs/closurebuilder up to step 
"Calculating dependencies".
3. Execute 
"D:\Projekt\PhpStorm\ClosureTest>closure-library\closure\bin\build\closurebuilder.py --root=closure-library\" 
from the Windows command prompt (change "D:\Projekt\PhpStorm\ClosureTest" 
according to your installation).

- What is the expected output? What do you see instead?

The result should be similar to what is described in the tutorial, i.e. 

closure-library/closure/bin/build/closurebuilder.py: Scanning paths...
closure-library/closure/bin/build/closurebuilder.py: 596 sources scanned.
closure-library/closure/bin/build/closurebuilder.py: Building dependency tree..
closure-library/closure/goog/base.js
closure-library/closure/goog/debug/error.js
closure-library/closure/goog/string/string.js
closure-library/closure/goog/asserts/asserts.js
closure-library/closure/goog/array/array.js
closure-library/closure/goog/dom/classes.js
closure-library/closure/goog/object/object.js
closure-library/closure/goog/dom/tagname.js
closure-library/closure/goog/useragent/useragent.js
closure-library/closure/goog/math/size.js
closure-library/closure/goog/math/coordinate.js
closure-library/closure/goog/dom/dom.js

but I get

D:\Projekt\PhpStorm\ClosureTest>closure-library\closure\bin\build\closurebuilder
.py --root=closure-library\
D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\closurebuilder
.py: Scanning paths...
Traceback (most recent call last):
  File "D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\closurebuilder.py", line 262, in <module>
    main()
  File "D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\closurebuilder.py", line 200, in main
    sources.add(_PathSource(js_path))
  File "D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\closurebuilder.py", line 175, in __init__
    super(_PathSource, self).__init__(source.GetFileContents(path))
  File "D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\source.py", line 112, in GetFileContents
    return fileobj.read()
  File "C:\Program Files\Python\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 6586: 
character maps to <undefined>

I understand that some character in some file could not be converted (most 
probably from Unicode to Windows1252). 
After changing method "GetFileContents" in 
\closure-library\closure\bin\build\source.py from

def GetFileContents(path):
  """Get a file's contents as a string.

  Args:
    path: str, Path to file.

  Returns:
    str, Contents of file.

  Raises:
    IOError: An error occurred opening or reading the file.

  """
  fileobj = open(path)
  try:
    return fileobj.read()
  finally:
    fileobj.close() 

to (see separate issue no. 602 and attached file)

def GetFileContents(path):
  """Get a file's contents as a string.

  Args:
    path: str, Path to file.

  Returns:
    str, Contents of file.

  Raises:
    IOError: An error occurred opening or reading the file.

  """
  fileobj = None

  try:
    fileobj = open(path)
    return fileobj.read()

  except Exception as e:
    raise IOError("{0} when opening or reading file {1}: {2}".format(e.__class__.__name__ , path, e))

  finally:
    if fileobj is not None:
        fileobj.close()

I get

D:\Projekt\PhpStorm\ClosureTest>closure-library\closure\bin\build\closurebuilder
.py --root=closure-library\
D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\closurebuilder
.py: Scanning paths...
Traceback (most recent call last):
  File "D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\source.py", line 121, in GetFileContents
    return fileobj.read()
  File "C:\Program Files\Python\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 6586: 
character maps to <undefined>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\closurebuilder.py", line 262, in <module>
    main()
  File "D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\closurebuilder.py", line 200, in main
    sources.add(_PathSource(js_path))
  File "D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\closurebuilder.py", line 175, in __init__
    super(_PathSource, self).__init__(source.GetFileContents(path))
  File "D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\source.py", line 124, in GetFileContents
    raise IOError("{0} when opening or reading file {1}: {2}".format(e.__class__.__name__ , path, e))
OSError: UnicodeDecodeError when opening or reading file 
closure-library\closure\goog\i18n\datetimepatterns.js: 'charmap' 
codec can't decode byte 0x8f in position 6586: character maps to <undefined>

Now I see, it happens when processing file 
closure-library\closure\goog\i18n\datetimepatterns.js (to be sure, I attached 
the original file I processed). There seems to be some "hidden" character that 
cannot be converted after the / in "MONTH_DAY_SHORT: 'd‏/M'" in

/**
 * Extended set of localized date/time patterns for locale ar.
 */
goog.i18n.DateTimePatterns_ar = {
  YEAR_FULL: 'yyyy',
  YEAR_MONTH_ABBR: 'MMM y',
  YEAR_MONTH_FULL: 'MMMM yyyy',
  MONTH_DAY_ABBR: 'd MMM',
  MONTH_DAY_FULL: 'dd MMMM',
  MONTH_DAY_SHORT: 'd‏/M',
  MONTH_DAY_MEDIUM: 'd MMMM',
  DAY_ABBR: 'd'
};
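
The failure can be reproduced in isolation. The sketch below assumes that the hidden character is U+200F (RIGHT-TO-LEFT MARK), whose UTF-8 encoding is the three bytes e2 80 8f, and that the file is stored as UTF-8. The helper name `try_decode` is made up for illustration and is not part of Closure Library.

```python
# Assumption: the invisible character in MONTH_DAY_SHORT is U+200F
# (RIGHT-TO-LEFT MARK); its UTF-8 form ends in the byte 0x8f.
pattern = "d\u200f/M"
raw = pattern.encode("utf-8")


def try_decode(data, encoding):
    """Decode data, or describe the first byte that cannot be decoded."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        bad = data[exc.start]  # an int on Python 3
        return "cannot decode byte 0x{:02x} at offset {}".format(bad, exc.start)


print(try_decode(raw, "cp1252"))            # reports the rejected byte
print(try_decode(raw, "utf-8") == pattern)  # UTF-8 round-trips cleanly
```

Under these assumptions the cp1252 attempt stops at the same byte value (0x8f) that the traceback reports, while decoding as UTF-8 succeeds.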

After deleting the character (just as a test), it fails at position 20755, i.e. 
the first "일" in

/**
 * Extended set of localized date/time patterns for locale ko.
 */
goog.i18n.DateTimePatterns_ko = {
  YEAR_FULL: 'yyyy년',
  YEAR_MONTH_ABBR: 'y년 MMM',
  YEAR_MONTH_FULL: 'yyyy년 MMMM',
  MONTH_DAY_ABBR: 'MMM d일',
  MONTH_DAY_FULL: 'MMMM dd일',
  MONTH_DAY_SHORT: 'M. d',
  MONTH_DAY_MEDIUM: 'MMMM d일',
  DAY_ABBR: 'd일'
};
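
Why the first failure in this block is at "일" and not at the earlier "년" can be checked byte by byte. A small sketch, assuming the file is UTF-8 on disk and that Python's `cp1252` codec is the one doing the decoding; the function name `cp1252_problem_bytes` is hypothetical.

```python
def cp1252_problem_bytes(text):
    """Return the UTF-8 bytes of text that Python's cp1252 codec rejects."""
    rejected = []
    for value in bytearray(text.encode("utf-8")):
        try:
            bytes(bytearray([value])).decode("cp1252")
        except UnicodeDecodeError:
            rejected.append(value)
    return rejected


for char in ("\ub144", "\uc77c"):  # 년 and 일
    print(repr(char), [hex(b) for b in cp1252_problem_bytes(char)])
```

"년" encodes to eb 85 84, and cp1252 maps each of those bytes to some character, so it is silently misread instead of raising an error. "일" encodes to ec 9d bc, and 0x9d is one of the few byte values that cp1252 leaves unassigned.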

And so on. Obviously, changing datetimepatterns.js this way does not lead 
anywhere. So I ask myself: what is wrong?

 * Did these characters get in accidentally? Did someone check in using a "wrong" encoding?
 * Or is it rather on my side, using Windows? Why does there seem to be a conversion from Unicode to Windows1252? The latter could be my system default (I don't know), maybe that's why...
 * I suppose the characters where conversion fails are some Asian special characters for which there is no equivalent in Windows1252. 
 * Is there a workaround (for my situation or in general)? Or is it a bug in Closure? 
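
On the question of the system default: in Python 3 releases of that time, `open()` in text mode without an `encoding` argument falls back to `locale.getpreferredencoding(False)`. A quick check, which on a Western European Windows installation would typically print cp1252 (the variable name is only illustrative):

```python
import locale

# Encoding that open() falls back to in text mode when none is given.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

If this prints something other than a UTF-8 name, any UTF-8 source file containing non-ASCII characters is at risk of being misread or rejected when opened without an explicit encoding.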

- What version of the product are you using? On what operating system?

I am running Windows 7 on a 64-bit MacBook Pro. I downloaded the newest (as of 
10 Nov 2013) Closure Library (closure-library-20130212-95c19e7f0f5f.zip) and 
installed the newest Python for Windows (python-3.3.2.amd64.msi).

- Please provide any additional information below.

- Note: we cannot accept patches without the contributor license agreement
being signed. See http://code.google.com/p/closure-library/wiki/Contributors 
for more info.

Original issue reported on code.google.com by m...@jochenscharr.de on 10 Nov 2013 at 4:53


GoogleCodeExporter commented 9 years ago
Ha! Found it! See attached file: Change method GetFileContents in 
\closure-library\closure\bin\build\source.py to

def GetFileContents(path):
  """Get a file's contents as a string.

  Args:
    path: str, Path to file.

  Returns:
    str, Contents of file.

  Raises:
    IOError: An error occurred opening or reading the file.

  """
  fileobj = None

  try:
    fileobj = open(path, encoding='utf-8')
    return fileobj.read()

  except Exception as e:
    raise IOError("{0} when opening or reading file {1}: {2}".format(e.__class__.__name__ , path, e))

  finally:
    if fileobj is not None:
        fileobj.close()

The line

fileobj = open(path, encoding='utf-8')

is what fixes it. Files downloaded from 
https://code.google.com/p/closure-library/downloads/list seem to be 
UTF-8-encoded (as I had guessed). Now it runs through until 

D:\Projekt\PhpStorm\ClosureTest>closure-library\closure\bin\build\closurebuilder
.py --root=closure-library\ --root=myproject\ --namespace="myproject.start"
D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\closurebuilder
.py: Scanning paths...
D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\closurebuilder
.py: 934 sources scanned.
D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\closurebuilder
.py: Building dependency tree..
Traceback (most recent call last):
  File "D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\closurebuilder.py", line 262, in <module>
    main()
  File "D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\closurebuilder.py", line 211, in main
    tree = depstree.DepsTree(sources)
  File "D:\Projekt\PhpStorm\ClosureTest\closure-library\closure\bin\build\depstree.py", line 56, in __init__
    raise NamespaceNotFoundError(require, source)
depstree.NamespaceNotFoundError: Namespace "goog.i18n.bidi.Dir" never provided. 
Required in Source closure-library\closure\goog\soy\data.js

I will add a separate issue for that ...
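
The guess that the library files are UTF-8-encoded can be checked directly instead of inferred from one file. A minimal sketch, where `find_undecodable` is a hypothetical helper (not part of Closure Library) that reads each file as bytes and tries to decode it with a given encoding; the temporary directory at the end only serves as a self-check.

```python
import os
import tempfile


def find_undecodable(root, encoding="utf-8", suffix=".js"):
    """Return (path, error) pairs for files under root that fail to decode."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fileobj:
                data = fileobj.read()
            try:
                data.decode(encoding)
            except UnicodeDecodeError as exc:
                failures.append((path, str(exc)))
    return failures


# Self-check on a throwaway directory holding one UTF-8 file.
demo_root = tempfile.mkdtemp()
with open(os.path.join(demo_root, "sample.js"), "wb") as handle:
    handle.write("MONTH_DAY_ABBR: 'MMM d\uc77c'".encode("utf-8"))

print(find_undecodable(demo_root, "utf-8"))
print(len(find_undecodable(demo_root, "cp1252")))
```

Run against the real checkout, an empty result for `find_undecodable("closure-library")` would support the UTF-8 guess, and passing `encoding="cp1252"` would list every file that trips the Windows default.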

Original comment by m...@jochenscharr.de on 10 Nov 2013 at 7:48


GoogleCodeExporter commented 9 years ago
Sorry, what I said in 

- What version of the product are you using? On what operating system?

was wrong. I am not working with the downloaded version 
closure-library-20130212-95c19e7f0f5f.zip, but with a clone of the current 
Closure Library version (http://code.google.com/p/closure-library/), created 
using git. The rest is correct.

Original comment by m...@jochenscharr.de on 10 Nov 2013 at 10:47

GoogleCodeExporter commented 9 years ago
As mentioned in issue 473, calcdeps.py on line 132 handles it correctly:

    # Python 3 requires the file encoding to be specified
    if (sys.version_info[0] < 3):
      file_handle = open(filename, 'r')
    else:
      file_handle = open(filename, 'r', encoding='utf8')

As proposed in issue 473, "all invocations of 'open' should similarly check the 
python version". IMHO, that should be centralized in a method wrapping 'open'. 
As an absolute newbie to Python and Closure, unfortunately, this is too much 
for me now ...

But I integrated it in source.py (see attached file):

def GetFileContents(path):
  """Get a file's contents as a string.

  Args:
    path: str, Path to file.

  Returns:
    str, Contents of file.

  Raises:
    IOError: An error occurred opening or reading the file.

  """
  fileobj = None

  try:
    # Python 3 requires the file encoding to be specified
    # (this check needs `import sys` at the top of source.py)
    if sys.version_info[0] < 3:
      fileobj = open(path, 'r')
    else:
      fileobj = open(path, 'r', encoding='utf8')
    return fileobj.read()

  except Exception as e:
    raise IOError("{0} when opening or reading file {1}: {2}".format(e.__class__.__name__ , path, e))

  finally:
    if fileobj is not None:
        fileobj.close()
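
The centralized wrapper around 'open' suggested above could look roughly like the following. This is only a sketch: `open_utf8` and `get_file_contents` are hypothetical names, not existing Closure Library functions. It relies on `io.open`, which accepts an `encoding` argument on Python 2.6+ as well as on Python 3, so the version check becomes unnecessary. The temporary file at the end is just a self-check.

```python
import io
import os
import tempfile


def open_utf8(path):
    """Open path for reading as UTF-8 text on Python 2.6+ and Python 3."""
    # io.open takes the encoding argument on both major versions,
    # so no sys.version_info check is needed.
    return io.open(path, "r", encoding="utf-8")


def get_file_contents(path):
    """Return the contents of a UTF-8 encoded file as text."""
    with open_utf8(path) as fileobj:
        return fileobj.read()


# Self-check: write UTF-8 bytes to a temporary file and read them back.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "patterns.js")
with io.open(demo_path, "wb") as handle:
    handle.write(u"d\u200f/M".encode("utf-8"))

print(get_file_contents(demo_path) == u"d\u200f/M")
```

One caveat: on Python 2 this returns `unicode` objects rather than `str`, so callers that expect byte strings may need adjusting.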

Original comment by m...@jochenscharr.de on 10 Nov 2013 at 11:59
