natural / java2python

Simple but effective library to translate Java source code to Python.
GNU Lesser General Public License v2.1
564 stars, 243 forks

Fail with non-ascii char #37

Open ikus060 opened 8 years ago

ikus060 commented 8 years ago

Running j2py on a Java file containing non-ASCII characters fails:

Traceback (most recent call last):
  File "/usr/local/bin/j2py", line 120, in runTransform
    tree = buildAST(source)
  File "/usr/local/lib/python2.7/dist-packages/java2python/compiler/__init__.py", line 15, in buildAST
    lexer = Lexer(StringStream(source))
  File "/usr/local/lib/python2.7/dist-packages/antlr_python_runtime-3.1.3-py2.7.egg/antlr3/streams.py", line 336, in __init__
    self.strdata = unicode(data)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3345: ordinal not in range(128)

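I guess j2py is not handling Unicode strings properly: the source file is read as a byte string, and the ANTLR runtime's unicode(data) call then decodes it with Python 2's default ASCII codec.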
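A minimal Python 3 sketch of the same failure, for illustration only. In Python 2, unicode(data) on a byte string decodes it with the default ASCII codec; bytes.decode("ascii") plays that role here. The sample string and the helper name try_decode are hypothetical and not part of j2py or the ANTLR runtime.

```python
# Python 3 sketch: bytes.decode("ascii") imitates Python 2's unicode(data)
# on a byte string, which used the default ASCII codec.


def try_decode(raw, encoding):
    """Return (text, None) on success, or (None, error) on failure."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# "é" is encoded in UTF-8 as the two bytes 0xc3 0xa9, so byte 3 is 0xc3.
source_bytes = "Café".encode("utf-8")

text, err = try_decode(source_bytes, "ascii")
print(err)  # 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

text, err = try_decode(source_bytes, "utf-8")
print(text)  # Café
```

The 0xc3 byte is the first byte of a two-byte UTF-8 sequence, which is consistent with the traceback's "byte 0xc3" and suggests the Java file is UTF-8 encoded.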

ikus060 commented 8 years ago

I didn't review all the code to make sure my modification doesn't introduce any side effects, but the following modification works for me.

diff --git a/bin/j2py b/bin/j2py
index 6eb1a40..34f1548 100755
--- a/bin/j2py
+++ b/bin/j2py
@@ -6,12 +6,16 @@
 a file, translate it, and write it out.

 """
+from __future__ import unicode_literals
+
 import sys
 from argparse import ArgumentParser, ArgumentTypeError
 from collections import defaultdict
+from io import open
 from logging import _levelNames as logLevels, exception, warning, info, basicConfig
 from os import path, makedirs
 from time import time

 from java2python.compiler import Module, buildAST, transformAST
 from java2python.config import Config
@@ -107,7 +111,7 @@

     try:
         if filein != '-':
-            source = open(filein).read()
+            source = open(filein, encoding='utf-8').read()
         else:
             source = sys.stdin.read()
     except (IOError, ), exc:
diff --git a/java2python/compiler/__init__.py b/java2python/compiler/__init__.py
index 4325201..b16c51f 100644
--- a/java2python/compiler/__init__.py
+++ b/java2python/compiler/__init__.py
@@ -5,6 +5,7 @@
 # This module provides a simpler facade over the rest of the compiler
 # subpackage.  Client code should use the values in this module
 # instead of using directly referencing items within the subpackage.
+from __future__ import unicode_literals

 from java2python.compiler.block import Module
 from java2python.lang import Lexer, Parser, StringStream, TokenStream, TreeAdaptor
diff --git a/java2python/compiler/block.py b/java2python/compiler/block.py
index 4cf7b09..185df39 100644
--- a/java2python/compiler/block.py
+++ b/java2python/compiler/block.py
@@ -11,6 +11,7 @@
 # This means they're very tightly coupled and that the classes are not
 # very reusable.  The module split does allow for grouping of related
 # methods and does hide the cluttered code.
+from __future__ import unicode_literals

 from sys import modules
 from java2python.compiler import template, visitor
@@ -19,7 +20,7 @@
 def addTypeToModule((className, factoryName)):
     """ Constructs and adds a new type to this module. """
     bases = (getattr(template, className), getattr(visitor, className))
-    newType = type(className, bases, dict(factoryName=factoryName))
+    newType = type(str(className), bases, dict(factoryName=factoryName))
     setattr(modules[__name__], className, newType)

diff --git a/java2python/compiler/template.py b/java2python/compiler/template.py
index 4f4dfe1..d7b80bb 100644
--- a/java2python/compiler/template.py
+++ b/java2python/compiler/template.py
@@ -12,8 +12,9 @@
 # the compiler subpackage into multiple modules.  So-called patterns
 # are usually a sign of a bad design and/or language limitations, and
 # this case is no exception.
+from __future__ import unicode_literals

-from cStringIO import StringIO
+from io import StringIO
 from functools import partial
 from itertools import chain, ifilter, imap

diff --git a/java2python/compiler/visitor.py b/java2python/compiler/visitor.py
index f62e53e..9d3bcaf 100644
--- a/java2python/compiler/visitor.py
+++ b/java2python/compiler/visitor.py
@@ -10,7 +10,7 @@
 # at runtime.  These classes use their factory callable more often than their
 # template counterparts; during walking, the typical behavior is to either define
 # the specific Python source, or to defer it to another block, or both.
-
+from __future__ import unicode_literals

 from functools import reduce, partial
 from itertools import ifilter, ifilterfalse, izip, tee
diff --git a/java2python/config/__init__.py b/java2python/config/__init__.py
index 2aa8387..74ce4a3 100644
--- a/java2python/config/__init__.py
+++ b/java2python/config/__init__.py
@@ -1,6 +1,7 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # java2python.config -> subpackage for run-time configuration.
+from __future__ import unicode_literals

 from functools import reduce
 from imp import load_source
diff --git a/java2python/config/default.py b/java2python/config/default.py
index 92c4a27..da51fe5 100644
--- a/java2python/config/default.py
+++ b/java2python/config/default.py
@@ -4,6 +4,7 @@
 # This is the default configuration file for java2python.  Unless
 # explicity disabled with the '-n' or '--nodefaults' option, the j2py
 # script will import this module for runtime configuration.
+from __future__ import unicode_literals

 from java2python.mod import basic, transform
 from java2python.lang.selector import *
diff --git a/java2python/lang/JavaLexer.py b/java2python/lang/JavaLexer.py
index 9c1725a..3f3f5fe 100644
--- a/java2python/lang/JavaLexer.py
+++ b/java2python/lang/JavaLexer.py
@@ -1,4 +1,5 @@
 # $ANTLR 3.1.3 Mar 18, 2009 10:09:25 Java.g 2012-01-29 13:54:05
+from __future__ import unicode_literals

 import sys
 from antlr3 import *
diff --git a/java2python/lang/JavaParser.py b/java2python/lang/JavaParser.py
index 28b9c64..cd3ff20 100644
--- a/java2python/lang/JavaParser.py
+++ b/java2python/lang/JavaParser.py
@@ -1,4 +1,5 @@
 # $ANTLR 3.1.3 Mar 18, 2009 10:09:25 Java.g 2012-01-29 13:54:04
+from __future__ import unicode_literals

 import sys
 from antlr3 import *
diff --git a/java2python/lang/base.py b/java2python/lang/base.py
index 0633b8e..f0202a1 100644
--- a/java2python/lang/base.py
+++ b/java2python/lang/base.py
@@ -46,6 +46,7 @@
 # Tree objects.  Our adaptor, TreeAdaptor, creates the LocalTree
 # instances.
 #
+from __future__ import unicode_literals

 from cStringIO import StringIO

diff --git a/java2python/lang/selector.py b/java2python/lang/selector.py
index 22b531a..ba9ca54 100644
--- a/java2python/lang/selector.py
+++ b/java2python/lang/selector.py
@@ -14,6 +14,7 @@
 # Projects using java2python should regard this subpackage as
 # experimental.  While the interfaces are not expected to change, the
 # semantics may.  Use with caution.
+from __future__ import unicode_literals

 from java2python.lang import tokens

diff --git a/java2python/lib/__init__.py b/java2python/lib/__init__.py
index efd7b1a..365578f 100644
--- a/java2python/lib/__init__.py
+++ b/java2python/lib/__init__.py
@@ -1,6 +1,7 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # java2python.lib -> common library bits.
+from __future__ import unicode_literals

 from functools import partial

diff --git a/java2python/mod/basic.py b/java2python/mod/basic.py
index 02e2f57..3a125ae 100644
--- a/java2python/mod/basic.py
+++ b/java2python/mod/basic.py
@@ -1,6 +1,7 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # java2python.mod.basic -> functions to revise generated source strings.
+from __future__ import unicode_literals

 from itertools import count
 from logging import info, warn
diff --git a/java2python/mod/transform.py b/java2python/mod/transform.py
index 9b2e567..5a8ade6 100644
--- a/java2python/mod/transform.py
+++ b/java2python/mod/transform.py
@@ -10,6 +10,7 @@
 #
 # See the java2python.config.default and java2python.lang.selector modules to
 # understand how and when selectors are associated with these callables.
+from __future__ import unicode_literals

 import re
 from logging import warn
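The key change in bin/j2py is decoding the input explicitly (open(filein, encoding='utf-8') via io.open) instead of relying on the default codec. Below is a Python 3 sketch of that idea, assuming the Java sources are UTF-8; read_source is a hypothetical helper, not a j2py function. Note that the diff leaves the stdin branch (source = sys.stdin.read()) unchanged, so on Python 2 input piped through '-' would still arrive as a byte string; the sketch decodes that branch as well.

```python
import io
import os
import sys
import tempfile


def read_source(filein, encoding="utf-8", stdin=None):
    """Return the Java source as text (str), decoding it explicitly.

    filein == "-" mirrors j2py's convention of reading standard input;
    stdin may be any binary stream and defaults to sys.stdin.buffer.
    """
    if filein == "-":
        stream = stdin if stdin is not None else sys.stdin.buffer
        return stream.read().decode(encoding)
    # On Python 3, io.open is the built-in open; on Python 2 it is what the
    # patch's "from io import open" brings in.
    with io.open(filein, encoding=encoding) as handle:
        return handle.read()


if __name__ == "__main__":
    # Write a small UTF-8 Java file containing a non-ASCII character.
    with tempfile.NamedTemporaryFile("wb", suffix=".java", delete=False) as tmp:
        tmp.write('String s = "Café";\n'.encode("utf-8"))
    try:
        print(read_source(tmp.name))
    finally:
        os.remove(tmp.name)
```

Passing the encoding as a parameter keeps the UTF-8 assumption in one place if a project's sources use a different encoding.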

luipugs commented 8 years ago

@ikus060 I tried patching with the diff you posted but encountered the following error:

$ patch -u -p1 < fix.diff 
patching file bin/j2py
patch: **** malformed patch at line 21: @@ -107,7 +111,7 @@

How did you apply your diff?
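One plausible explanation for the error: in the diff as posted, the first hunk header @@ -6,12 +6,16 @@ declares 16 new-side lines, but counting the lines shown (treating the two empty lines as context whose leading space was lost) gives 12 old-side and 15 new-side lines. patch would then keep reading and hit the next @@ header, which is line 21 if the saved file starts at the diff --git line. The +111 in that next header also implies four added lines, so one added line appears to have been lost when the diff was posted. The Python 3 sketch below compares declared and actual hunk line counts; count_hunk_lines is a hypothetical helper with deliberately simplified parsing.

```python
import re

# Unified-diff hunk header, e.g. "@@ -6,12 +6,16 @@".
# An omitted count (e.g. "@@ -3 +3 @@") means a count of 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

# Lines that start a new file section and therefore end any open hunk.
# Simplification: a removed/added line whose text begins with "-- " or "++ "
# would be misread as a header; real tools rely on the hunk counts instead.
FILE_HEADERS = ("diff ", "index ", "--- ", "+++ ")


def count_hunk_lines(diff_text):
    """Return one (header, declared_old, declared_new, seen_old, seen_new)
    tuple per hunk found in diff_text."""
    results = []
    current = None  # [header, declared_old, declared_new, seen_old, seen_new]
    for line in diff_text.splitlines():
        match = HUNK_RE.match(line)
        if match:
            if current is not None:
                results.append(tuple(current))
            declared_old = int(match.group(2) or 1)
            declared_new = int(match.group(4) or 1)
            current = [line, declared_old, declared_new, 0, 0]
        elif current is None:
            continue  # file headers before the first hunk
        elif line.startswith(FILE_HEADERS):
            results.append(tuple(current))
            current = None
        elif line.startswith("-"):
            current[3] += 1  # removed line: old side only
        elif line.startswith("+"):
            current[4] += 1  # added line: new side only
        elif line.startswith(" ") or line == "":
            # Context counts on both sides. Assumption: an empty line is a
            # context line whose leading space was lost in copy/paste.
            current[3] += 1
            current[4] += 1
        # Anything else (e.g. "\ No newline at end of file") is ignored.
    if current is not None:
        results.append(tuple(current))
    return results


if __name__ == "__main__":
    sample = "\n".join([
        "--- a/example.py",
        "+++ b/example.py",
        "@@ -1,3 +1,6 @@",  # declares 6 new-side lines, only 5 follow
        " import sys",
        "",
        "+from io import open",
        "+",
        " import os",
    ])
    for header, old, new, seen_old, seen_new in count_hunk_lines(sample):
        if (old, new) != (seen_old, seen_new):
            print(f"{header}: declared {old}/{new}, found {seen_old}/{seen_new}")
```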

mahi83 commented 7 years ago

I am facing the same error. Could you guys tell me what you did to fix it?

alisonreboud commented 6 years ago

I still have this problem as well :)

mazz commented 6 years ago

Use this git patch:

git apply unicode.patch

https://gist.github.com/mazz/8924b1d93cb3d16790e39da001823435

pascalJakobs commented 6 years ago

Hello all, I can't apply your patch, mazz. Here are the errors I got:

pi@raspberrypi:~/java2python-0.5.1 $ git apply unicode.patch
error: patch failed: java2python/mod/transform.py:10
error: java2python/mod/transform.py: patch does not apply

Can anybody help, please? I'm just a beginner.

Thanks

mazz commented 6 years ago

@pascalJakobs

virtualenv j2p
cd j2p
source bin/activate
pip install http://antlr3.org/download/Python/antlr_python_runtime-3.1.3.tar.gz
git clone https://github.com/natural/java2python.git
pip install -e java2python
cd java2python
<copy the raw patch to your clipboard>
cat >> unicode.patch
<paste>
<ctrl-d>
git apply unicode.patch

pascalJakobs commented 6 years ago

Hello, many thanks for your kind reply. I'm sorry, but I can't make it work (I'm running Linux Mint), and since I need it only once, I don't feel brave enough to fight with it for ages. Any chance you could convert the attached Java code, please? Thanks for your help anyway.

Kind regards

On Mon, Aug 20, 2018 at 02:50, Michael <notifications@github.com> wrote:


-- pascal