walidazizi / rdflib

Automatically exported from code.google.com/p/rdflib

UnicodeEncodeError #163

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
RDFLib version 3.1.0

What steps will reproduce the problem?
1. Choose an RDF file ("example.rdf") that contains non-ASCII characters such as "á" or "é".
2. Parse the file with a graph instance: graph.parse("example.rdf")
3. You should get a UnicodeEncodeError.

I debugged it myself: in term.py, every self.encode() call should be 
self.encode("unicode-escape").
Regards

Original issue reported on code.google.com by fertap...@gmail.com on 22 Mar 2011 at 5:14

GoogleCodeExporter commented 8 years ago
I very much doubt this is actually broken. I have used rdflib to serialise and 
parse huge amounts of RDF over the years, most of it with non-ASCII characters. 

Try the attached file - it parses fine with rdflib 3.1 here. 

Could you send me a file that breaks? Or a small segment of one?

Thanks for your report!

Original comment by gromgull on 22 Mar 2011 at 9:06

GoogleCodeExporter commented 8 years ago
Hi gromgull,
you are right, parsing works properly. The bug is in the string 
representation of the terms (try testTerm.py).

I'm adapting the "store" package (from 2.4.0) to work with rdflib 3.1.0, and this issue was related to the QuadSlots. The NormalizeValue function tried to get the string representation of each term, and it raised a UnicodeEncodeError because the Literals contained non-ASCII characters.

As for the state of my "challenge": I already have compatibility with MySQL 
and SQLite, and I'm developing a MongoDB adapter.

You guys did a great job with rdflib!

Original comment by fertap...@gmail.com on 22 Mar 2011 at 9:49

GoogleCodeExporter commented 8 years ago
Nice to see someone work on restoring the stores!

I would encourage you to share your results with the rdfextras project! Cloning 
it to your own HG repository is very easy!

Many people seem to be working on parts of this; look for instance at:

My own work on getting mysql back in: 
http://code.google.com/p/rdfextras/source/browse/?r=mysql-restore

Gjhiggins's work on bitbucket: 

https://bitbucket.org/gjhiggins/rdfextras-dev
(he also worked here: https://bitbucket.org/ww/rdfextras I am still not sure why he started again :)
(see also his post here: http://groups.google.com/group/rdflib-dev/browse_thread/thread/299d19df6071831?utoken=0Nq44CwAAABeMHPLQBVP_FB2uM6rRJ0mcqaTpo-WbDTa2vMSLDpNm8fnMnKR50GP5bAaVv_m3Nk)

I think layercake also contains some rdflib 3.0 fixes and the old stores: 

http://code.google.com/p/python-dlp/source/browse/trunk/#trunk/layercake-python

Original comment by gromgull on 23 Mar 2011 at 10:30

GoogleCodeExporter commented 8 years ago
The bug with the literals is not a bug: you are trying to cast a unicode 
string to a byte string (str), but it contains non-ASCII characters that the 
default ASCII codec cannot encode. There is no way to do this correctly 
without choosing an encoding. 

You either have to encode it explicitly using some encoding, for example 
test.encode("utf-8"), or use repr and have us pick one for you: 

>>> repr(test)
rdflib.term.Literal(u'\\xe9sta')

Original comment by gromgull on 23 Mar 2011 at 10:32
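
The point made in the last comment can be sketched without rdflib at all. This is only an illustration, written for Python 3, where the ASCII encoding has to be requested explicitly; in Python 2, str() on a unicode value attempted it implicitly, which is what raised the error in this thread. The helper name try_ascii is hypothetical, and the sample value is taken from the repr() output above.

```python
# Python 3 sketch. In Python 2, str(u"\xe9sta") tried the ASCII codec
# implicitly; here the same attempt is spelled out explicitly.
text = "\xe9sta"  # "ésta", the value shown in the repr() example


def try_ascii(value):
    """Hypothetical helper: return ASCII bytes, or the error that was raised."""
    try:
        return value.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc


# ASCII cannot represent "é", so this is a UnicodeEncodeError instance.
ascii_result = try_ascii(text)

# Choosing an encoding explicitly always works for UTF-8.
utf8_bytes = text.encode("utf-8")

# The codec suggested in the original report: an ASCII-only escaped form.
escaped = text.encode("unicode_escape")

print(type(ascii_result).__name__)
print(utf8_bytes)
print(escaped)
```

Note that the "unicode_escape" form is lossless but is not what most consumers expect from a string conversion, which is presumably why the explicit test.encode("utf-8") route is the one recommended above.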