Closed GoogleCodeExporter closed 8 years ago
I very much doubt this is actually broken; I have used rdflib to serialise and
parse huge amounts of RDF over the years, most of it with non-ascii characters.
Try the attached file - it parses fine with rdflib 3.1 here.
Could you send me a file that breaks? Or a small segment of one?
Thanks for your report!
Original comment by gromgull
on 22 Mar 2011 at 9:06
Hi gromgull,
you are right, the parsing is working properly. The bug is in the string
representation of terms (try testTerm.py).
I'm adapting the "store" package (from 2.4.0) to work on rdflib 3.1.0, and this issue was related to the QuadSlots. The NormalizeValue function tried to get the string representation of each term, and it raised a UnicodeEncodeError because of the non-ascii characters in the Literals.
As for the status of my "challenge", I already have compatibility with MySQL
and SQLite, and I'm developing a MongoDB adapter.
You guys did a great job with rdflib!
Original comment by fertap...@gmail.com
on 22 Mar 2011 at 9:49
Nice to see someone work on restoring the stores!
I would encourage you to share your results with the rdfextras project! Cloning
it to your own HG repository is very easy!
Many people seem to be working on parts of this; look for instance at:
My own work on getting mysql back in:
http://code.google.com/p/rdfextras/source/browse/?r=mysql-restore
Gjhiggins' work on bitbucket:
https://bitbucket.org/gjhiggins/rdfextras-dev
(he also worked here: https://bitbucket.org/ww/rdfextras I am still not sure
why he started again :)
(see also his post here:
http://groups.google.com/group/rdflib-dev/browse_thread/thread/299d19df6071831?utoken=0Nq44CwAAABeMHPLQBVP_FB2uM6rRJ0mcqaTpo-WbDTa2vMSLDpNm8fnMnKR50GP5bAaVv_m3Nk)
I think layercake also contains some rdflib 3.0 fixes and the old stores:
http://code.google.com/p/python-dlp/source/browse/trunk/#trunk/layercake-python
Original comment by gromgull
on 23 Mar 2011 at 10:30
The bug with the literals is not a bug - you are trying to cast a unicode
string to a byte string (str), but it contains non-ascii characters that the
default ascii codec cannot encode. There is no way to do this correctly
without choosing an encoding.
You either have to explicitly encode it using some encoding, e.g.
test.encode("utf-8"), or use repr and have us pick one for you:
>>> repr(test)
rdflib.term.Literal(u'\\xe9sta')
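For anyone reading this later, here is a rough sketch of the same point in plain Python 3, without rdflib. It assumes that the implicit conversion in Python 2 amounted to an ascii encode, and the helper name to_bytes is made up purely for illustration:

```python
# Python 3 sketch. In Python 2, str(u"\xe9sta") implicitly performed
# the ascii encode attempted below, which is why it raised.
text = "\u00e9sta"  # the same characters as in the repr above


def to_bytes(value, encoding="utf-8"):
    """Hypothetical helper: always encode with an explicit encoding."""
    return value.encode(encoding)


try:
    text.encode("ascii")  # no ascii representation exists for U+00E9
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

utf8_bytes = to_bytes(text)              # explicit choice of encoding
round_trip = utf8_bytes.decode("utf-8")  # decoding restores the text
```

The explicit encode works because the caller, not the library, decides which bytes stand for the non-ascii character.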
Original comment by gromgull
on 23 Mar 2011 at 10:32
Original issue reported on code.google.com by
fertap...@gmail.com
on 22 Mar 2011 at 5:14