jimregan / mlode

Automatically exported from code.google.com/p/mlode

Strange encoding %47 #64

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

wget http://monnetproject.deri.ie/lemonsource/Special:Dump/wordnet.zip
unzip wordnet.zip
cd lexicon-tmp/
grep '%47' wordnet-wordsensesandwords.rdf > test.log

What is the expected output? What do you see instead?

Sample output:
 lwn:more_than-adverb ,<http://monnetproject.deri.ie/lemonsource/wordnet/E%47_O%47_Lawrence-noun> ,lwn:boracic_acid-noun ,

Description:
%47 decodes to 'G', but why is it in the URI at all?

How many triples are affected? (if less than 3-5% of the whole data set,
please set priority to _low_)

Please use labels and text to provide additional information.

Original issue reported on code.google.com by mohamedd...@gmail.com on 13 Aug 2012 at 10:22


GoogleCodeExporter commented 9 years ago
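It's supposed to be '.', which is 2E in hex and 46 in decimal (decimal 47 is '/'). Percent-encoding takes hex digits, so %47 is read as hex 47, i.e. 'G'.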

Original comment by joregan on 13 Aug 2012 at 4:04
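
The hex/decimal mix-up can be checked with a few lines of Python. This is only a sketch of the arithmetic, not the code that generated the data; the helper name `percent_encode_char` is made up for illustration.

```python
def percent_encode_char(ch: str) -> str:
    """Percent-encode one ASCII character as '%' plus two uppercase hex digits."""
    return "%{:02X}".format(ord(ch))

# '.' is code point 46 in decimal, which is 2E in hex.
print(ord("."), percent_encode_char("."))  # 46 %2E

# A decoder always reads the two digits after '%' as hex, so %47 means 0x47.
print(chr(int("47", 16)))  # G
```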

GoogleCodeExporter commented 9 years ago
Try decoding the URI
http://monnetproject.deri.ie/lemonsource/wordnet/E%47_O%47_Lawrence-noun
in a decoder such as http://meyerweb.com/eric/tools/dencoder/
It will generate
http://monnetproject.deri.ie/lemonsource/wordnet/EG_OG_Lawrence-noun
with %47 decoded as 'G', not '.'

Original comment by mohamedd...@gmail.com on 14 Aug 2012 at 9:34
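
The same check can be reproduced without a web tool. A minimal sketch using Python's standard `urllib.parse.unquote`; the variable names are illustrative only.

```python
from urllib.parse import unquote

# URI as it appears in the dump, with '.' wrongly written as %47.
uri = "http://monnetproject.deri.ie/lemonsource/wordnet/E%47_O%47_Lawrence-noun"
decoded = unquote(uri)
print(decoded)

# The same URI with the hex escape for '.', for comparison.
intended = unquote(uri.replace("%47", "%2E"))
print(intended)
```

The first line printed ends in `EG_OG_Lawrence-noun`, the second in `E._O._Lawrence-noun`.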

GoogleCodeExporter commented 9 years ago
OK, so it turns out that, for some reason, there was a command in the source code to
convert "." to "%47" in all URLs. I don't know why, and I'm not going to mess with
it, so I just changed it to "%2E" and will reload it in the repo.

Original comment by johnmcc...@gmail.com on 14 Aug 2012 at 12:21
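
For copies of the dump generated before this change, the bad escape could in principle be patched after the fact. The sketch below is not the project's code; `repair_dot_escape` is a hypothetical helper, and it assumes that every "%47" in the data comes from this bug (plausible, since 'G' never needs escaping, but not verified here).

```python
def repair_dot_escape(uri: str) -> str:
    """Replace the mistaken decimal-style escape %47 with %2E, the hex escape for '.'.

    Assumption: no URI in the data set contains a legitimate %47.
    """
    return uri.replace("%47", "%2E")

broken = "http://monnetproject.deri.ie/lemonsource/wordnet/E%47_O%47_Lawrence-noun"
print(repair_dot_escape(broken))
```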

GoogleCodeExporter commented 9 years ago
Hm, normally '.' doesn't need to be encoded, right?
So this is really strange, what is the status of this task?

Original comment by kur...@googlemail.com on 25 Aug 2012 at 7:49

GoogleCodeExporter commented 9 years ago
'.' does need to be escaped in N3/Turtle*. I changed the encoding, but it seems
browsers automatically convert the character in a way that confuses the
SPARQL query system. I'm generating the data again with (hopefully) correct Turtle.

* e.g.,
lemonwordnet:cat-noun rdfs:seeAlso lemonwordnet:cat-noun.ttl .

Original comment by johnmcc...@gmail.com on 27 Aug 2012 at 11:56
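
One way to sidestep the ambiguity in the example above is to fall back to a full IRI in angle brackets whenever the local part contains anything beyond a conservative safe set. This is a sketch of that idea, not what the generator actually does; `turtle_term`, `SAFE_LOCAL`, and the namespace bound to `lemonwordnet:` are assumptions made for illustration.

```python
import re

# Deliberately narrower than what Turtle permits in a prefixed local name.
SAFE_LOCAL = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")

# Assumed namespace for the lemonwordnet: prefix (taken from the sample URI).
NS = "http://monnetproject.deri.ie/lemonsource/wordnet/"

def turtle_term(prefix: str, namespace: str, local: str) -> str:
    """Return prefix:local when the local part is plainly safe, else a full <IRI>.

    Assumes `local` is already a valid IRI path segment (no spaces, '<' or '>').
    """
    if SAFE_LOCAL.fullmatch(local):
        return f"{prefix}:{local}"
    return f"<{namespace}{local}>"

print(turtle_term("lemonwordnet", NS, "cat-noun"))
print(turtle_term("lemonwordnet", NS, "cat-noun.ttl"))
```

With the full-IRI form, a '.' inside the name cannot be mistaken for the '.' that ends the statement, so no escaping is needed.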

GoogleCodeExporter commented 9 years ago
Cannot validate because the data set link does not work:
http://monnetproject.deri.ie/lemonsource/Special:Dump/wordnet.zip

Original comment by mohamedd...@gmail.com on 27 Aug 2012 at 3:58

GoogleCodeExporter commented 9 years ago
OK, fixed, but server stability is kind of critical; the site seems to be down
more than it is up :(

Original comment by johnmcc...@gmail.com on 28 Aug 2012 at 2:40

GoogleCodeExporter commented 9 years ago
Should be fixed

Original comment by johnmcc...@gmail.com on 31 Aug 2012 at 12:29

GoogleCodeExporter commented 9 years ago

Original comment by mohamedd...@gmail.com on 31 Aug 2012 at 1:44