xiehuachun / protobuf-java-format

Automatically exported from code.google.com/p/protobuf-java-format
BSD 3-Clause "New" or "Revised" License

Wrong escaping bytes '<', '&' etc. #10

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
To reproduce: convert data containing special symbols, e.g. '<' or '&', to XML.

Expected: &amp; &lt; etc.
Got: &, <

To fix: add processing for special symbols in the method
  XMLFormat.escapeBytes(ByteString input)

Original issue reported on code.google.com by Vitaly.R...@gmail.com on 18 Jan 2010 at 8:10
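
The escaping requested above could look roughly like the following. This is only a minimal sketch, not the project's actual `XmlFormat` code: the class name `XmlEscapeSketch` and the method `escapeXml` are hypothetical, and it works on a `String` rather than on the `ByteString` that `escapeBytes` receives.

```java
// Hypothetical helper for illustration; not part of protobuf-java-format.
final class XmlEscapeSketch {
    private XmlEscapeSketch() {}

    /** Replaces the five XML special characters with their predefined entities. */
    static String escapeXml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);        // everything else is copied unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the escaped form of a value containing '<' and '&'.
        System.out.println(escapeXml("a < b && c"));
    }
}
```

Note that '&' has to be handled in the same pass as the other characters; escaping it in a second pass would double-escape the entities produced by the first.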

GoogleCodeExporter commented 9 years ago
Vitaly, is there an XML sample with the corresponding proto definition that you 
can provide to reproduce this case?

Original comment by aant...@gmail.com on 23 Feb 2010 at 5:20

GoogleCodeExporter commented 9 years ago
I will provide this ASAP. You can reproduce this simply by having a String 
value containing '&' or '<', for example, in a protobuf object. In the resulting 
XML the symbol appears unchanged, but it should be replaced with &amp; or &lt; 
to make the XML valid.

Original comment by Vitaly.R...@gmail.com on 23 Feb 2010 at 6:52

GoogleCodeExporter commented 9 years ago
My apologies, I attached the wrong file. I will attach the right one tomorrow. 
There are 3 fixes there:
  - regex for TOKEN (completely changed)
  - escaping XML entities
  - un-escaping XML entities

Original comment by Vitaly.R...@gmail.com on 10 Mar 2010 at 9:04
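
The attached file is not reproduced in this thread, so the following is only a rough sketch of what un-escaping the five predefined XML entities might involve. The class `XmlUnescapeSketch` and the method `unescapeXml` are hypothetical names and are not taken from the patch; numeric character references are deliberately left untouched.

```java
import java.util.Map;

// Hypothetical helper for illustration; not the code from the attachment.
final class XmlUnescapeSketch {
    private XmlUnescapeSketch() {}

    // The five entities predefined by XML, keyed by name without '&' and ';'.
    private static final Map<String, String> ENTITIES = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'");

    /** Replaces predefined entities in a single left-to-right pass. */
    static String unescapeXml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // Only look for a terminating ';' when we are standing on '&'.
            int end = (c == '&') ? input.indexOf(';', i + 1) : -1;
            String replacement =
                    (end > 0) ? ENTITIES.get(input.substring(i + 1, end)) : null;
            if (replacement != null) {
                out.append(replacement);
                i = end + 1;          // skip past the whole entity
            } else {
                out.append(c);        // unknown or malformed: copy as-is
                i++;
            }
        }
        return out.toString();
    }
}
```

Because the input is scanned once, a value such as `&amp;lt;` comes back as `&lt;` rather than being decoded twice into `<`.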

GoogleCodeExporter commented 9 years ago
Here is the code.

Original comment by Vitaly.R...@gmail.com on 11 Mar 2010 at 2:52

Attachments:

GoogleCodeExporter commented 9 years ago
This last file works great and solves several issues.  Thanks.  
I believe that there is still a problem with Unicode characters, which should 
be escaped to &#{codepoint};

Original comment by amoffet@gmail.com on 11 Mar 2010 at 6:40

GoogleCodeExporter commented 9 years ago
Which problem with Unicode characters do you mean? I tested the last attached 
file with Russian characters and everything seemed to work fine.

Original comment by Vitaly.R...@gmail.com on 11 Mar 2010 at 9:33

GoogleCodeExporter commented 9 years ago
If the source protocol buffer has one or more Unicode characters such as 
\u20013, then, as I understand it, it should be escaped to &#20013;. For regular 
Unicode such as you are describing, things work well. I should also mention 
that it is currently escaped into octal sequences and unescaped from there. 
However, the XML standards suggest that is an unusual tack, and it should 
instead look as I mentioned. Thanks for your work on this.

Original comment by amoffet@gmail.com on 11 Mar 2010 at 11:11
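
As a rough illustration of the numeric character references described above, here is a minimal sketch that writes every non-ASCII code point as a decimal `&#N;` reference. The names `NumericCharRefSketch` and `escapeNonAscii` are hypothetical; this is not how the library currently behaves (it uses octal sequences, as noted above), and a real implementation would also need to escape '&' and '<'.

```java
// Hypothetical helper for illustration; not part of protobuf-java-format.
final class NumericCharRefSketch {
    private NumericCharRefSketch() {}

    /** Writes each code point above U+007F as a decimal character reference. */
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        // codePoints() yields whole code points, so surrogate pairs stay together.
        input.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }
}
```

For example, `escapeNonAscii("\u4E2D")` yields `&#20013;`, since U+4E2D is code point 20013 in decimal.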

GoogleCodeExporter commented 9 years ago
I haven't read the XML spec so I cannot comment on the last point. But what I 
can say is that XmlFormat.java in v.1.1.1 (r43) fails to merge special chars 
like German umlauts (ä,ö,ü,ß) and even fails on simple things like dot (.), 
single (') or double quotes (") within a message's string property. This is 
fatal!

Applying Vitaly's patch made my tests work. No problems so far. As I said, I 
don't know if it is perfect now, but at least it doesn't fail on such basic 
things.

Here is a patch file based on r43 (v.1.1.1) which includes Vitaly's changes. 
Maybe this could find its way into the next version.

Original comment by stephan....@gmail.com on 4 Apr 2011 at 2:31

Attachments:

GoogleCodeExporter commented 9 years ago
In case anyone is interested, I needed an in-memory DOM of the XML for my 
project, so I rewrote XmlFormat using Dom4j, which correctly handles character 
escaping and other XML standards. Source & binaries can be found here: 
http://code.google.com/p/protobuf-xml-format-for-java/

Cheers,

Yegor

Original comment by Yegor.Jb...@gmail.com on 4 Apr 2011 at 5:06

GoogleCodeExporter commented 9 years ago
Sounds good even though I would prefer a single stable project for various 
formats.

Do you have any benchmarks of your XmlFormat compared to the original?

Original comment by stephan....@gmail.com on 4 Apr 2011 at 7:02

GoogleCodeExporter commented 9 years ago
I am pretty sure Dom4j adds some overhead in both CPU and memory, however I 
haven't done any benchmarking. One thing to keep in mind is that the dom4j 
version will first create a complete DOM structure in memory and then generate 
a full XML string. There is no streaming API, like in the original version.

The reason I decided to keep it separate from this project instead of proposing 
a patch is because Dom4j would be quite a big dependency and everyone has their 
own favorite XML toolkit.

Of course, I wouldn't mind if it were included in this project, as long as the 
maintainer is ok with it.

Original comment by Yegor.Jb...@gmail.com on 4 Apr 2011 at 8:04
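
For comparison with the DOM-based approach discussed above, a streaming writer that needs no extra dependency could be sketched with the JDK's own StAX API (`javax.xml.stream`). This is neither the Dom4j rewrite nor the original `XmlFormat`; the class `StreamingXmlSketch` and the method `writeField` are hypothetical and only show that the writer escapes character data as it goes, without building a tree in memory first.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper for illustration; uses the JDK's StAX API, not Dom4j.
final class StreamingXmlSketch {
    private StreamingXmlSketch() {}

    /** Writes a single element whose text content is escaped by the writer. */
    static String writeField(String elementName, String value) {
        StringWriter buffer = new StringWriter();
        try {
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            writer.writeStartElement(elementName);
            writer.writeCharacters(value);   // '&' and '<' are escaped here
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return buffer.toString();
    }
}
```

The trade-off is the one described above: a streaming writer emits output incrementally, while a DOM gives an in-memory structure to work with at the cost of extra memory.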

GoogleCodeExporter commented 9 years ago
Alright, thank you for your answer. We'll see how things work out in the next 
version.

Original comment by stephan....@gmail.com on 4 Apr 2011 at 8:10

GoogleCodeExporter commented 9 years ago
Yegor / Stephan, would either of you like to join as committers for XmlFormatter?

Original comment by eliran.bivas on 3 May 2011 at 1:34

GoogleCodeExporter commented 9 years ago

Original comment by eliran.bivas on 3 May 2011 at 1:36

GoogleCodeExporter commented 9 years ago

Original comment by eliran.bivas on 3 May 2011 at 1:37

GoogleCodeExporter commented 9 years ago
Hi, Eliran,

I'd be happy to help. How do I sign up?

Yegor

Original comment by Yegor.Jb...@gmail.com on 4 May 2011 at 4:00

GoogleCodeExporter commented 9 years ago
Here's an extract from the mail I just wrote to Eliran. Maybe someone else will 
find the attached files helpful.

---

hi eliran,

thank you for asking. if you want me to join, i certainly will. however, i 
cannot promise that i'll find enough time to contribute regularly, if ever... 
but i'll try my best.

... [snipped]

apart from the problems i mentioned on your board, i found some more stuff, 
that didn't work as expected. i fixed the issues one by one until the json and 
xml format fit our needs and satisfied our test cases with special chars, 
extensions, nested types etc.. apart from just fixing bugs i also changed the 
code (structure) itself - sometimes because there was no other way of achieving 
what i wanted (no extension possible due to static classes), sometimes because 
of inconsistencies and redundancies (so i introduced an abstract base class). 
the real problem is that i didn't write down all the things i changed, simply 
because it was a fluid process and the outcome was not clear when i started. that 
was really stupid because now there is hardly any chance to merge the stuff back 
into your project. however, my plan is to release the groundwork of the 
messaging framework i wrote as an open source project as soon as i find the 
time to. and now the million dollar question i wanted to ask you:

may i include the json and xml format classes (see attachment - extracted from 
my framework) which are heavily based on your stuff in this open source 
messaging framework? if not, i'm afraid an open source release would not make 
much sense. due to the tight coupling of the two classes with the rest of my 
framework an external dependency to your project wouldn't make much sense 
either. so i really hope you allow me to include them.

of course, if you (or whoever) find them helpful, you can do with them whatever 
you want. they're provided as is. maybe they'll even find their way into 
protobuf-java-format.

... [snipped]

---

stephan

Original comment by stephan....@gmail.com on 5 May 2011 at 5:00

Attachments: