Closed GoogleCodeExporter closed 8 years ago
The XML spec says:
'"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
So it appears that ", < and & need to be escaped.
Original comment by whitefa...@gmail.com
on 24 Jun 2008 at 7:55
Original comment by cgalvan1...@gmail.com
on 24 Jun 2008 at 1:01
For escaping, I have added:
import xml.sax.saxutils
and in create_event_xml I now call
xml.sax.saxutils.escape(event.filename), and do the same for event.author too.
Original comment by JcAs...@gmail.com
on 6 Jul 2008 at 11:29
I tried several approaches to fixing the issue since I got hit by it (the svn
log I'm trying to convert contains "&", ">" and "<" characters in file names):
* Using xml.dom.minidom to generate the output XML instead of manual borking
* Using ElementTree to generate the whole document
* Using ElementTree to generate each <event/> element (but leaving the
outermost part of the document written manually)
* Implementing the suggestion of #3 (escaping via saxutils)
The results were the following, given a baseline of 9.5s to generate the file
with the current incorrect script (the input verbose svn log is 29 MB, the
output file contains nearly 270k events):
* Minidom generation takes about 3 min, which is clearly unacceptable. On the
other hand, the output can be cleanly formatted (whitespace, indentation, ...)
* Generating the whole document with ElementTree (or cElementTree) took 40s (or
30s), which is 4 (or 3) times that of the current incorrect solution but looks
much cleaner. I also don't believe this is too problematic a time, as
conversion shouldn't be done often. On the other hand, it requires bundling
ElementTree.
* Generating the document lines with ElementTree (or cElementTree) actually
performed about 10% worse than the case above, at 44s (or 33s), but it requires
much less memory.
* Finally, simply adding escaping via saxutils took a bit above 11s.
I'm attaching the patch (which should apply cleanly at p1) for the latter
version, but I can also provide a patch for ElementTree generation if desired.
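For reference, the per-element ElementTree variant can be sketched roughly as follows. This is not the attached patch: the root element name file_events, the attribute names and the write_events helper are assumptions, and encoding="unicode" assumes Python 3 (on Python 2.5, ET.tostring already returns a str).

```python
import io
import xml.etree.ElementTree as ET


def write_events(events, out):
    """Write the outer document by hand and each <event/> via ElementTree.

    Only one element is held in memory at a time, and ElementTree
    takes care of escaping the attribute values.
    """
    out.write('<?xml version="1.0"?>\n<file_events>\n')
    for date, filename, author in events:
        elem = ET.Element(
            "event",
            {"date": str(date), "filename": filename, "author": author},
        )
        out.write(ET.tostring(elem, encoding="unicode"))
        out.write("\n")
    out.write("</file_events>\n")


# Hypothetical sample data written to an in-memory buffer.
buf = io.StringIO()
write_events([(1214300000, "trunk/a&b/<x>.txt", "someone")], buf)
print(buf.getvalue())
```

Serializing element by element keeps memory use low, which is the trade-off described in the third result above.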
Versions:
Python 2.5.1 (r251:54863, Apr 15 2008, 22:57:26)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
ElementTree and cElementTree are those bundled with Python 2.5, as is saxutils.
Original comment by maskl...@gmail.com
on 14 Jul 2008 at 10:46
Attachments:
Just wanted to confirm that this patch appears to work. I was getting EOF
exceptions at run time from Java, and after applying this to convert_logs.py
it worked.
Original comment by Jonathan...@gmail.com
on 14 Jul 2008 at 5:09
I have applied the patch to the trunk, thanks :)
Original comment by cgalvan1...@gmail.com
on 15 Jul 2008 at 12:40
Original issue reported on code.google.com by
whitefa...@gmail.com
on 24 Jun 2008 at 5:35