rahulpathakgit / codeswarm

Automatically exported from code.google.com/p/codeswarm
GNU General Public License v3.0

Path not escaped (convert_logs) #11

GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
The paths in the XML file are not escaped, so any " in a path would break the output. & should be escaped too.

Original issue reported on code.google.com by whitefa...@gmail.com on 24 Jun 2008 at 5:35

GoogleCodeExporter commented 8 years ago
The XML spec says:

'"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"

So it appears that ", < and & need to be escaped.
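
As a rough illustration of that production, here is a minimal Python sketch. It is not taken from convert_logs.py, and the helper name escape_attr_value is hypothetical. It escapes those characters for a double-quoted attribute value and checks that a parser reads the original string back:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_attr_value(value):
    """Escape a string for use inside a double-quoted XML attribute.

    escape() handles &, < and > by default; the extra mapping adds ".
    """
    return escape(value, {'"': "&quot;"})


path = 'src/a&b/<tmp>/say "hi".txt'
escaped = escape_attr_value(path)
element = ET.fromstring('<event filename="%s"/>' % escaped)

# The parser should hand back the original, unescaped path.
assert element.get("filename") == path
```

Escaping > as well is not required by the production above, but it is harmless and is what escape() does by default.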

Original comment by whitefa...@gmail.com on 24 Jun 2008 at 7:55

GoogleCodeExporter commented 8 years ago

Original comment by cgalvan1...@gmail.com on 24 Jun 2008 at 1:01

GoogleCodeExporter commented 8 years ago
For escaping, I have added:
import xml.sax.saxutils

and in create_event_xml I call xml.sax.saxutils.escape(event.filename), and the same for event.author.
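
A minimal sketch of what such a change could look like. The Event class, the signature of create_event_xml and the attribute layout are assumptions made for illustration, not the actual code in convert_logs.py. Note that escape() only handles &, < and > by default, so this sketch uses quoteattr() from the same module, which also copes with quote characters inside the value:

```python
from xml.sax.saxutils import quoteattr


class Event(object):
    """Hypothetical stand-in for the event objects in convert_logs.py."""

    def __init__(self, date, filename, author):
        self.date = date
        self.filename = filename
        self.author = author


def create_event_xml(event):
    """Return one <event/> line with escaped, quoted attribute values."""
    # quoteattr() escapes &, < and >, and also picks a quote character
    # (or emits &quot;) so that quotes inside the value stay valid.
    return "<event date=%s filename=%s author=%s />" % (
        quoteattr(str(event.date)),
        quoteattr(event.filename),
        quoteattr(event.author),
    )


print(create_event_xml(Event(1214285700000, "docs/R&D <draft>.txt", "j.doe")))
```

Because quoteattr() returns the value already wrapped in quotes, the format string must not add its own.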

Original comment by JcAs...@gmail.com on 6 Jul 2008 at 11:29

GoogleCodeExporter commented 8 years ago
I tried several approaches to fixing the issue since I got hit by it (the svn log I'm trying to convert contains "&", ">" and "<" characters in file names):

* Using xml.dom.minidom to generate the output XML instead of manual borking
* Using ElementTree to generate the whole document
* Using ElementTree to generate each <event/> element (but leaving the outermost part of the document written manually)
* Implementing the suggestion of #3

The results were the following, given a baseline of 9.5s to generate the file with the current incorrect script (the input verbose svn log is 29 MB, the output file contains nearly 270k events):

* Minidom generation takes about 3 min, which is clearly unacceptable. On the other hand, the output can be cleanly formatted (whitespace, indentation, ...).
* Generating the whole document with ElementTree (or cElementTree) took 40s (or 30s), which is about 4 (or 3) times as long as the current incorrect solution, but the code looks much cleaner. I also don't believe the time is too problematic, as conversion shouldn't be done often. On the other hand, it requires bundling ElementTree.
* Generating the document line by line with ElementTree (or cElementTree) was actually about 10% slower than the case above, at 44s (or 33s), but it requires much less memory.
* Finally, simply adding escaping via saxutils took a bit above 11s.
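
For reference, the per-element approach from the third bullet might look roughly like the sketch below. This is not the patch mentioned in this comment: the function name write_events, the file_events root element and the attribute names are assumptions, and it targets Python 3 (encoding="unicode" is not available in the Python 2.5 ElementTree).

```python
import io
import xml.etree.ElementTree as ET


def write_events(events, out):
    """Write events one element at a time to keep memory use low."""
    # The outermost element is written by hand; only each <event/> is
    # built and serialized through ElementTree.
    out.write("<file_events>\n")
    for date, filename, author in events:
        element = ET.Element(
            "event", date=str(date), filename=filename, author=author
        )
        # tostring() takes care of escaping the attribute values.
        out.write(ET.tostring(element, encoding="unicode"))
        out.write("\n")
    out.write("</file_events>\n")


buffer = io.StringIO()
write_events([(1214285700000, 'a&b/"c".txt', "j.doe")], buffer)

# Parsing the result back should yield the original file name.
document = ET.fromstring(buffer.getvalue())
assert document[0].get("filename") == 'a&b/"c".txt'
```

Only one element is held in memory at a time, which matches the lower memory use described above.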

I'm attaching the patch (which should apply cleanly at p1) for the latter version, but I can also provide a patch for ElementTree generation if desired.

Versions:
    Python 2.5.1 (r251:54863, Apr 15 2008, 22:57:26) 
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin

ElementTree and cElementTree are those built into Python 2.5, as is saxutils.

Original comment by maskl...@gmail.com on 14 Jul 2008 at 10:46

GoogleCodeExporter commented 8 years ago
Just wanted to confirm that this patch appears to work. I was getting EOF exceptions at run time from Java, and after applying this to convert_logs.py it worked.

Original comment by Jonathan...@gmail.com on 14 Jul 2008 at 5:09

GoogleCodeExporter commented 8 years ago
I have applied the patch to the trunk, thanks :)

Original comment by cgalvan1...@gmail.com on 15 Jul 2008 at 12:40