pnorman / ogr2osm

pnorman's version of UVM's Rewrite of ogr2osm
Other
78 stars 46 forks source link

Pretty printing xml and utf-8 xml export #16

Closed yvecai closed 11 years ago

yvecai commented 11 years ago

Pretty printing the xml output should be mandatory, as user should be able to check the generated xml before import :)

utf-8 encoding the xml generated by lxml allow proper export of accentuated characters.

simon04 commented 11 years ago

For pretty printing it might be sensible to add a commandline option since it is mainly for testing/debugging and has an impact on file size and execution time.

For a consistent code style, note the spacing after commas …

pnorman commented 11 years ago

What benchmarking have you done to ensure that it doe not slow down the XML output? Serializing the XML is by far the slowest part of the program on large files.

Please do a pull request with only the relevant commit, not other ones.

Does your commit pass the testcases?

yvecai commented 11 years ago

I'll check for testcases and add pretty-printing as an option, but don't have time right now.

pnorman commented 11 years ago

If pretty print should be enabled or not is still an open question.

pnorman commented 11 years ago

Results from testing with a NHD file

Normal:                            11.71s in tostring, 52.21s in output, 245M .osm
pretty_print:                      12.86s in tostring, 54.93s in output, 256M .osm
pretty_print, current \n removed:  12.94s in tostring, 94.51s in output, 254M .osm

This is a 5% increase in file size and a 10% increase in tostring time.

For reference, xmllint --format takes 12.73s to pretty a file and increases the size to 264M

Given that there is a noticeable increase in file size and the time in tostringcan be substantial on larger datasets, I won't be adding this in, given that xmllint --format is an option for anyone who really cares about the XML formatting.

Firefishy commented 11 years ago

:-(