roelderickx / ogr2osm

A tool for converting ogr-readable files like shapefiles into .pbf or .osm data
https://pypi.org/project/ogr2osm/
MIT License
59 stars 14 forks

Issues with encoding in windows batch #8

Closed Meibes closed 3 years ago

Meibes commented 3 years ago

Hi,

I was trying to use ogr2osm in a Windows batch file but ran into a lot of encoding problems: the batch run always created ANSI-encoded files, whereas my workflow needs UTF-8 encoded files. I managed to solve my issue by changing the line `self.f = open(self.filename, 'w', buffering = -1)` to `self.f = open(self.filename, 'w', buffering = -1, encoding="utf-8")`.
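A minimal sketch of why the explicit argument matters, not taken from the ogr2osm code base: `write_text`, `file_bytes` and `SAMPLE` are hypothetical names used only for illustration. It writes a string through the same `open(...)` call shape as the patched line and checks that the bytes on disk are UTF-8, then shows that cp1252 (one common Windows "ANSI" code page, assumed here as an example) cannot encode Japanese text at all.

```python
import os
import tempfile

# Hypothetical tag value: "Cafe" with an accented e, followed by "Tokyo" in kanji.
SAMPLE = "Caf\u00e9 \u6771\u4eac"


def write_text(path, text, encoding):
    """Write text using the same call shape as the patched line."""
    with open(path, "w", buffering=-1, encoding=encoding) as f:
        f.write(text)


def file_bytes(path):
    """Return the raw bytes stored in the file."""
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "out.osm")
    write_text(path, SAMPLE, "utf-8")
    utf8_bytes = file_bytes(path)

# With an explicit encoding the bytes are UTF-8 regardless of the platform.
assert utf8_bytes == SAMPLE.encode("utf-8")

# cp1252 has no code points for the kanji, so encoding fails outright.
try:
    SAMPLE.encode("cp1252")
    ansi_ok = True
except UnicodeEncodeError:
    ansi_ok = False

print("cp1252 can encode the sample:", ansi_ok)
```

Without the `encoding` argument, `open` falls back to a locale-dependent default, which is why the same script can produce different bytes on Linux and on Windows.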

There is already a parameter called "encoding", but it seems to be used only for the source file. Could we extend this "encoding" parameter to apply to the destination file as well, or could we introduce another parameter for that? What are your thoughts? Or do you have a tip on how I can force the Windows batch to output UTF-8 without changing ogr2osm?

thanks for this awesome tool =)
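On the question of forcing UTF-8 without touching ogr2osm: one option not discussed in this thread is Python's UTF-8 mode (Python 3.7 and later), enabled with the `PYTHONUTF8=1` environment variable or the `-X utf8` command-line option. In that mode `open` defaults to UTF-8 when no encoding is given. The sketch below only probes a child interpreter to show the effect; `PROBE` and `probe_utf8_mode` are hypothetical names, and whether this resolves the batch problem described above has not been verified here.

```python
import os
import subprocess
import sys

# One-liner run in a child interpreter: prints the UTF-8 mode flag and the
# encoding that open() falls back to when no encoding argument is given.
PROBE = (
    "import locale, sys; "
    "print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"
)


def probe_utf8_mode():
    """Run PROBE in a child Python with PYTHONUTF8=1 and return its output."""
    env = dict(os.environ, PYTHONUTF8="1")
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    flag, encoding = result.stdout.split()
    return flag, encoding.lower()


flag, encoding = probe_utf8_mode()
print(flag, encoding)
```

In a batch file the equivalent would be to set the `PYTHONUTF8` variable to `1` before invoking the tool, assuming it runs under a Python version that supports UTF-8 mode.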

roelderickx commented 3 years ago

Thanks for your bug report. This issue looks like a duplicate of pnorman/ogr2osm#15, but your solution is different and you have found a test case where the current method has issues.

Some observations:

Given the last observation, ogr2osm is supposed to output UTF-8 at the moment, translating from the input file encoding if necessary. To obtain consistent behaviour across operating systems it is therefore necessary to pass encoding='utf-8' as you suggested. I would then also specify the encoding explicitly in the header, i.e. <?xml version="1.0" encoding="utf-8"?>.
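As an illustration of keeping the declared and the actual encoding in sync, here is a small sketch. It is not the ogr2osm writer: `write_osm`, `XML_ENCODING`, the `generator` value and the element layout are assumptions made up for this example. The file is opened with an explicit encoding, the same value is written into the XML declaration, and the result is parsed back to confirm that the non-ASCII value survives.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

XML_ENCODING = "utf-8"  # single source of truth for file and declaration


def write_osm(path, name):
    """Write a tiny OSM-like document whose header matches the file encoding."""
    with open(path, "w", buffering=-1, encoding=XML_ENCODING) as f:
        f.write('<?xml version="1.0" encoding="%s"?>\n' % XML_ENCODING)
        f.write('<osm version="0.6" generator="example">\n')
        f.write('  <node id="-1" lat="35.68" lon="139.76">\n')
        # quoteattr escapes the value and adds the surrounding quotes.
        f.write('    <tag k="name" v=%s/>\n' % quoteattr(name))
        f.write('  </node>\n')
        f.write('</osm>\n')


def read_name(path):
    """Parse the file and return the value of the first name tag."""
    root = ET.parse(path).getroot()
    return root.find("node/tag").get("v")


name = "\u6771\u4eac"  # "Tokyo" in kanji
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.osm")
    write_osm(path, name)
    round_tripped = read_name(path)

assert round_tripped == name
```

Deriving both the `open` argument and the declaration from one constant avoids a header that claims one encoding while the bytes use another.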

I can confirm the test cases still pass on Linux with your suggested modification. Can you verify whether the test cases pass under Windows as well?

Meibes commented 3 years ago

Thanks for the fast answer!

`<?xml version="1.0" encoding="UTF-8"?>`

`The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.`

After making these changes everything runs smoothly in the batch.

roelderickx commented 3 years ago

Ok. I am not sure whether the cram tests can be run as is under Windows, but can you try to convert at least test/shapefiles/japanese.shp and confirm whether the formatted result matches test/japanese.xml?

In the test script the output is formatted using xmllint before comparing:

ogr2osm --encoding shift_jis --gis-order -f test/shapefiles/japanese.shp
xmllint --format japanese.osm > japanese.xml
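Where xmllint is not at hand, for example on a Windows machine, a rough stand-in for the comparison can be sketched in Python. This is not what the test script does: it swaps the byte-for-byte diff of xmllint-formatted output for a comparison of canonical XML (C14N 2.0) with whitespace-only text stripped, using `xml.etree.ElementTree.canonicalize` from Python 3.8 or later. `same_xml` and the sample documents are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def same_xml(path_a, path_b):
    """Return True if both files hold the same XML, ignoring indentation."""
    canon_a = ET.canonicalize(from_file=path_a, strip_text=True)
    canon_b = ET.canonicalize(from_file=path_b, strip_text=True)
    return canon_a == canon_b


# Two layouts of the same document, plus one with a different tag value.
COMPACT = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<osm version="0.6"><node id="-1">'
    '<tag k="name" v="\u6771\u4eac"/></node></osm>\n'
)
PRETTY = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<osm version="0.6">\n'
    '  <node id="-1">\n'
    '    <tag k="name" v="\u6771\u4eac"/>\n'
    '  </node>\n'
    '</osm>\n'
)
OTHER = PRETTY.replace("\u6771\u4eac", "Tokyo")

with tempfile.TemporaryDirectory() as tmpdir:
    paths = {}
    for label, text in (("compact", COMPACT), ("pretty", PRETTY), ("other", OTHER)):
        paths[label] = os.path.join(tmpdir, label + ".xml")
        with open(paths[label], "w", encoding="utf-8") as f:
            f.write(text)
    layout_only = same_xml(paths["compact"], paths["pretty"])
    content_differs = same_xml(paths["pretty"], paths["other"])

print(layout_only, content_differs)
```

For the case in this thread one would call `same_xml` on the generated japanese.osm and test/japanese.xml, keeping in mind that this check is looser than the formatted diff used by the cram tests.
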
roelderickx commented 3 years ago

Meanwhile I managed to test the modification on Windows, and the test is conclusive. The proposed changes have been merged into master. Thanks @Meibes for your investigation.