merge_tags should be optional, even for duplicate geometry

roelderickx / ogr2osm

A tool for converting ogr-readable files like shapefiles into .pbf or .osm data

https://pypi.org/project/ogr2osm/

MIT License

59 stars 14 forks source link

merge_tags should be optional, even for duplicate geometry #11

Closed prusswan closed 3 years ago

prusswan commented 3 years ago

Duplicate geometry should be reported, but the automerging of tags has led to unexpected results. I would prefer to have control over the merge resolution of the original inputs, rather than dealing with automerged result.

roelderickx commented 3 years ago

Merging duplicate geometries is required to create valid osm files, but there are plenty of use-cases where merging is undesirable. In these situations a custom translation file can be written to disable merging, for example:

import ogr2osm

class NoMergeTranslation(ogr2osm.TranslationBase):
    def merge_tags(self, geometry_type, tags_existing_geometry, tags_new_geometry):
        return None

You probably want more control than just disable all merging, but of course more complicated logic is also possible. I tried my best to think of all possible scenarios while developing merging, please let me know if the current solution doesn't entirely fit your needs. More details about your specific use-case would be appreciated in this case.

prusswan commented 3 years ago

My use case requires importing osm data (converted from another format) into the osmosis pgsnapshot schema for further corrections and processing. Under this schema, duplicate geometry is allowed (but fixing is okay provided I know which entry to retain), while the non-geometry attributes would become mangled as a result of automerge (it just makes a bigger mess since it ends up mixing the reference keys needed to trace the original inputs, e.g. k="ref_id", v="23243,23443255", would rather it just reports the ref_ids in question). I do consider ogr2osm as a converter, so data correction may be beyond what it is meant to do. I just saw #7, so by my reasoning, ogr2osm should not be held responsible for duplicate geometry that came from the data source and not introduced as a result of conversion.

That said, thanks for taking over the maintenance of this project from @pnorman

roelderickx commented 3 years ago

There is not always a one on one mapping between the source file format and an osm file, it is not as easy as just outputting what can be found in the input file. The goal is to output valid osm files, which requires merging duplicate geometries, but the amount of forks from pnorman's version disabling the mergePoints function clearly demonstrates this not not always required or desirable.

If I understand correctly you are not necessarily opposing the merging of duplicate geometries but you would rather have all tag values separated in the output file, even if the key is the same? e.g. <tag k="ref_id" v="23243"><tag k="ref_id" v="23443255"> in stead of <tag k="ref_id" v="23243,23443255"> This is currently not possible but it can easily be implemented, I am not opposing a parameter to achieve this behaviour.

prusswan commented 3 years ago

The goal is to output valid osm files, which requires merging duplicate geometries

Not sure if this is entirely accurate? Shapefiles can have duplicate geometries, this is not something limited by file/data format.

If I understand correctly you are not necessarily opposing the merging of duplicate geometries but you would rather have all tag values separated in the output file, even if the key is the same? e.g. in stead of

For this case I need the tags to be separated (duplicate geometry or not) as they belong to different entities. Information would be lost if they are forced into a single entity. The number of ways should be equal to number of polylines in the input (unless they are full duplicates and identical on every field). In the original version, it is able to retain identical ways (sharing the same geometry), merging points is okay as they do not define the entities (polylines) in the original input.

roelderickx commented 3 years ago

This can be achieved as well with the following translation:

import ogr2osm

class NoMergeTranslation(ogr2osm.TranslationBase):
    def merge_tags(self, geometry_type, tags_existing_geometry, tags_new_geometry):
        if geometry_type == 'node':
            return super().merge_tags(geometry_type, tags_existing_geometry, tags_new_geometry)
        else:
            return None