Closed prusswan closed 3 years ago
Merging duplicate geometries is required to create valid osm files, but there are plenty of use-cases where merging is undesirable. In these situations a custom translation file can be written to disable merging, for example:
import ogr2osm
class NoMergeTranslation(ogr2osm.TranslationBase):
def merge_tags(self, geometry_type, tags_existing_geometry, tags_new_geometry):
return None
You probably want more control than just disable all merging, but of course more complicated logic is also possible. I tried my best to think of all possible scenarios while developing merging, please let me know if the current solution doesn't entirely fit your needs. More details about your specific use-case would be appreciated in this case.
My use case requires importing osm data (converted from another format) into the osmosis pgsnapshot schema for further corrections and processing. Under this schema, duplicate geometry is allowed (but fixing is okay provided I know which entry to retain), while the non-geometry attributes would become mangled as a result of automerge (it just makes a bigger mess since it ends up mixing the reference keys needed to trace the original inputs, e.g. k="ref_id", v="23243,23443255", would rather it just reports the ref_ids in question). I do consider ogr2osm as a converter, so data correction may be beyond what it is meant to do. I just saw #7, so by my reasoning, ogr2osm should not be held responsible for duplicate geometry that came from the data source and not introduced as a result of conversion.
That said, thanks for taking over the maintenance of this project from @pnorman
There is not always a one on one mapping between the source file format and an osm file, it is not as easy as just outputting what can be found in the input file. The goal is to output valid osm files, which requires merging duplicate geometries, but the amount of forks from pnorman's version disabling the mergePoints function clearly demonstrates this not not always required or desirable.
If I understand correctly you are not necessarily opposing the merging of duplicate geometries but you would rather have all tag values separated in the output file, even if the key is the same? e.g. <tag k="ref_id" v="23243"><tag k="ref_id" v="23443255">
in stead of <tag k="ref_id" v="23243,23443255">
This is currently not possible but it can easily be implemented, I am not opposing a parameter to achieve this behaviour.
The goal is to output valid osm files, which requires merging duplicate geometries
Not sure if this is entirely accurate? Shapefiles can have duplicate geometries, this is not something limited by file/data format.
If I understand correctly you are not necessarily opposing the merging of duplicate geometries but you would rather have all tag values separated in the output file, even if the key is the same? e.g.
in stead of
For this case I need the tags to be separated (duplicate geometry or not) as they belong to different entities. Information would be lost if they are forced into a single entity. The number of ways should be equal to number of polylines in the input (unless they are full duplicates and identical on every field). In the original version, it is able to retain identical ways (sharing the same geometry), merging points is okay as they do not define the entities (polylines) in the original input.
This can be achieved as well with the following translation:
import ogr2osm
class NoMergeTranslation(ogr2osm.TranslationBase):
def merge_tags(self, geometry_type, tags_existing_geometry, tags_new_geometry):
if geometry_type == 'node':
return super().merge_tags(geometry_type, tags_existing_geometry, tags_new_geometry)
else:
return None
Duplicate geometry should be reported, but the automerging of tags has led to unexpected results. I would prefer to have control over the merge resolution of the original inputs, rather than dealing with automerged result.