ngageoint / hootenanny

Hootenanny conflates multiple maps into a single seamless map.
GNU General Public License v3.0

Add a many to many fuzzy match for translations #1859

Open jasonsurratt opened 6 years ago

jasonsurratt commented 6 years ago

Use case: You want to match multiple tags to multiple tags in the translation file in a simple and repeatable way. This should automatically generate similar rules to reduce maintenance and make the translations more robust.

For example:

The following should translate:

These two should be translated in both directions in the translation file and should produce the following rules (pseudo code):

Custom to OSM:

OSM to Custom:

This last rule may be a little controversial, since it adds a Fitness Center comment to something that is a fitness station. However, making the rule more generic avoids hard-coding a large number of rules, and it still translates in a reasonable (if not perfect) way. If this proves problematic, a separate leisure=fitness_station rule could be created to eliminate the problem.

Implementation:

Create a new many to many rule concept in SchemaTools.js. This rule type takes an array of OGR tags and an array of OSM fuzzy rules. At this point the operation will always be AND when looking at both OGR and OSM tags. ORs and such could be added, but they add a bit of complexity and aren't necessary at this time.

Create two new functions for converting the many to many rules into tables. There should be a table concept for OGR to OSM and OSM to OGR. In the case of OSM to OGR, 1 or more entries may be generated based on the fuzzy rules. The tables will take the form:

OSM to OGR:

Rule Table:

rules[key1][value1][key2][value2]<...> = [{key:<key1>,value:<value1>,type:<simple|addToList>},...]

In the above rules, the keys will be in sorted order (e.g. key1 < key2 < key3 ...). It will be illegal (and throw an error) to have overlapping rules. E.g. you can't have both of these rules:

If a column is being used for containment, containment will be used in all instances and an error will be thrown if containment is not used. Another table will be generated that contains the list of columns that use containment.
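A minimal sketch of that containment check follows. It assumes OGR tags are written as `layer$COLUMN=value` for simple columns and `layer$COLUMN<=value` for containment columns, which is inferred from the example rules posted later in this thread. The function names `parseOgrTag` and `collectContainmentColumns` are hypothetical and are not part of SchemaTools.js.

```javascript
// Hypothetical parser for OGR tag strings such as "commercial$TYPE1=Service".
// Assumption: "<=" marks a containment column and "=" a simple column.
function parseOgrTag(tag) {
  const match = /^([^$]+)\$([^<=]+)(<=|=)(.*)$/.exec(tag);
  if (match === null) {
    throw new Error("Unrecognized OGR tag: " + tag);
  }
  const [, layer, column, op, value] = match;
  return { layer, column, value, containment: op === "<=" };
}

// Returns the sorted list of "layer$COLUMN" ids that use containment.
// Throws if a column is used with containment in one rule and without it
// in another.
function collectContainmentColumns(rules) {
  const usage = new Map(); // "layer$COLUMN" -> true when containment is used
  for (const rule of rules) {
    for (const tag of rule.ogr) {
      const parsed = parseOgrTag(tag);
      const id = parsed.layer + "$" + parsed.column;
      if (usage.has(id) && usage.get(id) !== parsed.containment) {
        throw new Error("Column " + id + " mixes containment and simple matching");
      }
      usage.set(id, parsed.containment);
    }
  }
  return [...usage]
    .filter(([, usesContainment]) => usesContainment)
    .map(([id]) => id)
    .sort();
}

const exampleRules = [
  { ogr: ["commercial$TYPE1=Service", "commercial$COMMENTS<=Fitness Center"] },
  { ogr: ["commercial$TYPE1=Service", "commercial$COMMENTS<=Sauna"] },
];
console.log(collectContainmentColumns(exampleRules));
// prints [ 'commercial$COMMENTS' ]
```

Running the check once while the tables are generated keeps the error at load time rather than at translation time.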

Using this slightly unusual structure should make lookups quite fast by iterating over the keys in sorted order. The containment operations will require iterating over each value for a given key, but I don't anticipate that being a long list at this time. If containment becomes an issue, we can change the sorted order.
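The nested table and the sorted-key lookup could be sketched roughly as below. This is an illustration under assumptions, not the SchemaTools.js implementation: `createTable`, `addRule`, and `lookup` are hypothetical names, "overlapping" is interpreted as one rule's sorted key/value path being identical to, or a prefix of, another's, and containment columns are left out for brevity.

```javascript
// Nodes have no prototype so tag keys such as "constructor" cannot collide
// with inherited properties.
function createTable() {
  return Object.create(null);
}

// Stores outputs at table[key1][value1][key2][value2]... with keys sorted.
function addRule(table, tags, outputs) {
  const keys = Object.keys(tags).sort();
  if (keys.length === 0) {
    throw new Error("A rule needs at least one tag");
  }
  let node = table;
  for (let i = 0; i < keys.length; i++) {
    const key = keys[i];
    const value = tags[key];
    if (!(key in node)) node[key] = Object.create(null);
    const existing = node[key][value];
    const isLast = i === keys.length - 1;
    // A slot may hold a leaf (array) or a branch (object), never both.
    if (Array.isArray(existing) || (isLast && existing !== undefined)) {
      throw new Error("Overlapping rule at " + key + "=" + value);
    }
    if (isLast) {
      node[key][value] = outputs;
    } else {
      if (existing === undefined) node[key][value] = Object.create(null);
      node = node[key][value];
    }
  }
}

// Walks the element's keys in sorted order; returns the first matching
// rule's outputs, or null. Backtracks when a partial match dead-ends.
function lookup(node, tags, sortedKeys = Object.keys(tags).sort(), start = 0) {
  for (let i = start; i < sortedKeys.length; i++) {
    const key = sortedKeys[i];
    if (!(key in node)) continue;
    const child = node[key][tags[key]];
    if (child === undefined) continue;
    if (Array.isArray(child)) return child;
    const deeper = lookup(child, tags, sortedKeys, i + 1);
    if (deeper !== null) return deeper;
  }
  return null;
}

const table = createTable();
addRule(table, { leisure: "sauna" },
  [{ key: "COMMENTS", value: "Sauna", type: "addToList" }]);
addRule(table, { building: "yes", amenity: "gym" },
  [{ key: "TYPE1", value: "Service", type: "simple" }]);

console.log(lookup(table, { name: "Gym", building: "yes", amenity: "gym" }));
// prints [ { key: 'TYPE1', value: 'Service', type: 'simple' } ]
```

Because rule keys are inserted in sorted order, a lookup only ever moves forward through the element's sorted keys, so each level costs a couple of property reads.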

OGR to OSM:

The OGR to OSM rules will be identical to the OSM to OGR rules, but in reverse.

jasonsurratt commented 6 years ago

@mattjdnv FYI

jasonsurratt commented 6 years ago

Got the OSM to OGR side generating tables. It shouldn't be much more work to apply the tables. An example:

Rules:

    {ogr:["commercial$TYPE1=Service", 
          "commercial$TYPE2=Other",
          "commercial$COMMENTS<=Fitness Center"], 
     osm:[isSimilar('leisure=fitness_centre', 0.8, 0, 1)]},

    {ogr:["commercial$TYPE1=Service",
          "commercial$TYPE2=Other",
          "commercial$COMMENTS<=Sauna"], 
     osm:[isSimilar('leisure=sauna', 0.8, 0, 1)]},

Generates the lookup table:

{
 "amenity": {
  "fitness_center": {
   "_SCORE_": 0.7499999999999997,
   "_OGR_": [
    "commercial$TYPE1=Service",
    "commercial$TYPE2=Other",
    "commercial$COMMENTS<=Fitness Center"
   ]
  },
  "fitness_centre": {
   "_SCORE_": 0.7499999999999997,
   "_OGR_": [
    "commercial$TYPE1=Service",
    "commercial$TYPE2=Other",
    "commercial$COMMENTS<=Fitness Center"
   ]
  }
 },
 "leisure": {
  "fitness_center": {
   "_SCORE_": 1,
   "_OGR_": [
    "commercial$TYPE1=Service",
    "commercial$TYPE2=Other",
    "commercial$COMMENTS<=Fitness Center"
   ]
  },
  "fitness-center": {
   "_SCORE_": 1,
   "_OGR_": [
    "commercial$TYPE1=Service",
    "commercial$TYPE2=Other",
    "commercial$COMMENTS<=Fitness Center"
   ]
  },
  "fitness-centre": {
   "_SCORE_": 1,
   "_OGR_": [
    "commercial$TYPE1=Service",
    "commercial$TYPE2=Other",
    "commercial$COMMENTS<=Fitness Center"
   ]
  },
  "fitness_centre": {
   "_SCORE_": 1,
   "_OGR_": [
    "commercial$TYPE1=Service",
    "commercial$TYPE2=Other",
    "commercial$COMMENTS<=Fitness Center"
   ]
  },
  "fitness-station": {
   "_SCORE_": 0.27499999999999974,
   "_OGR_": [
    "commercial$TYPE1=Service",
    "commercial$TYPE2=Other",
    "commercial$COMMENTS<=Fitness Center"
   ]
  },
  "fitness_station": {
   "_SCORE_": 0.27499999999999974,
   "_OGR_": [
    "commercial$TYPE1=Service",
    "commercial$TYPE2=Other",
    "commercial$COMMENTS<=Fitness Center"
   ]
  },
  "sauna": {
   "_SCORE_": 1,
   "_OGR_": [
    "commercial$TYPE1=Service",
    "commercial$TYPE2=Other",
    "commercial$COMMENTS<=Sauna"
   ]
  }
 }
}
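Applying a table of this shape might look like the sketch below. It assumes that when several of an element's tags hit the table, the entry with the highest `_SCORE_` should win; the thread does not specify a tie-breaking policy, so that choice and the function name `bestOgrMatch` are hypothetical. The sample table is a trimmed excerpt of the output above.

```javascript
// Safe own-property read, so tag keys like "constructor" never resolve to
// inherited members of plain objects.
function own(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : undefined;
}

// Returns the highest-scoring table entry hit by any of the element's
// key=value tags, or null when nothing matches.
// Assumption: highest _SCORE_ wins; ties keep the first entry found.
function bestOgrMatch(lookupTable, osmTags) {
  let best = null;
  for (const [key, value] of Object.entries(osmTags)) {
    const byValue = own(lookupTable, key);
    const entry = byValue === undefined ? undefined : own(byValue, value);
    if (entry !== undefined && (best === null || entry._SCORE_ > best._SCORE_)) {
      best = entry;
    }
  }
  return best;
}

// Trimmed excerpt of the generated lookup table.
const fitnessOgr = [
  "commercial$TYPE1=Service",
  "commercial$TYPE2=Other",
  "commercial$COMMENTS<=Fitness Center",
];
const lookupTable = {
  amenity: {
    fitness_centre: { _SCORE_: 0.7499999999999997, _OGR_: fitnessOgr },
  },
  leisure: {
    fitness_centre: { _SCORE_: 1, _OGR_: fitnessOgr },
    fitness_station: { _SCORE_: 0.27499999999999974, _OGR_: fitnessOgr },
  },
};

const match = bestOgrMatch(lookupTable, {
  amenity: "fitness_centre",
  leisure: "fitness_station",
});
console.log(match._SCORE_);
// prints 0.7499999999999997
```

A minimum score threshold could be added in the same loop if low-scoring entries such as `fitness_station` turn out to produce poor translations.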