request-yo-racks / api

A REST API for the Request-Yo-Racks projects.
https://api.requestyoracks.org
MIT License

Validate retrieve and merge algorithms #66

Open rgreinho opened 6 years ago

rgreinho commented 6 years ago

Issue Type

Current Behavior

The retrieve algorithm performs the following steps:

  1. Query Google to retrieve the summary information of a place, which contains:
    • The Google ID of the place
    • The name of the place
    • The address of the place
    • For instance: 'krzzyozIVGC7pX1lfVO40w', 'Epoch Coffee', '221 West North Loop Boulevard, Austin'.
      • This is considered the source of truth and the base for ALL the following searches!
  2. Using the Google ID, retrieve the detailed information of the place.
  3. Using the name and the address, retrieve the detailed information using all the other collectors. The information is obtained by performing an exact search and retrieving the first match.
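The three steps above can be sketched roughly as follows. This is only an illustration under assumptions, not the project's code: `retrieve`, the `Place` dict layout, the stub functions and the collector name `other` are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Place = Dict[str, str]  # hypothetical representation of a place


def retrieve(
    query: str,
    google_summary: Callable[[str], Place],
    google_details: Callable[[str], Place],
    collectors: Dict[str, Callable[[str, str], List[Place]]],
) -> Tuple[Place, Dict[str, Place]]:
    """Return the Google summary and the details found by each collector."""
    # Step 1: the Google summary is the source of truth for later searches.
    summary = google_summary(query)
    # Step 2: detailed information from Google, looked up by Google ID.
    results = {"google": google_details(summary["id"])}
    # Step 3: exact search by name and address on every other collector,
    # keeping the first match (which is NOT validated here).
    for collector_name, search in collectors.items():
        matches = search(summary["name"], summary["address"])
        if matches:
            results[collector_name] = matches[0]
    return summary, results


# Stub functions standing in for real API calls (assumptions).
def fake_summary(query: str) -> Place:
    return {
        "id": "krzzyozIVGC7pX1lfVO40w",
        "name": "Epoch Coffee",
        "address": "221 West North Loop Boulevard, Austin",
    }


def fake_details(google_id: str) -> Place:
    return {"id": google_id, "name": "Epoch Coffee"}


def fake_other(name: str, address: str) -> List[Place]:
    return [{"name": name, "address": address}, {"name": "Second match"}]


summary, results = retrieve(
    "epoch coffee", fake_summary, fake_details, {"other": fake_other}
)
```

Passing the collectors in as callables keeps the sketch independent of any real API client and makes the first-match behavior easy to exercise with stubs.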

The merge algorithm uses a weight assigned to each collector. The results retrieved in step 3 of the retrieve algorithm are sets of properties defining a business, one set per collector. These properties are compared one by one, and the value from the collector with the lightest (i.e. lowest) weight is picked first. The heaviest results fall to the bottom of the pile and are used last, only if no other result was picked before.
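A minimal sketch of such a weight-based merge is shown here. The function name `merge`, the collector names `collector_a` and `collector_b`, the weight values and the rule that empty values are skipped are assumptions made for illustration.

```python
from typing import Dict, Optional

Properties = Dict[str, Optional[str]]


def merge(results: Dict[str, Properties], weights: Dict[str, int]) -> Dict[str, str]:
    """Merge per-collector properties, lightest weight first."""
    merged: Dict[str, str] = {}
    # Visit the collectors from the lightest to the heaviest weight.
    for collector in sorted(results, key=lambda name: weights[name]):
        for prop, value in results[collector].items():
            # Keep a value only if it is non-empty and no lighter
            # collector already provided this property.
            if value and prop not in merged:
                merged[prop] = value
    return merged


# Hypothetical weights and results.
weights = {"google": 1, "collector_a": 2, "collector_b": 3}
results = {
    "collector_b": {
        "name": "Epoch Coffee",
        "website": "from collector_b",
        "hours": "from collector_b",
    },
    "google": {"name": "Epoch Coffee", "website": None},
    "collector_a": {"website": "from collector_a"},
}
merged = merge(results, weights)
```

In this example `website` comes from `collector_a` because Google returned no value, and `hours` comes from `collector_b` because no lighter collector provided it.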

Expected Behavior

The retrieve algorithm has one flaw: it assumes that the exact match is 100% accurate. But if the search returns the information of Taco Deli instead of Epoch Coffee, 1) we have no way to know, and 2) the merge operation will complete using incorrect data.

The merge algorithm uses weights that were assigned based on gut feeling. We need a way to ensure they are correct.

Possible Solution

For the retrieve algorithm, we need to add some validation of the results. For instance, the name has to match the value from the summary information; otherwise the result is discarded.
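One possible shape for this validation is sketched below. The helper names `normalize` and `discard_mismatches` are hypothetical, and comparing names case-insensitively with collapsed whitespace is an assumption; a strict equality check would be the literal reading of the rule.

```python
from typing import Dict

Place = Dict[str, str]


def normalize(name: str) -> str:
    """Lower-case the name and collapse runs of whitespace (assumption)."""
    return " ".join(name.casefold().split())


def discard_mismatches(summary: Place, results: Dict[str, Place]) -> Dict[str, Place]:
    """Keep only the results whose name matches the summary name."""
    expected = normalize(summary["name"])
    return {
        collector: place
        for collector, place in results.items()
        if normalize(place.get("name", "")) == expected
    }


# Hypothetical collector results: one match and one wrong business.
summary = {"name": "Epoch Coffee"}
results = {
    "collector_a": {"name": "epoch  coffee"},
    "collector_b": {"name": "Taco Deli"},
}
valid = discard_mismatches(summary, results)
```

Here the `collector_b` result is dropped before the merge, so the Taco Deli data can no longer leak into the merged business.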

For the merge algorithm, we need a routine that extracts the information of 50 locations using each collector, merges them, and stores the result in a file (CSV, for instance) that can be validated by a human.
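The file-writing part of such a routine could look like the sketch below, which uses Python's standard `csv` module. The function name `write_report` and the sample rows are hypothetical; collecting and merging the 50 locations is left out.

```python
import csv
import io
from typing import Dict, Iterable, TextIO


def write_report(merged_places: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write one CSV row per merged place for human review."""
    places = list(merged_places)
    # Use the union of all property names as columns, in a stable order.
    fieldnames = sorted({key for place in places for key in place})
    # restval fills the properties that a place is missing.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(places)


# Hypothetical merged results; a real run would write to a file instead.
buffer = io.StringIO()
write_report(
    [
        {"name": "Epoch Coffee", "address": "221 West North Loop Boulevard, Austin"},
        {"name": "Taco Deli"},
    ],
    buffer,
)
report_lines = buffer.getvalue().splitlines()
```

Empty cells make missing properties easy to spot during review. Adding one column per property that names the collector the value came from would also help judge the weights, but that is beyond this sketch.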