Current Behavior
The retrieve algorithm performs the following steps:
1. Query Google to retrieve the summary information of a place, which contains:
   - The Google ID of the place
   - The name of the place
   - The address of the place
   For instance: 'krzzyozIVGC7pX1lfVO40w', 'Epoch Coffee', '221 West North Loop Boulevard, Austin'.
   This is considered the source of truth and the base for ALL the following searches!
2. Using the Google ID, retrieve the detailed information of the place.
3. Using the name and the address, retrieve the detailed information from all the other collectors. Each collector performs an exact search and keeps the first match.
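The steps above can be sketched roughly as follows. This is only an illustration: `Summary`, `retrieve`, the `google_details` callable and the collector signature are hypothetical names, not the project's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Summary:
    """Summary information from step 1 (the source of truth)."""
    google_id: str
    name: str
    address: str


# Assumed collector signature: (name, address) -> ordered list of matches.
Collector = Callable[[str, str], List[dict]]


def retrieve(
    summary: Summary,
    google_details: Callable[[str], dict],
    collectors: Dict[str, Collector],
) -> Dict[str, Optional[dict]]:
    """Return the detailed information per collector label."""
    # Step 2: detailed information looked up by Google ID.
    results: Dict[str, Optional[dict]] = {
        "google": google_details(summary.google_id)
    }
    # Step 3: exact search by name and address, keeping the first match.
    for label, search in collectors.items():
        matches = search(summary.name, summary.address)
        results[label] = matches[0] if matches else None
    return results
```

Note that nothing here checks whether the first match really is the requested place.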
The merge algorithm uses a weight that is assigned to each collector. The results retrieved in step 3 of the previous operation are sets of properties defining a business. These properties are compared one by one, and the value from the collector with the lightest (i.e. lowest) weight is picked first. The heaviest results fall to the bottom of the pile and are used last, only if no other result was picked before.
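A minimal sketch of this weighted merge, assuming each collector result is a plain dictionary of properties and that empty values (`None` or `""`) count as "no result". The function name `merge` and the shape of `weights` are assumptions made for illustration.

```python
from typing import Dict, Optional


def merge(
    results: Dict[str, Optional[dict]],
    weights: Dict[str, int],
) -> dict:
    """Merge per-collector results, preferring the lowest weight."""
    merged: dict = {}
    # Visit collectors from lightest to heaviest weight.
    for label in sorted(results, key=lambda name: weights[name]):
        record = results[label] or {}
        for prop, value in record.items():
            # Keep the first non-empty value seen for each property;
            # heavier collectors only fill the remaining gaps.
            if value not in (None, "") and prop not in merged:
                merged[prop] = value
    return merged
```

For example, if a collector with weight 1 provides a phone number but no opening hours, the hours are taken from the next-lightest collector that has them.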
Expected Behavior
The retrieve algorithm has one flaw: it assumes that the first match of the exact search is always the right place. If a collector returns the information of Taco Deli instead of Epoch Coffee, (1) we have no way to know, and (2) the merge operation will complete using incorrect data.
The merge algorithm uses weights that were assigned by gut feeling. We need a way to ensure they are correct.
Possible Solution
For the retrieve algorithm, we need to add some validation of the results. For instance, the name has to match the value from the summary information; otherwise the result is discarded.
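One possible shape for such a check is sketched below. It assumes each collector result is a dictionary with a `name` key and treats two names as matching when they are equal after lowercasing and collapsing punctuation and whitespace. Both the helper names and that normalization rule are assumptions; a stricter or fuzzier comparison may turn out to be more appropriate.

```python
import re
from typing import Optional


def normalize(name: str) -> str:
    """Lowercase and collapse punctuation/whitespace into single spaces."""
    return re.sub(r"[\W_]+", " ", name.casefold()).strip()


def validate(summary_name: str, candidate: Optional[dict]) -> Optional[dict]:
    """Return the candidate only if its name matches the summary name."""
    if candidate is None:
        return None
    if normalize(candidate.get("name", "")) != normalize(summary_name):
        # Wrong business (e.g. Taco Deli instead of Epoch Coffee): discard.
        return None
    return candidate
```

A discarded result then behaves like a collector that found nothing, so the merge falls through to the next collector.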
For the merge algorithm, we need a routine that extracts the information of 50 locations using each collector, merges them, and stores the result in a file (CSV, for instance) that a human can validate.
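A rough sketch of what the export could look like, using Python's standard `csv` module. The column layout (one row per place and property, with the collector the value came from) and the name `export_for_review` are assumptions chosen so a reviewer can see which weight decided each value.

```python
import csv
import io
from typing import Dict, Optional


def export_for_review(
    places: Dict[str, Dict[str, Optional[dict]]],
    weights: Dict[str, int],
) -> str:
    """Return CSV text: google_id, property, chosen value, source collector."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["google_id", "property", "value", "collector"])
    for google_id, results in places.items():
        chosen: Dict[str, tuple] = {}
        # Same rule as the merge: the lightest collector wins per property.
        for label in sorted(results, key=lambda name: weights[name]):
            for prop, value in (results[label] or {}).items():
                if value not in (None, "") and prop not in chosen:
                    chosen[prop] = (value, label)
        for prop, (value, label) in sorted(chosen.items()):
            writer.writerow([google_id, prop, value, label])
    return buffer.getvalue()
```

The returned text can be written to a file and opened in a spreadsheet for manual validation.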