jrsmith3 / minimum_sugar

`fetch_menu_item_data` returns duplicates #20

Closed jrsmith3 closed 8 years ago

jrsmith3 commented 8 years ago

The following code shows that `fetch_menu_item_data` returns duplicate menu items.

# Assume `credentials` is a dictionary holding Nutritionix API credentials.
import minimum_sugar
import collections

# ID value 513fbc1283aa2dc80c000053 corresponds to McDonald's
menu_items = minimum_sugar.fetch_menu_item_data("513fbc1283aa2dc80c000053", credentials)
item_ids = [menu_item["item_id"] for menu_item in menu_items]

dups = [item for item, count in collections.Counter(item_ids).items() if count > 1]

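print(len(item_ids))              # total number of items returned
print(len(item_ids) - len(dups))  # number of unique items
print(len(dups))                  # number of IDs that occur more than once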

# Returns
# 359
# 347
# 12

jrsmith3 commented 8 years ago

Here's some code that exposes the duplicate menu items:

indices = []
for indx, val in enumerate(item_ids):
    if val in dups:
        indices.append(indx)

print(indices)

# Returns
# [88,
#  89,
#  90,
#  91,
#  92,
#  93,
#  94,
#  95,
#  96,
#  97,
#  98,
#  99,
#  105,
#  106,
#  107,
#  108,
#  109,
#  110,
#  111,
#  112,
#  113,
#  114,
#  115,
#  116]

# Show corresponding `item_ids`
print([item_ids[indx] for indx in indices])

# Returns
# [u'513fc9e73fe3ffd4030010f3',
#  u'513fc9e73fe3ffd4030010f8',
#  u'513fc9e73fe3ffd4030010fe',
#  u'513fc9e73fe3ffd403001101',
#  u'513fc9e73fe3ffd403001106',
#  u'513fc9e73fe3ffd40300110c',
#  u'513fc9e73fe3ffd403001113',
#  u'513fc9e73fe3ffd403001118',
#  u'513fc9e73fe3ffd40300111e',
#  u'513fc9e73fe3ffd403001120',
#  u'513fc9e73fe3ffd403001125',
#  u'513fc9e73fe3ffd40300112b',
#  u'513fc9e73fe3ffd4030010f3',
#  u'513fc9e73fe3ffd4030010f8',
#  u'513fc9e73fe3ffd4030010fe',
#  u'513fc9e73fe3ffd403001101',
#  u'513fc9e73fe3ffd403001106',
#  u'513fc9e73fe3ffd40300110c',
#  u'513fc9e73fe3ffd403001113',
#  u'513fc9e73fe3ffd403001118',
#  u'513fc9e73fe3ffd40300111e',
#  u'513fc9e73fe3ffd403001120',
#  u'513fc9e73fe3ffd403001125',
#  u'513fc9e73fe3ffd40300112b']

jrsmith3 commented 8 years ago

I don't understand why the overlap occurs where it does. What is special about items 88 through 99 (repeated at 105 through 116), given that the iteration seems to happen in units of 50?
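
One way to probe this is to map each duplicated ID to every position where it occurs; the spacing between positions then shows whether the repeats line up with a page boundary. The following is only a minimal sketch: it uses a short made-up list in place of the real `item_ids`, and the helper name `positions_of_duplicates` is hypothetical, not part of minimum_sugar.

```python
import collections


def positions_of_duplicates(ids):
    """Map each ID that occurs more than once to the list of its positions."""
    positions = collections.defaultdict(list)
    for index, item_id in enumerate(ids):
        positions[item_id].append(index)
    # Keep only the IDs that were seen at more than one position.
    return {item_id: where for item_id, where in positions.items() if len(where) > 1}


# Made-up stand-in for `item_ids`: "b" and "c" each appear twice.
sample_ids = ["a", "b", "c", "d", "b", "c", "e"]
duplicate_positions = positions_of_duplicates(sample_ids)
print(duplicate_positions)
```

Judging from the output above, the real list would pair index 88 with 105, 89 with 106, and so on: a constant gap of 17, with the second run starting five items into the page that begins at offset 100.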

jrsmith3 commented 8 years ago

According to the Nutritionix API response, the Nutritionix database holds 359 records for McDonald's:

restaurant_id = "513fbc1283aa2dc80c000053"

dat = minimum_sugar.fetch_subset_menu_item_data(restaurant_id, credentials)
print(dat["total"])

# Returns
# 359

The thing is, menu_items also has a length of 359, so the total matches what the API reports even though 12 of those entries are repeats.

jrsmith3 commented 8 years ago

I conclude that the problem is on Nutritionix's end: the API is sending me duplicates. Consider:

# Fetch the page of items starting at offset 100 (presumably indices 100 through 149).
dat = minimum_sugar.fetch_subset_menu_item_data(restaurant_id, credentials, offset=100)

# Navigate to the relevant part of the data structure:
dat["hits"][5]["fields"]["item_id"]

# Returns
# u'513fc9e73fe3ffd4030010f3'

The return value is identical to the item_id at index 105 of the item_ids list above, which is itself a repeat of the item_id at index 88.
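
The same check can be run across every page rather than a single hit. The sketch below walks the pages and records each ID that the raw responses repeat, which helps separate "the API sent duplicates" from "the client duplicated items while paginating". It makes several assumptions: `fake_fetch_subset` is a hypothetical stand-in for the real API call, its data is made up, and only the response shape (`total`, `hits`, `fields`, `item_id`) is taken from the snippets in this thread.

```python
def fake_fetch_subset(offset, page_size=3):
    """Hypothetical stand-in for the API call; returns one page of results."""
    # Made-up server-side records in which "id1" is repeated.
    records = ["id0", "id1", "id2", "id3", "id1", "id5"]
    page = records[offset:offset + page_size]
    return {"total": len(records), "hits": [{"fields": {"item_id": r}} for r in page]}


def repeated_ids_across_pages(fetch, page_size):
    """Return (item_id, absolute_index) for every ID seen earlier in the walk."""
    seen = set()
    repeats = []
    offset = 0
    total = fetch(offset, page_size)["total"]
    while offset < total:
        hits = fetch(offset, page_size)["hits"]
        for position, hit in enumerate(hits):
            item_id = hit["fields"]["item_id"]
            if item_id in seen:
                repeats.append((item_id, offset + position))
            seen.add(item_id)
        offset += page_size
    return repeats


repeats = repeated_ids_across_pages(fake_fetch_subset, page_size=3)
print(repeats)
```

If the list of repeats is non-empty when built from untouched responses, the duplicates come from the server side rather than from the pagination loop.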

Solution

The minimum_sugar.fetch_menu_item_data function is doing exactly what I want: it returns what the API sends. Instead, I need to handle the duplicates when I insert the items into the SQLite database.
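
One possible way to do that, sketched here under assumptions, is to make `item_id` the primary key and let SQLite skip rows it has already stored via `INSERT OR IGNORE`. The table name, the `item_name` column, and the sample rows are all made up for illustration; the real schema may differ.

```python
import sqlite3

# Made-up menu items; the second and third share an item_id on purpose.
menu_items = [
    {"item_id": "id0", "item_name": "Item A"},
    {"item_id": "id1", "item_name": "Item B"},
    {"item_id": "id1", "item_name": "Item B"},
]

connection = sqlite3.connect(":memory:")
# Hypothetical schema: the PRIMARY KEY on item_id is what makes duplicates detectable.
connection.execute("CREATE TABLE menu_items (item_id TEXT PRIMARY KEY, item_name TEXT)")

# INSERT OR IGNORE skips any row whose item_id is already present instead of raising.
connection.executemany(
    "INSERT OR IGNORE INTO menu_items (item_id, item_name) VALUES (:item_id, :item_name)",
    menu_items,
)
connection.commit()

row_count = connection.execute("SELECT COUNT(*) FROM menu_items").fetchone()[0]
print(row_count)
```

An alternative is to deduplicate in Python before inserting (for example, by keying a dictionary on `item_id`); doing it in the database has the advantage that repeated runs of the import stay idempotent.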