itsahsiao / breadcrumbs

A full-stack Flask web app that lets foodies search restaurants, track their eating history, while also connecting with friends

Create a function that gets all restaurant results from Yelp #18

Closed itsahsiao closed 8 years ago

itsahsiao commented 8 years ago

Yelp returns only 20 results per API request. For my project I want all the restaurants for a city, but a single API call gives me just the first 20.

I need a function that requests the next 20 restaurants, then the 20 after that, and so on until it has fetched every result for the city. Each request has to offset the results by the number already fetched.

This gives me an initial dataset for Issue #3. Up to this point I have only tested my project with a small dataset of 20 restaurants, so it would be nice to have a full one. It also ensures that I can get all the data I need and that my project can be extended to other cities, not just one.

itsahsiao commented 8 years ago

Original code when I first made an API call to Yelp to get 20 restaurants to test and play around with for my database and project:

def load_restaurants(city):
    """Load restaurants from Yelp API into database."""

    print("Restaurants")

    # Delete all rows in table, so if we need to run this a second time,
    # we won't be trying to add duplicate restaurants
    Restaurant.query.delete()

    # Read Yelp API keys
    with io.open('config_secret.json') as cred:
        creds = json.load(cred)
        auth = Oauth1Authenticator(**creds)
        client = Client(auth)

    ## TO-DO:
    # Limit / offset to get all results for a restaurant
    # Separate seed.py from help function

    # Set search parameters for Yelp API request
    # Limit API request to 20 results first
    # Keep database small, until something working to make another API request
    params = {
        'term': 'food',
        'limit': 20,
    }

    # Make Yelp API request and store response
    response = client.search(city, **params)

    # Check to see if city exists in database to get the city id
    # If not, add city into database and get city id
    if db.session.query(City.city_id).filter(City.name == city).first():
        city_id = db.session.query(City.city_id).filter(City.name == city).first()
        city_id = city_id[0]
    else:
        new_city = City(name=city)
        db.session.add(new_city)
        db.session.commit()
        city_id = new_city.city_id

    # API response returns a SearchResponse object
    # Specify information by looking at its attributes and indexing
    # response.businesses returns a list of business objects with further attributes
    for business in response.businesses:
        restaurant = Restaurant(city_id=city_id,
                                name=business.name,
                                address=" ".join(business.location.display_address),
                                phone=business.display_phone,
                                image_url=business.image_url,
                                latitude=business.location.coordinate.latitude,
                                longitude=business.location.coordinate.longitude)

        # Add to the session to store into the db
        db.session.add(restaurant)

        # Commit to save changes
        db.session.commit()

itsahsiao commented 8 years ago

I noticed that with 'term': 'food' in the search parameters, I got results back for food trucks, which may not have an exact address even though coordinates were provided. The food truck ended up plotted on the map, but not where it actually is (food trucks don't stay in one spot or have a permanent location).

I could exclude places that did not have a street address, or I could change my search parameters to be restaurant only, as restaurants would have a permanent location.
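For the first option, excluding places without a street address, a minimal sketch could look like the following. The Business tuple and its street_address field are hypothetical stand-ins, not the actual shape of the Yelp response, and the sample data is made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a Yelp business; `street_address` is an
# illustrative field name, not the actual attribute on the API response.
Business = namedtuple("Business", ["name", "street_address"])


def has_street_address(business):
    """Return True if the business has at least one street address line."""
    return bool(business.street_address)


# Made-up sample data: one place with an address, one without
businesses = [
    Business("Corner Bistro", ["123 Main St"]),
    Business("Roaming Taco Truck", []),
]

# Keep only the places that can be plotted at a fixed address
restaurants_with_address = [b for b in businesses if has_street_address(b)]
print([b.name for b in restaurants_with_address])
```

The trade-off is that this filter runs after the request, so the excluded places still count toward the results Yelp returns.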

When I made an API call to Yelp with the term changed to restaurant, the number of results for Sunnyvale dropped from ~7000 to ~4000, as cafes and other food-related places were probably excluded too. As my project is for tracking a user's restaurant history, I will keep the term as restaurant only for now.

I need to keep an eye out for results that are missing coordinates or a proper street address, and may need to work further on the data design for my dataset.

itsahsiao commented 8 years ago

Next, I wanted to refactor my function. I noticed that the part that checks for the city id in my database could be its own function:

    # Check to see if city exists in database to get the city id
    # If not, add city into database and get city id
    if db.session.query(City.city_id).filter(City.name == city).first():
        city_id = db.session.query(City.city_id).filter(City.name == city).first()
        city_id = city_id[0]
    else:
        new_city = City(name=city)
        db.session.add(new_city)
        db.session.commit()
        city_id = new_city.city_id

I also noticed that I should not be using .first(), similar to what I found earlier in server.py when checking whether a user exists in my database at the /login route. Since a city is a unique record in my database, I should use .one(), which raises NoResultFound if no row matches (and MultipleResultsFound if more than one does).

I rewrote this to the following function:

def get_city_id(city):
    """Get the city id from database. Otherwise, add city to database and get the city id."""

    # Check if argument (city) passed in is a city that exists in the database
    # If not, instantiate the new city in the database and get the city id
    # Otherwise, return the city id for the existing city from the database
    try:
        # existing_city_id = db.session.query(City.city_id).filter(City.name == city).one()[0]
        existing_city = db.session.query(City).filter(City.name == city).one()
        # TODO: Ask if object better or tuple better???

    except NoResultFound:
        new_city = City(name=city)
        db.session.add(new_city)
        db.session.commit()
        return new_city.city_id

    return existing_city.city_id

STILL NEED CODE REVIEW FOR THIS. I also want to ask in the help queue about object vs. tuple.

Then inside my function that makes the API call, I can just do city_id = get_city_id(city)
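To try the get-or-create behaviour without the Flask app, here is a rough sketch of the same pattern using the standard library's sqlite3 module instead of SQLAlchemy. This is a swapped-in technique, not the project's code: the cities table, its columns, and the extra conn parameter are assumptions made so the example runs on its own.

```python
import sqlite3


def get_city_id(conn, city):
    """Return the id for `city`, inserting a new row if it does not exist yet."""
    row = conn.execute(
        "SELECT city_id FROM cities WHERE name = ?", (city,)
    ).fetchone()
    if row is not None:
        # fetchone() returns a tuple, so index into it for the id
        return row[0]

    cursor = conn.execute("INSERT INTO cities (name) VALUES (?)", (city,))
    conn.commit()
    return cursor.lastrowid


# Hypothetical in-memory schema, only for this illustration
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cities (city_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
)

first_id = get_city_id(conn, "Sunnyvale")   # inserts the city
second_id = get_city_id(conn, "Sunnyvale")  # finds the existing row
print(first_id == second_id)
```

Calling it twice with the same name should give back the same id and leave a single row in the table.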

itsahsiao commented 8 years ago

Now comes the important part: figuring out how to change the function containing the Yelp API call so that I can offset the results by 20 each time. I knew I needed a while loop or for loop and an offset, but wasn't sure where to start. I got some ideas from http://www.mfumagalli.com/wp/portfolio/nycbars/

The function had to be split into two functions:

1) A function that makes the API call to return results which can be offset by an amount.
2) A function that automates the process above of getting the next set of results each time and then loads the results into my database (the second function calls the first).

itsahsiao commented 8 years ago

For 1) A function that makes the API call to return results which can be offset by an amount, I was able to follow parts of the code from the blog post above for ideas, and I created my own function as follows:

def get_restaurants(city, offset):
    """
    Make API request to Yelp to get restaurants for a city, and offset the results by an amount.

    Note that Yelp only returns 20 results each time, which is why we need to offset if we want
    the next 20 results.
    """

    # Read Yelp API keys
    with io.open('config_secret.json') as cred:
        creds = json.load(cred)
        auth = Oauth1Authenticator(**creds)
        client = Client(auth)

    # Set search parameters for Yelp API request
    # Set term as restaurant to get restaurants back as the results
    # Also pass in offset, so Yelp knows how much to offset by
    params = {
        'term': 'restaurant',
        'offset': offset
    }

    # Make Yelp API call and return the API response
    return client.search(city, **params)

For this function, I get back the first 20 results if I pass in an offset of 0, i.e. get_restaurants("Sunnyvale", 0), and the next 20 results if I pass in an offset of 20, i.e. get_restaurants("Sunnyvale", 20).

Note: Yelp returns a SearchResponse object, so I had to iterate through it and access its attributes to confirm that I was getting a different set of 20 restaurants each time.
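One way to check that two pages really are different is to compare them as sets. The sketch below uses a hypothetical fake_search function and made-up names in place of get_restaurants, so it runs without Yelp credentials; with the real response, the names would come from response.businesses.

```python
# Made-up names standing in for Yelp results; `fake_search` is a hypothetical
# replacement for get_restaurants so the check runs without API credentials.
ALL_NAMES = ["Restaurant %d" % n for n in range(50)]


def fake_search(offset, limit=20):
    """Return one page of names, the way an offset-based API would."""
    return ALL_NAMES[offset:offset + limit]


first_page = set(fake_search(0))
second_page = set(fake_search(20))

# No name should appear on both pages if the offset is working
overlap = first_page & second_page
print(len(first_page), len(second_page), len(overlap))
```

An empty overlap is only a rough check, since two different restaurants can share a name.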

itsahsiao commented 8 years ago

For 2) A function that automated this process above of getting the Nth results each time and then loading the results into my database, I ended up with the following code (needs a code review):

def load_restaurants(city):
    """Get all restaurants for a city from Yelp and load restaurants into database."""

    # Get city id, as city id is a required parameter when adding a restaurant to the database
    city_id = get_city_id(city)

    # Start offset at 0 to return the first 20 results from Yelp API request
    offset = 0
    response = get_restaurants(city, offset)

    # Get total number of restaurants for this city
    total_results = response.total

    # Offset by 20 each time to get all restaurants and load each restaurant into the database
    while offset < total_results:

        for business in response.businesses:
            restaurant = Restaurant(city_id=city_id,
                                    name=business.name,
                                    address=" ".join(business.location.display_address),
                                    phone=business.display_phone,
                                    image_url=business.image_url,
                                    latitude=business.location.coordinate.latitude,
                                    longitude=business.location.coordinate.longitude)

            # Add each restaurant to the db
            db.session.add(restaurant)

        offset += 20

        response = get_restaurants(city, offset)

    # Commit to save changes
    db.session.commit()

My initial thought process was that I needed to set offset = 0 to get the first 20 results and use a for loop to iterate through the API response, grabbing each restaurant and storing it in the database. Then I needed offset = 20 to get the next 20 results.

I also needed the total number of results from the API call, which is available through the response attribute .total, so I defined a variable for it: total_results = response.total

Next was figuring out how to automate getting the next 20 results, and I used for i in range(0, total_results, 20). This starts at 0 and steps by 20 up to, but not including, total_results (so it goes 0, 20, 40, 60, ... and stops before reaching total_results).
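A quick way to see which offsets range produces is to print them. The total of 65 here is a made-up number for illustration, not a real Yelp count.

```python
total_results = 65  # made-up total for illustration

# range stops before total_results, so the last offset is 60
offsets = list(range(0, total_results, 20))
print(offsets)  # [0, 20, 40, 60]
```

The last offset, 60, covers the remaining 5 results, so no extra request is needed at total_results itself.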

The initial code, before the version above, was:

    city_id = get_city_id(city)

    offset = 0

    response = get_restaurants(city, offset)

    # total_results = response.total

    # Test with 40 instead of total_results first to see if working
    for i in range(0, 40, 20):

        for business in response.businesses:
            restaurant = Restaurant(city_id=city_id,
                                    name=business.name,
                                    address=" ".join(business.location.display_address),
                                    phone=business.display_phone,
                                    image_url=business.image_url,
                                    latitude=business.location.coordinate.latitude,
                                    longitude=business.location.coordinate.longitude)

            # Add to the session to store into the db
            db.session.add(restaurant)

        offset += 20

        response = get_restaurants(city, offset)

        # Commit to save changes
        db.session.commit()

I then thought of using a while loop that runs until total_results instead of for i in range, and ended up with while offset < total_results. Both should work, but I only tested with 40 results rather than total_results.

itsahsiao commented 8 years ago

Got a code review on all the functions that resulted from refactoring.

Only the load_restaurants(city) function needed tweaking. The fixed code for this function is as follows:

def load_restaurants(city):
    """Get all restaurants for a city from Yelp and load restaurants into database."""

    # Get city id, as city id is a required parameter when adding a restaurant to the database
    city_id = get_city_id(city)

    # Start offset at 0 to return the first 20 results from Yelp API request
    offset = 0

    # Get total number of restaurants for this city
    total_results = get_restaurants(city, offset).total

    # Get all restaurants for a city and load each restaurant into the database
    # Note: Yelp only makes the first 1000 results accessible, so stop at
    # total_results or 1000, whichever is smaller
    while offset < min(total_results, 1000):

        # API response returns a SearchResponse object with accessible attributes
        # response.businesses returns a list of business objects with further attributes
        for business in get_restaurants(city, offset).businesses:
            restaurant = Restaurant(city_id=city_id,
                                    name=business.name,
                                    address=" ".join(business.location.display_address),
                                    phone=business.display_phone,
                                    image_url=business.image_url,
                                    latitude=business.location.coordinate.latitude,
                                    longitude=business.location.coordinate.longitude)

            # Add each restaurant to the db
            db.session.add(restaurant)

        # Yelp returns only 20 results each time, so need to offset by 20 while iterating
        offset += 20

    # Commit to save changes
    db.session.commit()
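The function above requests the page at offset 0 twice, once for .total and once for .businesses. As a possible follow-up, here is a sketch of the same loop that reuses the first response. It is not the reviewed code: iter_all_results, fetch_page, and fake_fetch are hypothetical names, and an in-memory list stands in for the Yelp API so the example runs by itself.

```python
def iter_all_results(fetch_page, page_size=20, max_results=1000):
    """Yield every item from an offset-paginated source, up to max_results.

    fetch_page(offset) must return a (total, items) pair.
    """
    offset = 0
    total, items = fetch_page(offset)
    limit = min(total, max_results)

    while offset < limit:
        # Trim the last page so we never go past the limit
        for item in items[:limit - offset]:
            yield item
        offset += page_size
        # Only request another page if there is something left to fetch
        if offset < limit:
            _, items = fetch_page(offset)


# Hypothetical in-memory source standing in for the Yelp API
DATA = list(range(45))
calls = []


def fake_fetch(offset):
    """Record the requested offset and return (total, one page of data)."""
    calls.append(offset)
    return len(DATA), DATA[offset:offset + 20]


results = list(iter_all_results(fake_fetch))
print(len(results), calls)
```

With 45 items this makes three requests (offsets 0, 20, and 40) instead of four, and the max_results argument plays the role of the 1000-result cap.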