reteps / redfin

A Python wrapper around Redfin's unofficial API.
MIT License

Stingray documentation #16

Open chaberkern opened 1 year ago

chaberkern commented 1 year ago

Not an issue with the package, but I'm trying to track down documentation for the Stingray API (what kinds of requests it accepts and how to structure them), and there's basically nothing online.

As an example, I found a Stack Overflow question about a Stingray endpoint that lets you download search results as a CSV:

https://stackoverflow.com/questions/73200792/convert-redfins-region-id-to-a-zip-code

This request doesn't seem to be part of this package, which is totally fine, but it'd be cool to understand how to access it.
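For reference, here's a minimal sketch of what that CSV request might look like. Every query parameter below is an assumption copied from a browser-observed "Download All" URL (region_id, region_type, and friends are observed, not documented), so treat it as a starting point rather than a stable API:

# Hedged sketch of the gis-csv endpoint from the Stack Overflow question.
# All query parameters are assumptions taken from a browser-observed
# "Download All" URL; Redfin can change them without notice.
import requests

GIS_CSV_URL = 'https://www.redfin.com/stingray/api/gis-csv'
params = {
    'al': 1,
    'region_id': 29470,    # hypothetical region id
    'region_type': 6,      # observed value; 6 appears to mean city/place
    'num_homes': 350,
    'status': 9,
    'uipt': '1,2,3,4,5,6',
    'v': 8,
}
# Redfin rejects requests without a browser-like user agent
headers = {'user-agent': 'Mozilla/5.0'}
resp = requests.get(GIS_CSV_URL, params=params, headers=headers)
resp.raise_for_status()
with open('redfin_search.csv', 'wb') as f:
    f.write(resp.content)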

In short, I'm trying to do a large pull of MLS data (specifically property characteristics such as bed/bath counts and exterior condition, all of which come from the below_the_fold part of the package), and I'm curious whether there are easier ways to grab that information than my current, non-scalable solution of iterating through each property I'm interested in.

Ideally Redfin would just let me export this data, but I don't think they want to do that even though it is publicly available via web scraping and their APIs!

bnberns commented 1 year ago

Hi, did you ever find a solution for accessing the MLS data? I'm working on a project where I need to pull similar data. Thanks in advance!

chaberkern commented 1 year ago

Hi @bnberns. Here is a snippet from my code. It's a little messy, but it works. You can ignore opa_parcel_number, as that is a unique ID that the Philadelphia government assigns to properties. You only need the searched address below.

The big issue with all this is that I had to manually read through payloads to make sure I was bringing in all the correct parts of the returned JSON. But this seems to work.

from fake_useragent import UserAgent
from redfin import Redfin
import pandas as pd

def get_mls_data(address, opa_parcel_number, amenity_df_list, error_list):
    """Gets MLS data from Redfin for a single address."""
    try:
        # Spoof a browser user agent so Redfin doesn't reject the request
        ua = UserAgent()
        client = Redfin()
        client.user_agent_header = {'user-agent': ua.chrome}
        # Resolve the address to a property, then pull the below-the-fold data
        response = client.search(address)
        url = response['payload']['exactMatch']['url']
        initial_info = client.initial_info(url)
        property_id = initial_info['payload']['propertyId']
        mls_data = client.below_the_fold(property_id)

        address_info = mls_data['payload']['amenitiesInfo']['addressInfo']
        street_returned = address_info['street'].upper()
        address_returned = ' '.join(
            [address_info['street'], address_info['city'], address_info['state']]
        ).upper()

        # Flatten the nested superGroups -> amenityGroups -> amenityEntries
        # structure into a flat list of amenity records
        amenity_list = []
        for super_group in mls_data['payload']['amenitiesInfo']['superGroups']:
            for amenity_group in super_group['amenityGroups']:
                group_title = amenity_group['groupTitle']
                group_reference_name = amenity_group['referenceName']
                for amenity in amenity_group['amenityEntries']:
                    amenity['groupTitle'] = group_title
                    amenity['groupReferenceName'] = group_reference_name
                    amenity_list.append(amenity)

        amenity_df = pd.DataFrame(amenity_list)
        # Unpack each single-element amenityValues list into a plain string
        for idx in range(len(amenity_df)):
            amenity_df.at[idx, 'amenityValues'] = amenity_df['amenityValues'].values[idx][0].strip()
        # Rename columns because camel case is bad
        rename_dict = {
            'amenityName': 'amenity_name',
            'referenceName': 'reference_name',
            'accessLevel': 'access_level',
            'displayLevel': 'display_level',
            'amenityValues': 'amenity_values',
            'groupTitle': 'group_title',
            'groupReferenceName': 'group_reference_name',
        }
        amenity_df.rename(columns=rename_dict, inplace=True)
        amenity_df['redfin_property_id'] = property_id
        amenity_df['opa_parcel_number'] = opa_parcel_number
        amenity_df['address_searched'] = address
        amenity_df['address_returned'] = address_returned
        amenity_df['street_returned'] = street_returned
        amenity_df_list.append(amenity_df)
    except Exception as e:
        # The original snippet was cut off before the except clause; recording
        # the failed address here matches the error_list parameter's intent
        error_list.append((address, e))

I have a wonky multithreaded workflow for pulling in the data; you'll want to refashion it for your own work:

import threading
from tqdm import tqdm

amenity_df_list = []
error_list = []

# Work through the address dataframe in chunks of 1,000, one thread per address
for i in tqdm(range(0, len(rsa_df), 1000)):
    threads = []
    thread_df = rsa_df.iloc[i:i + 1000]
    for index in thread_df.index:
        x = threading.Thread(
            target=get_mls_data,
            args=(
                thread_df['redfin_address'][index],
                thread_df['parcel_number'][index],
                amenity_df_list,
                error_list,
            ),
            daemon=True,
        )
        threads.append(x)
        x.start()
    for thread in tqdm(threads):
        thread.join()

Then you'll want to pivot the data:

from datetime import datetime

run_datetime = datetime.now()  # defined elsewhere in my code; shown here so the snippet runs

try:
    amenity_df = pd.concat(amenity_df_list)
    amenity_df['run_datetime'] = run_datetime
    # One row per property, one column per amenity reference name
    amenity_pivot = amenity_df.pivot(
        index=['opa_parcel_number', 'address_searched', 'address_returned', 'street_returned'],
        columns='reference_name',
        values='amenity_values',
    ).reset_index()
    amenity_pivot['run_datetime'] = run_datetime
    amenity_pivot.columns = amenity_pivot.columns.str.lower()
except Exception as e:
    print(e, e.args)
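If .pivot() raises on duplicate (property, reference_name) pairs, a hedged workaround is pivot_table with an aggregator that keeps the first value seen:

# Assumption: duplicate amenity entries can be collapsed by keeping the first
amenity_pivot = amenity_df.pivot_table(
    index=['opa_parcel_number', 'address_searched', 'address_returned', 'street_returned'],
    columns='reference_name',
    values='amenity_values',
    aggfunc='first',
).reset_index()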
Sargeanthost commented 10 months ago

The current below_the_fold method doesn't capture all of the data. If the property has a listing ID, you should pass it as a kwarg. If you want something that downloads all the houses matching a set of filters, look into the gis-csv? link under the href of the "Download All" link at the bottom of most search pages.
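For example, a minimal sketch of that kwarg (assuming the initial_info payload exposes a listingId field, which is worth verifying against a live response):

from redfin import Redfin

client = Redfin()
response = client.search('123 Main St, Philadelphia, PA')  # hypothetical address
url = response['payload']['exactMatch']['url']
initial_info = client.initial_info(url)
property_id = initial_info['payload']['propertyId']
# Assumption: active listings expose a listingId here; off-market homes may not
listing_id = initial_info['payload'].get('listingId')

if listing_id:
    mls_data = client.below_the_fold(property_id, listing_id=listing_id)
else:
    mls_data = client.below_the_fold(property_id)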

Also, check out RedfinPlus on GitHub for further documentation (it's not the best, but it's there).