mroberge / hydrofunctions

A suite of convenience functions for working with hydrology data in an interactive Python session.
MIT License

extract_nwis_df() function returns a tuple with dataframe and dictionary #112

Closed: nlamkey closed this issue 1 year ago.

nlamkey commented 2 years ago

Description

I wrote some code for hydrofunctions a year ago that produced processed dataframes. I ran it again today and found that it no longer works. The problem lies in the extract_nwis_df function: it used to return just a dataframe, but now it returns a tuple containing a dataframe and a dictionary. In one instance it also returned 4 more columns than I requested; this might have been a separate issue. I found a workaround by subsetting the tuple with [0]. Is there a more elegant way to fix this workflow?

What I Did

import hydrofunctions as hf
import pandas as pd


def create_df(site, start, end):
    """Creates a Pandas DataFrame with data
    downloaded from NWIS using hydrofunctions.
    Renames columns containing discharge and
    qualification code information to "discharge" and
    "flag", respectively. Creates "siteName", "latitude",
    and "longitude" columns. Outputs the new dataframe.

    Parameters
    ----------
    site : str
        The stream gauge site number.

    start : str
        The start date (YYYY-MM-DD) of the time period of interest.

    end : str
        The end date (YYYY-MM-DD) of the time period of interest.

    Returns
    -------
    discharge : Pandas DataFrame
        A dataframe containing date, discharge, qualification
        codes, site name, and latitude and longitude data.

    """

    # Response from site
    # Note: parameterCd is defined here but never passed to get_nwis(),
    # so the request is not limited to stage and discharge.
    parameterCd = ["00065", "00060"]
    resp = hf.get_nwis(site, "dv", start, end).json()

    # Extract values to a pandas dataframe
    discharge = hf.extract_nwis_df(resp)

    # Rename columns
    discharge.columns = ["discharge", "flag", 'stage', 'flag']

    # Create sitename column
    site_name = hf.get_nwis_property(resp, key="siteName")[0]

    discharge['siteName'] = site_name

    # Create lat and long column
    geoloc = hf.get_nwis_property(resp, key="geoLocation")[0]["geogLocation"]
    lat = geoloc["latitude"]
    long = geoloc["longitude"]
    discharge["latitude"] = lat
    discharge["longitude"] = long
    return discharge

site = ["06479215","06479438","06479500","06479525","06479770","06480000"]
start = "2018-01-01"
end = "2020-12-01"
temp_list = []

for i in site:
    df = create_df(i, start, end)
    temp_list.append(df)

stream_gage_df = pd.concat(temp_list)
stream_gage_df

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\Users\NICK~1.LAM\AppData\Local\Temp/ipykernel_20488/4087259266.py in <module>
      5 
      6 for i in site:
----> 7     df = create_df(i, start, end)
      8     temp_list.append(df)
      9 

C:\Users\NICK~1.LAM\AppData\Local\Temp/ipykernel_20488/3569674067.py in create_df(site, start, end)
     36 
     37     # Rename columns
---> 38     discharge.columns = ["discharge", "flag", 'stage', 'flag']
     39 
     40     # Create sitename column

AttributeError: 'tuple' object has no attribute 'columns'
mroberge commented 2 years ago

Hi Nick! Thanks for this question and for using hydrofunctions. I changed this function a few versions ago; I've been encouraging people to use the hf.NWIS interface instead. I can send you some code suggestions in the morning.

As you said, the tuple contains a data frame and a dictionary. The dictionary contains some metadata, but I don't remember if it has lat & long. One way to access just the dataframe is to do this:

discharge, meta = hf.extract_nwis_df(resp)

This line would replace the line where you extract values to a dataframe. You could use meta if you want or just ignore it. The rest of your code should work as is. I'll try it out in the morning!
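Because the thread describes extract_nwis_df() returning a bare DataFrame in older releases and a (DataFrame, dict) tuple in newer ones, a script that must run under either version could normalize the result with a small wrapper. This is a minimal sketch; the helper name as_df_and_meta is hypothetical and not part of hydrofunctions.

```python
import pandas as pd


def as_df_and_meta(result):
    """Hypothetical helper: normalize the return value of extract_nwis_df().

    Older hydrofunctions releases returned only a DataFrame; newer ones
    return a (DataFrame, dict) tuple. This always returns a (DataFrame, dict) pair.
    """
    if isinstance(result, tuple):
        df, meta = result
    else:
        # Older behaviour: no metadata dictionary, so substitute an empty one.
        df, meta = result, {}
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"expected a DataFrame, got {type(df).__name__}")
    return df, meta


# Usage inside create_df(), in place of the original extraction line:
# discharge, meta = as_df_and_meta(hf.extract_nwis_df(resp))
```

Either way, the rest of create_df() only ever sees a DataFrame.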

mroberge commented 2 years ago

I see what you are trying to do here! It looks like you want to create a dataframe that is in the long format similar to R's 'tidy' format. I've been wanting to provide this functionality for my NWIS class for a while.
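With plain pandas, one way to reach a long (tidy) layout is DataFrame.melt, which turns one column per measurement into one row per date and variable. The small table below is made-up illustration data, not NWIS output; its column names simply mirror the renamed columns in the question.

```python
import pandas as pd

# Made-up wide table: one row per date, one column per measurement.
wide = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2018-01-01", "2018-01-02"]),
        "siteName": ["Example site", "Example site"],
        "discharge": [10.0, 12.0],
        "stage": [1.1, 1.3],
    }
)

# Keep the identifying columns and stack the measurements into rows.
long = wide.melt(
    id_vars=["datetime", "siteName"],
    value_vars=["discharge", "stage"],
    var_name="variable",
    value_name="value",
)
print(long)
```

Each site's frame could be melted this way before the pd.concat() step, so the combined table has one observation per row.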

First, my code above works for your example.

Second, you said that sometimes you get two columns instead of four columns. This is because, for some sites, the request returns only stage data instead of both stage and discharge. I've never seen that before, so I'm curious. But this can be fixed by creating a more robust system for renaming your columns. Right now you just assume that you have four columns and give them names. Instead, you could use the 'rename' method of dataframes together with a mapper function. It would work like this: my_df.rename(mapper_function, axis="columns"). Now you just need a mapper function that takes the column name, checks whether it holds qualifiers or data and whether it is for stage or discharge, and returns an appropriate new name.
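A mapper along those lines might look like the sketch below. It assumes hydrofunctions labels its columns as USGS:<site>:<parameter code>:<statistic code>, with a _qualifiers suffix on the flag columns; check the actual labels in your DataFrame before relying on it. The function name rename_nwis_column and the example labels are hypothetical.

```python
import pandas as pd

# USGS parameter codes from the question: 00060 = discharge, 00065 = stage (gage height).
PARAM_NAMES = {"00060": "discharge", "00065": "stage"}
QUAL_SUFFIX = "_qualifiers"


def rename_nwis_column(col):
    """Hypothetical mapper for DataFrame.rename().

    Assumes labels like 'USGS:<site>:<parameter>:<statistic>', with a
    '_qualifiers' suffix on flag columns. Labels that do not fit this
    pattern, or use an unmapped parameter code, are returned unchanged.
    """
    is_flag = col.endswith(QUAL_SUFFIX)
    base = col[: -len(QUAL_SUFFIX)] if is_flag else col
    parts = base.split(":")
    if len(parts) != 4 or parts[2] not in PARAM_NAMES:
        return col
    name = PARAM_NAMES[parts[2]]
    return f"{name}-flag" if is_flag else name


# Four expected columns plus one unrequested parameter (made-up labels).
full = pd.DataFrame(columns=[
    "USGS:06479215:00060:00003",
    "USGS:06479215:00060:00003_qualifiers",
    "USGS:06479215:00065:00003",
    "USGS:06479215:00065:00003_qualifiers",
    "USGS:06479215:99999:00003",
])
# A site that only reports stage: the same mapper still applies.
stage_only = pd.DataFrame(columns=[
    "USGS:06479500:00065:00003",
    "USGS:06479500:00065:00003_qualifiers",
])

renamed = full.rename(rename_nwis_column, axis="columns")
renamed_stage = stage_only.rename(rename_nwis_column, axis="columns")
print(list(renamed.columns))
print(list(renamed_stage.columns))
```

Unlike assigning a fixed four-item list to df.columns, the mapper does not depend on how many columns come back.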

Third, right now your function gives the same name to two different columns. Until we come up with a better renaming function, I would replace that line with something like this:

discharge.columns = ["discharge", "discharge-flag", 'stage', 'stage-flag']
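To see why the duplicate "flag" name matters, here is a small made-up frame: selecting a duplicated label returns every matching column as a DataFrame instead of a single Series.

```python
import pandas as pd

# One made-up row using the question's original column names.
df = pd.DataFrame(
    [[10.0, "A", 1.1, "A"]],
    columns=["discharge", "flag", "stage", "flag"],
)

# With a duplicated label, df["flag"] returns a two-column DataFrame,
# not a Series, which is easy to trip over later.
print(type(df["flag"]).__name__)  # DataFrame
print(df.columns.is_unique)       # False

# Unique names make every column addressable on its own.
df.columns = ["discharge", "discharge-flag", "stage", "stage-flag"]
print(type(df["stage-flag"]).__name__)  # Series
print(df.columns.is_unique)             # True
```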
nlamkey commented 2 years ago

Thanks for taking the time to look at this! My function works great now. I didn't think of using a mapper function; that's a great idea.

I looked into the extra columns again. It looks like one of the sites I left out of the post was also returning the nitrite-nitrate data that the station collects, even though I didn't request it. That threw a wrench into the function, so I took that station out and handled it separately.
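Instead of dropping the station, the unwanted parameter could be left out. The question's create_df() defines parameterCd but never passes it to hf.get_nwis(); passing it through that function's parameterCd argument should limit the request to stage and discharge (check the signature in your installed version). Alternatively, extra columns can be dropped after extraction. The sketch below does the latter with DataFrame.filter, assuming the USGS:<site>:<parameter>:<statistic> column naming; the labels are made up, and "99999" is an arbitrary stand-in for an unrequested parameter.

```python
import pandas as pd

# Parameter codes to keep: 00060 = discharge, 00065 = gage height (stage).
WANTED = ["00060", "00065"]

# Made-up column labels in the assumed USGS:<site>:<parameter>:<statistic> form.
df = pd.DataFrame(
    [[10.0, "A", 1.1, "A", 0.5, "A"]],
    columns=[
        "USGS:06479215:00060:00003",
        "USGS:06479215:00060:00003_qualifiers",
        "USGS:06479215:00065:00003",
        "USGS:06479215:00065:00003_qualifiers",
        "USGS:06479215:99999:00003",
        "USGS:06479215:99999:00003_qualifiers",
    ],
)

# Keep only columns whose parameter-code field is one of WANTED.
pattern = ":(" + "|".join(WANTED) + "):"
subset = df.filter(regex=pattern, axis="columns")
print(list(subset.columns))
```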

At any rate, I accomplished what I needed to. Thanks for your help.

(Sent by email in reply to Martin Roberge's comment of Wed, Jan 12, 2022 at 11:10 AM.)

mroberge commented 1 year ago

Glad to help!