salish-sea / acartia-data

MIT License

Acartia API -> Pandas bug #1

Open · veirs opened 1 year ago

veirs commented 1 year ago

Here is some code that fails.

Error: `TypeError: unhashable type: 'dict'`

Why does this happen?

```python
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'
import numpy as np
import json
import requests
response = requests.get('https://acartia.io/api/v1/sightings/current')
acartia_response = json.loads(response.text)
acartia_response
acartia_df = pd.DataFrame(acartia_response)
print(acartia_df.shape)
df = acartia_df.drop_duplicates(inplace=True)  #THIS IS THE PROBLEM
print(acartia_df.shape)
print(df.shape)  # note: drop_duplicates(inplace=True) returns None, so df is None here
```
scottveirs commented 1 year ago

Let me add Christian to the organization and then we can discuss here...

cpsarason commented 1 year ago

I'll take a look. Usually this kind of error is associated with trying to apply some method on the dataframe that runs into a type error (in this case, perhaps a dict object getting passed to something expecting a float?). I need to play around with it to find out.
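The error can be reproduced without calling the API. This is a minimal sketch, assuming the problem cells hold nested dicts, the way a JSON object looks after parsing; the two rows are made up and are not the real Acartia schema. `drop_duplicates` has to hash cell values to find repeated rows, and Python dicts are unhashable, so the call raises the same `TypeError`.

```python
import pandas as pd

# Made-up rows standing in for the API response (assumption: the real
# response has at least one column whose cells are nested dicts).
rows = [
    {"id": 1, "type": "orca", "profile": {"name": "a"}},
    {"id": 1, "type": "orca", "profile": {"name": "a"}},
]
df = pd.DataFrame(rows)

error_message = None
try:
    # drop_duplicates hashes cell values to detect repeated rows;
    # the dicts in "profile" cannot be hashed, so this raises TypeError.
    df.drop_duplicates()
except TypeError as err:
    error_message = str(err)

print(error_message)
```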

cpsarason commented 1 year ago

Hi folks.

This issue is less a pandas bug and more a bug in how we're parsing the response. The "TypeError" was my first clue, as I mentioned in the comment above. I was able to get it to work by dropping two columns (`profile` and `signature`) and also, for completeness, renaming "type" to "whale_type" (`type` is a Python built-in, so it's best to be explicit; I had the same problems working with "floats" in my previous gig!)

In any case, try this, Val:

```python
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'
import requests

response = requests.get('https://acartia.io/api/v1/sightings/current')
j = response.json()
df = pd.DataFrame.from_dict(j)
print(df.shape)  ## shape of dataframe here is (232, 18)

# copy 'type' into the more explicit 'whale_type' column
df['whale_type'] = df['type']
# drop the original 'type' plus the 'profile' and 'signature' columns;
# without the last two, drop_duplicates no longer fails
df.drop(columns=['type', 'profile', 'signature'], inplace=True)
print(df.shape)  ## shape of dataframe here is (232, 16)

df = df.drop_duplicates()
print(df.shape)  ## shape of dataframe here is (232, 16)
```
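If the nested data in `profile` and `signature` is worth keeping, one possible alternative to dropping those columns is to compare nested cells through a JSON-string version of their contents. This sketch assumes those columns hold dicts or lists parsed from JSON; `dedupe_with_nested` is a hypothetical helper, not part of pandas or this repo, and the rows are made up. `json.dumps(..., sort_keys=True)` makes two dicts with the same content but different key order produce the same string. A simpler option, if duplicates can be judged on the flat columns alone, is `df.drop_duplicates(subset=[...])` listing only hashable columns.

```python
import json

import pandas as pd


def dedupe_with_nested(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop duplicate rows even when cells hold dicts or lists.

    Nested values are compared through a JSON string (sort_keys=True, so key
    order does not matter); the returned rows keep the original objects.
    """
    def to_key(value):
        # Turn unhashable nested values into hashable strings; leave others as-is.
        if isinstance(value, (dict, list)):
            return json.dumps(value, sort_keys=True)
        return value

    # Build a parallel frame of hashable keys and use it only for the duplicate check.
    keys = frame.apply(lambda col: col.map(to_key))
    return frame[~keys.duplicated()]


# Usage on made-up rows (not the real Acartia data).
rows = [
    {"id": 1, "profile": {"name": "a", "x": 1}},
    {"id": 1, "profile": {"x": 1, "name": "a"}},  # same content, different key order
    {"id": 2, "profile": {"name": "b"}},
]
df = pd.DataFrame(rows)
deduped = dedupe_with_nested(df)
print(deduped.shape)  # (2, 2)
```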

Interestingly, the shape of the dataframe changes after dropping those columns, so another possible issue is a couple of extra rows containing problematic "extra" data in the profile, signature, or type columns?
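One way to test that hypothesis is to count, per column, how many cells hold values that cannot be hashed. If only a couple of rows are affected the count will be small; if every row carries a nested object it will equal the row count. `unhashable_columns` is a hypothetical helper shown on made-up rows; on the real data it would be called on the frame built from the API response, before any columns are dropped.

```python
import pandas as pd


def unhashable_columns(frame: pd.DataFrame) -> dict:
    """Hypothetical helper: map each column name to its count of unhashable cells.

    Columns where every value hashes cleanly are left out of the result.
    """
    counts = {}
    for name, column in frame.items():
        bad = 0
        for value in column:
            try:
                hash(value)
            except TypeError:
                # dicts, lists and other mutable containers end up here
                bad += 1
        if bad:
            counts[name] = bad
    return counts


# Made-up rows: one nested dict in "profile", lists in every "signature" cell.
df = pd.DataFrame([
    {"id": 1, "type": "orca", "profile": {"name": "a"}, "signature": ["x"]},
    {"id": 2, "type": "orca", "profile": None, "signature": ["y"]},
])
print(unhashable_columns(df))  # {'profile': 1, 'signature': 2}
```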

Will keep investigating, but I think this should get you moving again.