marians / openweather

Rudimentary python client for OpenWeatherMap.org

Python Pandas DataFrame output #9

Open scls19fr opened 9 years ago

scls19fr commented 9 years ago

Hello,

It would be nice if get_historic_weather could output a Pandas DataFrame.

Here is some very basic code:

import pandas as pd

data = ow.get_historic_weather(station_id, start_date, end_date)
df = pd.DataFrame(data)
df['dt'] = pd.to_datetime(df['dt'], unit='s') # convert unix timestamp to datetime
df = df.set_index('dt') # dt is now index of DataFrame

So plotting temperature is now very easy:

import matplotlib.pyplot as plt
df['temp'].map(lambda d: d['ma'] - 273.15).plot()
plt.show()

But that is not enough, because ideally each dict-valued column of this DataFrame should also be split into separate columns, one per dict key.
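A minimal sketch of that splitting step, assuming each record is a dict whose values may themselves be dicts. The helper name `flatten_dict`, the `_` separator, and the sample record are illustrative assumptions, not part of the openweather client:

```python
def flatten_dict(d, parent_key="", sep="_"):
    """Flatten nested dicts, e.g. {'temp': {'ma': 1}} -> {'temp_ma': 1}."""
    items = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, prefixing keys with the parent key
            items.update(flatten_dict(value, new_key, sep))
        else:
            items[new_key] = value
    return items


# Made-up record shaped like the nested data discussed above
record = {"dt": 1418000000, "temp": {"v": 280.0, "mi": 279.5, "ma": 281.2}}
flat = flatten_dict(record)
```

Passing a list of such flattened records to `pd.DataFrame` would then give one column per nested key.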

Moreover, with a Pandas DataFrame it will be very easy to output CSV or anything else (database table, HDF5, Excel file...); see http://pandas.pydata.org/pandas-docs/dev/io.html

Kind regards

scls19fr commented 9 years ago

Flattening the dicts is better:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# flatten_dict: helper that turns nested dicts into flat ones (temp -> temp_ma, ...)
data = ow.get_historic_weather(station_id, start_date, end_date)
#data = [flatten_dict(d) for d in data]
a_data = np.array(data)
f_flatten_dict = np.vectorize(flatten_dict)
a_data = f_flatten_dict(a_data)
df = pd.DataFrame(list(a_data))
df['dt'] = pd.to_datetime(df['dt'], unit='s') # convert unix timestamp to datetime
df = df.set_index('dt') # dt is now index of DataFrame
print(df)
(df['temp_ma']-273.15).plot() # Kelvin to Celsius
plt.show()

marians commented 9 years ago

Thanks for the suggestion. Since I am not using pandas, I cannot really tell whether it's one or two suggestions you are making.

Here is one general thing though: I wouldn't want to make pandas a requirement for the client. So adapting to pandas should preferably happen outside the client, without openweather knowing about pandas.
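One way this separation could look is a small adapter that lives in the caller's code and only sees the plain list of dicts the client returns. This is a sketch under assumptions: the function name `records_to_dataframe` and the sample records are hypothetical, and the records are assumed to carry a Unix timestamp under `dt`, as in the snippets above:

```python
import pandas as pd


def records_to_dataframe(records, time_key="dt"):
    """Convert a list of dicts into a DataFrame indexed by timestamp."""
    df = pd.DataFrame(records)
    # Interpret the time column as seconds since the Unix epoch
    df[time_key] = pd.to_datetime(df[time_key], unit="s")
    return df.set_index(time_key)


# Stand-in for the list returned by ow.get_historic_weather(...)
records = [
    {"dt": 1418000000, "humidity": 80},
    {"dt": 1418003600, "humidity": 82},
]
df = records_to_dataframe(records)
```

With this layout only the calling code imports pandas, so the client itself keeps no dependency on it.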

scls19fr commented 9 years ago

Here is my code:

https://github.com/scls19fr/openweathermap_requests

I've made requests, requests-cache (for caching) and pandas requirements (and also click for CLI argument parsing).

Historical weather data is fetched in chunks.
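As a rough illustration of chunked fetching, a date range can be split into consecutive sub-ranges, each of which would then be requested separately. The helper name `date_chunks` and the 7-day chunk size are assumptions for this sketch, not details taken from the linked repository:

```python
from datetime import datetime, timedelta


def date_chunks(start, end, chunk=timedelta(days=7)):
    """Yield consecutive (chunk_start, chunk_end) pairs covering [start, end)."""
    current = start
    while current < end:
        # The last chunk is shortened so it never runs past `end`
        chunk_end = min(current + chunk, end)
        yield current, chunk_end
        current = chunk_end


chunks = list(date_chunks(datetime(2014, 1, 1), datetime(2014, 1, 20)))
```

Each pair could be passed as the start and end date of one request, and the per-chunk results concatenated afterwards.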

Flattening JSON data is now done using json_normalize. See https://github.com/pydata/pandas/issues/9131

I still have an issue with Excel output (see https://github.com/pydata/pandas/issues/9139#issuecomment-68031367), but it should be fixed soon. CSV output is OK.
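For reference, a small sketch of flattening with json_normalize. The sample records are made up, and the call uses the top-level `pd.json_normalize` name available in pandas 1.0 and later (older versions exposed it as `pandas.io.json.json_normalize`):

```python
import pandas as pd

# Made-up records with a nested "temp" dict, similar to the data discussed above
records = [
    {"dt": 1418000000, "temp": {"v": 280.0, "ma": 281.2}},
    {"dt": 1418003600, "temp": {"v": 279.0, "ma": 280.1}},
]

# sep="_" joins nested keys as temp_v, temp_ma instead of the default temp.v, temp.ma
df = pd.json_normalize(records, sep="_")
df["dt"] = pd.to_datetime(df["dt"], unit="s")
df = df.set_index("dt")
```

This replaces a hand-written flattening helper with a single library call.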

I'm using this with my other project: https://github.com/scls19fr/pandas_degreedays