philippdzm opened this issue 1 year ago.
Hi @philippdzm, thanks for the kind words!
That is currently not possible directly in Bright Sky, and it probably won't be added in the near future: it would dramatically increase the size of our database, and it would require a lot of deliberation on how to fit non-hourly data into our current data structure and how (or whether) to mix in the parameters that are only available hourly.
However, a lot of this hassle could be mitigated by adding a separate endpoint (and database table) for the 10-minute data, and only storing the past e.g. 7 days. Would that work for your use case?
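To make the "only store the past 7 days" idea concrete, here is a minimal sketch of a rolling retention window for a separate 10-minute table. It uses SQLite purely to stay self-contained. The table name `ten_minute_weather`, its columns, and the helper functions are hypothetical and are not Bright Sky's actual schema or code.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical default retention window; could be made configurable
RETENTION = timedelta(days=7)


def _ts(dt):
    # Expects timezone-aware datetimes; normalise to UTC ISO strings so that
    # lexicographic order in SQL matches chronological order
    return dt.astimezone(timezone.utc).isoformat(timespec='seconds')


def create_table(conn):
    # Hypothetical table, kept separate from the hourly data
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS ten_minute_weather (
            dwd_station_id TEXT NOT NULL,
            timestamp TEXT NOT NULL,
            temperature REAL,
            precipitation REAL,
            PRIMARY KEY (dwd_station_id, timestamp)
        )
        """
    )


def insert_records(conn, records):
    # Upsert so that re-ingesting the same DWD file does not duplicate rows
    conn.executemany(
        "INSERT OR REPLACE INTO ten_minute_weather VALUES (?, ?, ?, ?)",
        [
            (r['dwd_station_id'], _ts(r['timestamp']),
             r.get('temperature'), r.get('precipitation'))
            for r in records
        ],
    )


def prune(conn, now, retention=RETENTION):
    # Delete everything older than the retention window; return rows removed
    cur = conn.execute(
        "DELETE FROM ten_minute_weather WHERE timestamp < ?",
        (_ts(now - retention),),
    )
    return cur.rowcount
```

Calling `prune()` after each ingestion run would keep the table bounded at roughly seven days of 10-minute rows per station, which is what keeps the storage cost small compared to keeping the full history.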
Here are two alternatives that might work, depending on your use case, without having to wait for me to implement a new endpoint:
The new radar endpoint (#144) provides hyperlocal (1 km² grid cell size) precipitation data in 5-minute intervals, including 5-minute forecasts for the next two hours. Selecting by lat/lon is not yet possible (but will come soon); for now, you'll have to find the nearest pixel to your station in this giant array and provide a corresponding bbox.
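Until lat/lon selection lands, finding the nearest pixel could look roughly like the sketch below. It assumes you already have the per-pixel latitude and longitude grids as nested lists (how you load them is not shown), and it assumes the radar endpoint's `bbox` takes pixel indices in the order top, left, bottom, right; check the API documentation before relying on that order. `nearest_pixel` and `pixel_bbox` are hypothetical helpers.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points given in degrees
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_pixel(lats, lons, lat, lon):
    # Brute-force scan: return (row, col) of the pixel closest to (lat, lon)
    best = None
    for i, (lat_row, lon_row) in enumerate(zip(lats, lons)):
        for j, (plat, plon) in enumerate(zip(lat_row, lon_row)):
            d = haversine_km(lat, lon, plat, plon)
            if best is None or d < best[0]:
                best = (d, i, j)
    return best[1], best[2]


def pixel_bbox(row, col, margin, n_rows, n_cols):
    # Assumed bbox order: (top, left, bottom, right), clamped to the grid
    return (
        max(row - margin, 0),
        max(col - margin, 0),
        min(row + margin, n_rows - 1),
        min(col + margin, n_cols - 1),
    )
```

A brute-force scan over a full national grid in pure Python is slow, but a station's pixel never changes, so you would compute it once per station and cache the result.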
Bright Sky's parsing core lives in the dwdparse package, and you can subclass its parsers for the 10-minute data, e.g. for precipitation data files:
```python
import datetime
import re

from dwdparse.parsers import ObservationsParser

# Local path to a downloaded DWD 10-minute precipitation archive
FILENAME = '10minutenwerte_nieder_01766_akt.zip'


class TenMinutePrecipitationParser(ObservationsParser):

    # Map the output field name to the DWD column name
    elements = {
        'precipitation_10': 'RWS_10',
    }

    def parse_station_id(self, zf, **extra):
        # The station ID is encoded in the file name inside the ZIP archive
        for filename in zf.namelist():
            if (m := re.search(r'_(\d+)\.txt', filename)):
                return m.group(1)

    def parse_lat_lon_history(self, zf, dwd_station_id, **extra):
        """Not available in 10-minute files"""
        return {}

    def parse_reader(self, filename, reader, lat_lon_history):
        for row in reader:
            # MESS_DATUM is YYYYMMDDhhmm; interpret it as UTC
            timestamp = datetime.datetime.strptime(
                row['MESS_DATUM'],
                '%Y%m%d%H%M',
            ).replace(
                tzinfo=datetime.timezone.utc,
            )
            yield {
                'source': f'Observations:Recent:{filename}',
                'timestamp': timestamp,
                **self.parse_elements(row, None, None, None),
            }


p = TenMinutePrecipitationParser()
for record in p.parse(FILENAME):
    print(record)
```
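If you would rather not depend on dwdparse's parser internals, the row-level work that `parse_reader` does can be sketched with only the standard library. This assumes the semicolon-separated, space-padded layout of the DWD product files and that `-999` marks missing values; `parse_ten_minute_rows` is a hypothetical helper, not part of dwdparse.

```python
import csv
import datetime
import io

MISSING = -999.0  # assumption: DWD's marker for missing values


def parse_ten_minute_rows(text, column='RWS_10'):
    # DWD product files are semicolon-separated with space-padded fields
    reader = csv.DictReader(io.StringIO(text), delimiter=';')
    for raw in reader:
        # Strip padding from keys and values; skip any overflow columns
        row = {k.strip(): (v or '').strip() for k, v in raw.items() if k is not None}
        timestamp = datetime.datetime.strptime(
            row['MESS_DATUM'], '%Y%m%d%H%M'
        ).replace(tzinfo=datetime.timezone.utc)
        value = float(row[column])
        yield {
            'timestamp': timestamp,
            column: None if value == MISSING else value,
        }
```

Passing `column='TT_10'` would pull air temperature from a 10-minute temperature file instead, assuming that file uses the same layout.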
Would any of these work for you?
(Previous discussion: https://github.com/jdemaeyer/brightsky/issues/132)
Hi @jdemaeyer
> it would dramatically increase the size of our database

I agree; it should be optional if it gets added.

> However, a lot of this hassle could be mitigated by adding a separate endpoint (and database table) for the 10-minute data, and only storing the past e.g. 7 days. Would that work for your use case?

Yes, for my use case this would be perfect. Seven days is enough (it could become configurable).
A side note on the second alternative:
> Manually perform parsing in Python

Thank you for the code. Nicely done, subclassing the parser for the 10-minute data! I'll look at it next week.
For now, I was able to rig up a solution that avoids saving the file to disk and instead processes it in memory right away, extracting only the data I need (air temperature):
```python
from io import BytesIO, StringIO
import zipfile

import pandas
import requests


def fetch_data(url_to_file):
    # Fetch the ZIP archive into memory (nothing is written to disk)
    r = requests.get(url_to_file)
    r.raise_for_status()
    # Open the archive directly from the response content
    with zipfile.ZipFile(BytesIO(r.content)) as z:
        # Assuming there's a single txt file
        txt_file_name = z.namelist()[0]
        txt_file_content = StringIO(z.read(txt_file_name).decode('utf-8'))
    # Parse the text file using pandas; -999 marks missing values
    data = pandas.read_csv(txt_file_content, sep=';', dtype={'MESS_DATUM': str}, na_values=[-999])
    data.MESS_DATUM = pandas.to_datetime(data.MESS_DATUM, format='%Y%m%d%H%M', utc=True)
    data = data.set_index('MESS_DATUM')
    # Return air temperature only
    return data.TT_10
```
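The "single txt file" assumption in `fetch_data` may not hold for every archive, since some DWD ZIP files may also contain metadata files. A slightly more defensive way to pick the data file, still entirely in memory, is sketched here. The `produkt` file-name prefix and the latin-1 default encoding are assumptions (latin-1 never fails to decode, but change it if your files use another encoding), and `extract_product_text` is a hypothetical helper.

```python
import io
import zipfile


def extract_product_text(zip_bytes, encoding='latin-1'):
    # Open the archive from bytes already held in memory
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        txt_names = [n for n in zf.namelist() if n.lower().endswith('.txt')]
        if not txt_names:
            raise ValueError('no .txt file in archive')
        # Prefer files whose name starts with 'produkt' (assumed DWD naming),
        # otherwise fall back to the first .txt file
        preferred = [n for n in txt_names if n.lower().startswith('produkt')]
        name = (preferred or txt_names)[0]
        return name, zf.read(name).decode(encoding)
```

The returned text can then go through `pandas.read_csv(StringIO(text), sep=';')` exactly as in `fetch_data`, e.g. `name, text = extract_product_text(r.content)`.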
Hi, thanks for the great work.
Since the DWD provides 10-minute data, I was wondering whether the Bright Sky API can be told to return 10-minute data when available.
For example, the response to
<domain>/weather?date=2023-05-14&last_date=2023-05-15&tz=UTC&units=dwd&dwd_station_id=<id-1>,<id-2>
returns hourly data. Concrete question: can it return 10-minute data instead?
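For reference, the hourly request above can be assembled programmatically. This is only a sketch: `build_weather_url` is a hypothetical helper, the base URL stands in for `<domain>`, and it uses only the query parameters shown in the question.

```python
from urllib.parse import urlencode


def build_weather_url(base_url, date, last_date, station_ids, tz='UTC', units='dwd'):
    # Query parameters as used in the example request
    params = {
        'date': date,
        'last_date': last_date,
        'tz': tz,
        'units': units,
        'dwd_station_id': ','.join(station_ids),
    }
    # safe=',' keeps the comma-separated station list unescaped
    return f"{base_url.rstrip('/')}/weather?{urlencode(params, safe=',')}"
```

With `safe=','`, the `dwd_station_id` list comes out comma-separated exactly as written in the example request rather than as `%2C`.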