microsoft / farmvibes-ai

FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability
https://microsoft.github.io/farmvibes-ai/
MIT License

DeepMC - Inconsistency in Formatting of Historical Observation Data #181

Closed 321zyx closed 3 months ago

321zyx commented 4 months ago

Topic

Documentation

Ask away!

In the information provided with the DeepMC notebook (https://github.com/microsoft/farmvibes-ai/blob/main/notebooks/deepmc/mc_forecast.ipynb), the explanatory text says that the fxx field should be in the format [start, stop & step], which for observations recorded every hour would be "[0, 24, 1]." But the code example below it gives the format as "fxx = [frequency_hour, number_of_hours + frequency_hour, 1]," which, according to the text that follows, would be "fxx: [1, 25, 1] # start, stop, step" for the same example. Those numbers don't make sense as start and stop times measured in hours after the start of the day. Even if the first measurement at midnight is given the value 1, the last measurement of the day at 23:00 would be 24, not 25, unless the midnight observation was double-counted in both the preceding and following days. I'm also not sure whether "frequency_hour" refers to a true frequency (measurements per hour) or to the time in hours between measurements, which is inversely proportional to the frequency but still identifies it. In the example given of one measurement per hour, both numbers would be the same.
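A quick way to compare the two forms, assuming fxx is expanded with Python's built-in range() (as the maintainer's reply later in this thread describes). The helper expand_fxx is hypothetical and is not part of the notebook or of Herbie.

```python
def expand_fxx(fxx):
    """Hypothetical helper: expand [start, stop, step] the way range() would.

    range() excludes the stop value, so [1, 25, 1] ends at 24, not 25.
    """
    start, stop, step = fxx
    return list(range(start, stop, step))


# Form described in the notebook's explanatory text.
text_form = expand_fxx([0, 24, 1])

# Form produced by the notebook's formula
# fxx = [frequency_hour, number_of_hours + frequency_hour, 1]
frequency_hour = 1
number_of_hours = 24
code_form = expand_fxx([frequency_hour, number_of_hours + frequency_hour, 1])

print(len(text_form), text_form[0], text_form[-1])  # 24 0 23
print(len(code_form), code_form[0], code_form[-1])  # 24 1 24
```

Under this assumption, both forms yield 24 hourly values; the formula's version is simply shifted one hour later, and 25 never appears because the stop value is exclusive.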

It makes little sense to me that the field would be "[frequency_hour, number_of_hours + frequency_hour, 1]", since the second number would always be 24 plus the first and the last would always be 1; only the first value would convey any information. But it doesn't really need to make sense to me. What I need to know is what values to use for data recorded every 5 minutes, 288 times a day. Which of these is it?

- [0, 24, 0.08333]: start time, end time, and time in hours between measurements
- [12, 36, 1]: measurements per hour, hours in a day + measurements per hour, 1
- [0, 288, 0.08333]: time of the first measurement of the day, number of measurements per day, and time in hours between measurements
- [0.08333, 24.08333, 1]: hours between measurements, 24 + hours between measurements, 1
- [1, 289, 0.08333]: first measurement of the day numbered 1, number of the last measurement of the day, hours between measurements

The first and last of those seem most sensible to me, but what do I know?
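If fxx is indeed passed to Python's built-in range() (an assumption taken from the maintainer's reply in this thread, not verified against the notebook or Herbie source), then any candidate with a fractional value cannot be used as written, because range() accepts integers only. The helpers below are hypothetical and only illustrate that constraint.

```python
def expand_fxx(fxx):
    """Hypothetical helper: expand [start, stop, step] with range()."""
    start, stop, step = fxx
    return list(range(start, stop, step))


def try_expand(fxx):
    """Return the expanded list, or None if range() rejects the values."""
    try:
        return expand_fxx(fxx)
    except TypeError as err:
        # e.g. "'float' object cannot be interpreted as an integer"
        print(f"{fxx} rejected: {err}")
        return None


five_minute = try_expand([0, 24, 0.08333])  # 5-minute step expressed in hours
hourly = try_expand([0, 24, 1])             # whole-hour step
```

This only shows that a fractional step is rejected by range(); it does not establish what the correct setting for 5-minute data is.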

v-ngangarapu commented 3 months ago

Hi, thank you for your observations. The fxx parameter defined in the notebook is used as input for downloading HRRR data via Herbie, a third-party Python package, so its requirements come from Herbie. The parameter defines the start hour, end hour, and frequency within a 24-hour period.

For example, to download data from 7/25/2024 13:00 to 7/26/2024 13:00, fxx would be [13, 13+24, 1]. In the background, this applies range(13, 13+24, 1), which yields [13, 14, ..., 35, 36]. Based on my understanding, whenever an fxx value is 24 or greater, the Herbie package treats it as data for the next day, starting from 7/26/2024 00:00:00.
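The rollover described above can be sketched by treating each fxx value as an hour offset from midnight on the start date. This mapping is an assumption based on the description in this reply, and fxx_to_timestamps is a hypothetical helper, not a Herbie function.

```python
from datetime import datetime, timedelta


def fxx_to_timestamps(day_start, fxx):
    """Hypothetical helper: map each offset in range(*fxx) to a timestamp.

    Offsets are treated as hours after day_start, so 24 or more
    rolls over into the following day.
    """
    start, stop, step = fxx
    return [day_start + timedelta(hours=h) for h in range(start, stop, step)]


day_start = datetime(2024, 7, 25, 0, 0)
stamps = fxx_to_timestamps(day_start, [13, 13 + 24, 1])

print(len(stamps))  # 24
print(stamps[0])    # 2024-07-25 13:00:00  (offset 13)
print(stamps[11])   # 2024-07-26 00:00:00  (offset 24)
print(stamps[-1])   # 2024-07-26 12:00:00  (offset 36)
```

Because range() excludes its stop value, the last offset is 36 (12:00 on 7/26); under the same assumption, including 13:00 on 7/26 itself would need a stop of 13 + 25 = 38.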

For more on the Herbie package, see https://herbie.readthedocs.io/en/stable/user_guide/tutorial/fast.html

Thanks

321zyx commented 3 months ago

Thanks. I'll look there and see if I can make sense out of it all. I appreciate the help.

rafaspadilha commented 3 months ago

Closing this issue for now. Feel free to reopen it if you have any other questions.