yuhao-nie / Stanford-solar-forecasting-dataset

Stanford sky images and PV power generation dataset for solar forecasting related research and applications
MIT License

Cloudiness Information #7

Open nilsleh opened 12 months ago

nilsleh commented 12 months ago

Hi @yuhao-nie and @ascott-20, in Table 5.1 of your paper you evaluate your models separately on cloudy and sunny days. However, this information is not natively included in the downloadable dataset. The preprocessing Jupyter notebooks do contain information about cloudy and sunny days. However, when I do the following for the forecast task:

import datetime
import numpy as np

# Sunny and cloudy test days as (year, month, day) tuples
sunny_day = [(2017,9,15),(2017,10,6),(2017,10,22),(2018,2,16),(2018,6,12),(2018,6,23),(2019,1,25),(2019,6,23),(2019,7,14),(2019,10,14)]
cloudy_day = [(2017,6,24),(2017,9,20),(2017,10,11),(2018,1,25),(2018,3,9),(2018,10,4),(2019,5,27),(2019,6,28),(2019,8,10),(2019,10,19)]

sunny_datetime = [datetime.datetime(day[0],day[1],day[2]) for day in sunny_day]
cloudy_datetime = [datetime.datetime(day[0],day[1],day[2]) for day in cloudy_day]

# Timestamps of the forecast test samples, reduced to calendar dates
arr = np.load("times_test_forecast.npy", allow_pickle=True)
date_arr = [val.date() for val in arr]
sunny_arr = [val.date() for val in sunny_datetime]
cloudy_arr = [val.date() for val in cloudy_datetime]

# Dates that appear both in the test set and in each day list
print(set(date_arr).intersection(set(sunny_arr)))
print(set(date_arr).intersection(set(cloudy_arr)))

The intersection of the test forecasting dates and the sunny dates is empty, suggesting there are no sunny test dates, only cloudy ones. However, you report values for sunny days in your paper.

For the nowcasting task, the above snippet shows that the numbers of cloudy and sunny examples are about equal, which is to be expected, I guess. Could you help me figure out what I am missing?
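A set intersection only shows which dates are present, not how many samples each day contributes. The following is a minimal sketch of a per-condition count. The helper name `count_samples_by_condition` and the toy timestamps are hypothetical and are not part of the dataset or the notebooks; the day sets are a small subset of the lists above.

```python
import datetime
from collections import Counter

# Subset of the sunny/cloudy day lists above, for illustration only
SUNNY_DAYS = {datetime.date(2017, 9, 15), datetime.date(2017, 10, 6)}
CLOUDY_DAYS = {datetime.date(2017, 6, 24), datetime.date(2017, 9, 20)}


def count_samples_by_condition(timestamps, sunny_days, cloudy_days):
    """Count how many timestamps fall on sunny, cloudy, or unlabeled days."""
    counts = Counter()
    for ts in timestamps:
        day = ts.date()
        if day in sunny_days:
            counts["sunny"] += 1
        elif day in cloudy_days:
            counts["cloudy"] += 1
        else:
            counts["unlabeled"] += 1
    return counts


# Hypothetical timestamps standing in for the contents of times_test_forecast.npy
toy_times = [
    datetime.datetime(2017, 9, 15, 10, 0),
    datetime.datetime(2017, 9, 15, 10, 1),
    datetime.datetime(2017, 9, 20, 12, 30),
    datetime.datetime(2018, 1, 1, 9, 0),
]

counts = count_samples_by_condition(toy_times, SUNNY_DAYS, CLOUDY_DAYS)
print(dict(counts))
```

With the real file, `toy_times` would be replaced by the array loaded with `np.load(..., allow_pickle=True)`, and the full day lists would be converted to sets of `datetime.date` objects.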

Edit: The times_test_forecast.npy file is generated from running my script in #3

yuhao-nie commented 4 months ago

Hi @nilsleh, I've checked this. I think the cause is the find_time_within_nparray function in the notebook: it assumes that the time array is already sorted in ascending order, but I found that time_test_forecast is actually not sorted, because it is combined from the sunny and cloudy dates. That causes issues when the function is applied to form the forecast data. It should solve the problem if you run:

sorted_indices = np.argsort(test_times)
test_times = test_times[sorted_indices]
test_pv = test_pv[sorted_indices]
test_images = test_images[sorted_indices]
create_samples(test_pv[:], test_images, test_times)
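The fix above can be checked on toy data before rerunning the full pipeline. This is a minimal sketch, assuming the timestamps are stored as an object array of `datetime.datetime` values (as `allow_pickle=True` suggests). The helpers `is_sorted` and `sort_by_time` and all toy values are hypothetical; `create_samples` and `find_time_within_nparray` from the notebook are not reproduced here.

```python
import datetime
import numpy as np


def is_sorted(times):
    """Return True if the timestamps are in non-decreasing order."""
    return all(a <= b for a, b in zip(times[:-1], times[1:]))


def sort_by_time(times, *arrays):
    """Sort `times` ascending and reorder each parallel array the same way."""
    order = np.argsort(times)
    return (times[order],) + tuple(arr[order] for arr in arrays)


# Toy test set: two sunny days followed by two cloudy days, so the
# concatenated timestamps are NOT in chronological order.
dt = datetime.datetime
test_times = np.array(
    [dt(2017, 9, 15, 12), dt(2017, 10, 6, 12), dt(2017, 6, 24, 12), dt(2017, 9, 20, 12)],
    dtype=object,
)
test_pv = np.array([10.0, 11.0, 3.0, 4.0])   # one PV value per timestamp
test_images = np.arange(4).reshape(4, 1)     # stand-in for the image frames

print("sorted before:", is_sorted(test_times))

sorted_times, sorted_pv, sorted_images = sort_by_time(test_times, test_pv, test_images)

print("sorted after:", is_sorted(sorted_times))
print(sorted_pv.tolist())
```

Applying one index array to all three arrays keeps every PV value and image attached to its own timestamp, which is the point of using `np.argsort` rather than sorting the times alone.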

nilsleh commented 4 months ago

Thanks for your reply. Would it be possible for you to upload the actual datasets used for the paper to a platform like Hugging Face? For example, at torchgeo we have hosted a version of it here.

yuhao-nie commented 4 months ago

Thanks for helping to disseminate it! Would you mind changing the reference on the dataset card to our dataset paper:

Nie Y, Li X, Scott A, Sun Y, Venugopal V, Brandt A. SKIPP’D: A SKy Images and Photovoltaic Power Generation Dataset for short-term solar forecasting. Solar Energy. 2023 May 1;255:171-9.

or

@article{nie2023skipp,
  title={SKIPP’D: A SKy Images and Photovoltaic Power Generation Dataset for short-term solar forecasting},
  author={Nie, Yuhao and Li, Xiatong and Scott, Andea and Sun, Yuchi and Venugopal, Vignesh and Brandt, Adam},
  journal={Solar Energy},
  volume={255},
  pages={171--179},
  year={2023},
  publisher={Elsevier}
}

We actually have a project underway for that, similar to the torchgeo project you referred to, which also involves other datasets and an accompanying Python library. We will announce it when it is ready.