gridclub / Hack2O

Annual 2017 hackathon at UMass.
http://gridclub.io/Hack2O/
MIT License

Hackathon discussion forum #2

Open AruniRC opened 7 years ago

AruniRC commented 7 years ago

Purpose: use GitHub issues as a discussion forum for this hackathon.

You can ask questions about the dataset, technical issues, or any theoretical or implementation problems you may be having with your models. We'll try to help to the best of our ability.

thanks, Aruni

djsaunde commented 7 years ago

The SampleTime column of the "DATA" worksheet in the "Copper_Iron_and_Lead_for_GriD.xlsx" workbook seems to have duplicate time entries (e.g., 12:21 appears multiple times for house 8, and 8:12 appears multiple times for house 17). Is there a way to resolve this? If we're interested in time series data, it might be hard to work with multiple data points that occur at the same time.

ospiro commented 7 years ago

The samples were taken in quick succession and times were not recorded to the second. However, the samples are in chronological order, and the Samp_No column can be used to order them within each property.
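
As a minimal sketch of that ordering: the rows below are invented for illustration, the `House` key is a hypothetical name for whatever column identifies the property, and `Samp_No` is assumed to be numeric. Only `SampleTime` and `Samp_No` are column names mentioned in this thread.

```python
# Hypothetical rows; the values are invented, not taken from the workbook.
rows = [
    {"House": 8, "Samp_No": 2, "SampleTime": "12:21"},
    {"House": 17, "Samp_No": 1, "SampleTime": "8:12"},
    {"House": 8, "Samp_No": 1, "SampleTime": "12:21"},
    {"House": 17, "Samp_No": 2, "SampleTime": "8:12"},
]


def order_samples(rows):
    """Sort by property, then by Samp_No, so duplicate SampleTime values
    within a property still get a well-defined chronological order."""
    return sorted(rows, key=lambda r: (r["House"], r["Samp_No"]))


ordered = order_samples(rows)
for r in ordered:
    print(r["House"], r["Samp_No"], r["SampleTime"])
```

Sorting on the pair (property, `Samp_No`) rather than on `SampleTime` avoids relying on the minute-resolution timestamps at all.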

heyitsjoe commented 7 years ago

Correct. Samples are in chronological order and were drawn in fairly rapid succession. My guess is that minimal time elapsed between samples and that whoever took them simply didn't record it.

heyitsjoe commented 7 years ago

I had another thought on this. The samples that start with DS (the distribution samples) were drawn last and have a different recorded time than the other samples at each respective location. One could calculate the elapsed time between the first sample and the DS sample, then divide by the total number of samples at each location to get an individual elapsed (unique) time for each sample drawn, if desired. This would assume consistent time sampling, which I think is a defensible assumption. Just a thought. Probably not required.
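
A rough sketch of that even-spacing idea, under stated assumptions: the function name `spread_times`, the calendar date, and the example times are all hypothetical. The sketch divides the elapsed time by the number of intervals (samples minus one) so that the last interpolated time lands exactly on the DS time; dividing by the full sample count, as worded above, is a close variant that stops just short of it.

```python
from datetime import datetime


def spread_times(first_time, ds_time, n_samples):
    """Return n_samples evenly spaced datetimes from first_time to ds_time,
    inclusive. Assumes samples were drawn at a constant rate."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    if n_samples == 1:
        return [first_time]
    # Elapsed time divided by the number of gaps between samples.
    step = (ds_time - first_time) / (n_samples - 1)
    return [first_time + i * step for i in range(n_samples)]


# Hypothetical location: first sample at 12:21, DS sample at 12:33,
# five samples in total (the date is an arbitrary placeholder).
first = datetime(2017, 1, 1, 12, 21)
ds = datetime(2017, 1, 1, 12, 33)
times = spread_times(first, ds, 5)
for t in times:
    print(t.strftime("%H:%M:%S"))
```

Paired with the `Samp_No` ordering within each property, this would give every sample a unique timestamp, at the cost of assuming a constant sampling rate.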