Open haderazzini opened 8 years ago
I think that by passing the sample_period parameter, you're asking NILMTK to resample the power series to 1 second resolution. If you don't want to resample, simply don't pass the parameter as in your example.
Hi Oliver,
Thank you for the quick answer. However, I need to resample. I believe power_series should have a method to fill the NaNs.
I haven't checked the dataset but I suspect the problem is not with the code, but rather in my dataset, UK-DALE :)
If you use meter.power_series() (i.e. without specifying a sample_period) then you'll get back the raw data. If there are gaps then these won't appear as NaNs. Instead, gaps in the data simply won't be represented: there will be no rows of data for periods when no data was recorded.
If you force NILMTK to resample to 1Hz by passing power_series(sample_period=1) then any missing data will be represented as NaNs.
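The difference between raw and resampled data can be reproduced with plain pandas. This is only a sketch on a small synthetic series (the timestamps and power values are made up, and it does not call NILMTK itself), but it shows why a gap that is invisible in the raw data turns into NaN rows once the series is put on a regular 1-second grid:

```python
import pandas as pd

# Synthetic 1 Hz readings with a gap: no rows exist for seconds 03, 04 and 05.
index = pd.to_datetime([
    "2014-01-01 00:00:00",
    "2014-01-01 00:00:01",
    "2014-01-01 00:00:02",
    "2014-01-01 00:00:06",
    "2014-01-01 00:00:07",
])
raw = pd.Series([100.0, 101.0, 99.0, 102.0, 100.0], index=index, name="power")

# Raw data: the gap is simply absent, so the series contains no NaNs.
raw_rows = len(raw)
raw_nans = int(raw.isna().sum())

# Resampling onto a regular 1-second grid creates one row per second;
# seconds that had no reading have nothing to average and become NaN.
resampled = raw.resample(pd.Timedelta(seconds=1)).mean()
resampled_rows = len(resampled)
resampled_nans = int(resampled.isna().sum())

print(raw_rows, raw_nans)
print(resampled_rows, resampled_nans)
```

The NaNs are therefore a faithful marker of missing readings rather than something the resampling step corrupts.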
There is no perfect way to fix this. If the data is missing then the data is missing ;)
It might be best to hunt around for a period of time in the dataset when there are fewer gaps in the data. See Figure 3 in my paper.
I also believe that the problem is in the dataset. However, I think that power_series should have the option to ffill or bfill, like in pandas. I think an option to fill the NaNs would make it easier to work with missing data.
> I also believe that the problem is in the dataset
Sure. Although, just to be clear: pretty much all datasets have missing data. It's not just UK-DALE :)
> I think that power_series should have the option to ffill or bfill, like in pandas.
It does (although I admit that it's not well documented). You can pass a resample_kwargs dict:

    resample_kwargs : dict of key word arguments (other than 'rule') to
        pass to pd.DataFrame.resample(). Defaults to set 'limit' to
        sample_period / max_sample_period and sets 'fill_method' to ffill.

See the ElecMeter.load() docs.
NILMTK does, by default, forward fill sample_period / max_sample_period samples.
If you want to forward fill the entire gap then do something like power_series(resample_kwargs={"limit": None}). (I haven't tested this. And I would recommend not forward filling the entire gap. Instead I'd zero-out large gaps in UK-DALE's appliances.)
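To make the trade-off concrete, here is a minimal pandas-only sketch of a limited forward fill versus an unlimited one. It does not go through NILMTK's resample_kwargs; the series values and the limit of 2 samples are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic 1 Hz power series: a long gap (4 samples) and a short gap (1 sample).
index = pd.date_range("2014-01-01", periods=8, freq=pd.Timedelta(seconds=1))
power = pd.Series(
    [100.0, np.nan, np.nan, np.nan, np.nan, 50.0, np.nan, 60.0],
    index=index,
)

# Limited forward fill: carry each reading forward over at most 2
# consecutive NaNs. The short gap is bridged completely; the long gap
# is only partly filled and its remainder stays NaN.
limited = power.ffill(limit=2)

# Unlimited forward fill: every NaN takes the last known reading, so a
# stale value of 100 W is repeated across the whole long gap.
unlimited = power.ffill()

# Zero out whatever the limited fill left unfilled, i.e. treat the rest
# of a long gap as "appliance off" instead of repeating an old reading.
zeroed = limited.fillna(0.0)

print(limited.tolist())
print(unlimited.tolist())
print(zeroed.tolist())
```

Whether zero is the right stand-in for a long gap depends on the appliance, so this is a judgment call rather than a general rule.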
That is true, the majority of datasets have missing data.
I'm using ffill because I want to build a virtual main meter from the sum of the appliances. NaNs in the appliance data produce NaNs in the virtual main meter. Do you have any good ideas for how to avoid this?
I'd still recommend zeroing-out the NaNs in the appliance data. Something like power_series(resample_kwargs={"fill_value": 0}) might work. (not tested)
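If the resample_kwargs route does not behave as hoped, the same idea can be applied after loading, on the pandas objects themselves. The sketch below assumes the appliance series have already been resampled onto a common index and collected into one DataFrame; the column names "fridge" and "kettle" and all values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical appliance readings on a shared 1-second index.
index = pd.date_range("2014-01-01", periods=4, freq=pd.Timedelta(seconds=1))
appliances = pd.DataFrame(
    {
        "fridge": [80.0, np.nan, 85.0, np.nan],
        "kettle": [0.0, 2000.0, np.nan, np.nan],
    },
    index=index,
)

# Plain addition propagates NaN: one missing appliance reading makes
# the virtual mains value for that second NaN as well.
naive = appliances["fridge"] + appliances["kettle"]

# Zero out the NaNs first, then sum across appliances.
virtual_mains = appliances.fillna(0.0).sum(axis=1)

# Alternative: ignore NaNs while summing, but keep NaN for seconds in
# which every appliance is missing (min_count=1 requires one valid value).
virtual_mains_strict = appliances.sum(axis=1, min_count=1)

print(naive.tolist())
print(virtual_mains.tolist())
print(virtual_mains_strict.tolist())
```

The strict variant keeps a visible marker for seconds with no data at all, which may be preferable to reporting 0 W for them.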
Hello,
I'm trying to load the data frame of the submeters using power_series, but it returns a lot of NaNs:
In[10]:
Out[10]:
How can I fix it?
Regards