Closed nils-braun closed 3 years ago
@tylerwmarrs is digging into this.
I think that the real issue here is that the threshold is set to 0.0 which will never be met. In my opinion, the real fix is to enforce realistic thresholds. Having the threshold be any number between 0 and 1 isn't very realistic for a cross correlation threshold. Maybe 0.5 to 1? What do you guys think? @nils-braun @vanbenschoten
Having a more reasonable range will also prevent users from computing way more than they intended to. A threshold of 0.1 or something similar would basically compute every possible window size; depending on the time series in question of course.
As an aside, @nils-braun did you intentionally set the threshold to 0 for testing functionality or did you think that the threshold means something else?
Sorry, I shouldn't have picked that example - my bad. It fails with every setting I tried - 0 was just the last one and I unintentionally posted this. Sorry for the confusion...
Sorry, I shouldn't have picked that example - my bad. It fails with every setting I tried - 0 was just the last one and I unintentionally posted this. Sorry for the confusion...
No worries at all! I still think that it is a valid issue. What do you think about adjusting the range validation?
@tylerwmarrs I agree with restricting the threshold to > 0.5 (I might even go as high as 0.65 to speed up compute times, as I anticipate that users will leverage ever-larger timeseries moving forward)
@nils-braun You suggested that we return a "nan" value for the tsfresh integration. Do you think that we can just throw a runtime exception or custom exception and catch it within the tsfresh integration? This, I feel, will help solve both issues without taking away from either goal.
So changes on my end would be:
Philosophically, I like the "fail fast" approach to software instead of giving a difficult to interpret result. Some implementations of the matrixprofile explicitly replace particular values with "nans" making it difficult to understand whether there was a computation issue or data issue. The "nan" approach here is similar to me. However, for tsfresh, I feel that returning a nan makes perfect sense.
I don't want to go off course too much, but have you, @nils-braun, considered adding a summary/context report at the end of the run? This could help with debugging things that fail silently, but you do not want to waste all of the other computations.
I think raising an exception is totally fine! This is something we can handle in tsfresh and then deal with it (we already do this in other calculators). Sorry, that was actually a misunderstanding: the currently called subfunction in matrixprofile returns a NaN (which currently causes the problem) - I do not think that returning NaN from matrixprofile would be the best thing here :-)
Concerning the context object: that would be a valuable approach but is currently not implemented. We rely on the Python warnings, which can also be captured (there exist already tooling around that), but might be worth to do a custom object. I will think about it, thanks!
@nils-braun @tylerwmarrs given the recent PR merge, I'm going to close this issue. Feel free to re-open if you think more discussion is warranted.
@nils-braun @vanbenschoten and I chatted about the lower bounding threshold and felt that keeping it open as is makes the most sense. We do not really know the repercussion of enforcing such a range.
Describe the bug When calling
mp.compute
on one timeseries of the robot example dataset (e.g. for id = 1), matrixprofile first throws a warningand then raises an exception
Looking at the code, I think that
maximum_subsequence
returns a NaN when it has not found a window (here), which is not the format expected by compute (which expects a dictionary of some sort).The root cause for this, is probably that the timeseries under study is quite short (length 15), so I would be happy to understand if this is a systematic problem for short time series (the fact that no windows are found, not the fact that the exception is raised :-))
To Reproduce
Expected behavior No exception :-)
Desktop (please complete the following information):
Additional context Related to https://github.com/blue-yonder/tsfresh/issues/785
Please note that if you are using tsfresh's example robot data, one timeseries are all values for a single id (so the full data sample has actually 88 time series in it)