kmsquire closed this issue 6 years ago
There's the `Segmenter` in the `preprocessing` module that does exactly that, although it only works along one dimension so far. It should be easy to extend to more dimensions, though. I based my implementation (`utils.segment_array`) on this StackOverflow question.
Also, it should work without copying data when the `return_view` parameter of the transformer is set to true. However, this isn't working yet, and I have yet to investigate the reason.
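For reference, the no-copy windowing idea can be illustrated with plain NumPy. This is only a minimal sketch, not the `Segmenter` implementation: it assumes NumPy 1.20 or later (for `sliding_window_view`), and the array `data` is a made-up stand-in for a 100x10 sample-by-feature array.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical stand-in for a 100x10 (sample x feature) array
data = np.arange(100 * 10, dtype=float).reshape(100, 10)

new_len, step = 3, 1

# Build windows of length new_len along the sample axis.
# The window axis is appended as the last axis of the result.
windows = sliding_window_view(data, window_shape=new_len, axis=0)[::step]

print(windows.shape)                    # (98, 10, 3)
print(np.shares_memory(windows, data))  # True: the windows are a view
```

The result is read-only by default, which is usually what you want for overlapping windows, since writing to one window would silently change its neighbours.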
If these limitations don't bother you, the transformer should do exactly what you want. Otherwise, at least the copying part is a priority for me that I want to fix soon, but feel free to look into it as well.
Great, thanks! It's close enough for now that I can work with it. Returning a view would be great, of course (although the default pip install version doesn't have that parameter yet).
One other thing I would like to do is shift the sample indices to center on (or trail) the index around which I'm grouping.
For example, if I set `new_len=3, step=1` for a 100x10 `DataArray` (as from `load_dummy_dataarray()`), I'd like the resulting `sample` indices to go from 1 to 98 (or sometimes 2 to 99), so that they can be matched up with the corresponding `y` values.
`pandas.DataFrame.rolling`, for example, has a `center` keyword that accomplishes the first of these.
(Edit: Of course, I can set this manually for now.)
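Setting the index manually could look roughly like the following. It is a sketch under the assumption that windows start at positions 0, `step`, 2*`step`, and so on; the variable names and the random `y` are made up for illustration.

```python
import numpy as np

n_samples, new_len, step = 100, 3, 1
sample = np.arange(n_samples)  # original sample index: 0..99

# Start position of each window along the sample axis
starts = np.arange(0, n_samples - new_len + 1, step)  # 0..97

# Label each window by its middle element or by its last element
centered = sample[starts + new_len // 2]  # 1..98
trailing = sample[starts + new_len - 1]   # 2..99

# Hypothetical targets, one per original sample
y = np.random.default_rng(0).normal(size=n_samples)
y_centered = y[centered]  # targets aligned with centered windows
y_trailing = y[trailing]  # targets aligned with trailing windows
```

For even `new_len` there is no exact middle element; `new_len // 2` picks the later of the two candidates, which is a choice rather than a rule.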
Just as a point of reference, one more thing that I sometimes want to do is subsample the new dimension (which is done easily enough with slicing and which does currently return a view).
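As an illustration of the slicing point, here is a small NumPy-only sketch (not the library's code; the array and window length are made up) that keeps every second element inside each window and checks that the result still shares memory with the original data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical 100x10 (sample x feature) array
data = np.arange(100 * 10, dtype=float).reshape(100, 10)

# Windows of length 6 along the sample axis: shape (95, 10, 6)
windows = sliding_window_view(data, window_shape=6, axis=0)

# Subsample the window dimension with a strided slice: shape (95, 10, 3)
subsampled = windows[..., ::2]

print(subsampled.shape)
print(np.shares_memory(subsampled, data))  # basic slicing keeps it a view
```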
If I get the chance, I'll try to submit a pull request with an example for the docs.
I have a dataset which has features and targets indexed by time.
I would like to provide overlapping (possibly subsampled) windows of the features as feature input to an ML algorithm.
I can certainly construct this by hand, but I'm wondering how to provide this windowed input without copying data, possibly via a Transformer.
Is this possible within the existing list of transformers? This isn't clear to me. If it is not possible, how easy would it be to add a transformer to handle this?
Edit: I guess I can try to wrap `array.rolling`, although it's still unclear to me (so far) how to provide this to a scikit-learn `fit` function.
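One possible way to get windowed features into the 2D `(n_samples, n_features)` shape that scikit-learn estimators generally expect is sketched below with plain NumPy. The data, names, and trailing alignment are assumptions for illustration, and no estimator is actually fitted here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 10))  # hypothetical time x feature matrix
targets = rng.normal(size=100)         # hypothetical per-time-step targets

new_len = 3

# Overlapping windows along time: shape (98, 10, 3), still a view
windows = sliding_window_view(features, window_shape=new_len, axis=0)

# Flatten each window into one row: shape (98, 30)
X = windows.reshape(len(windows), -1)

# Trailing alignment: each window is paired with the target at its last step
y = targets[new_len - 1:]

print(X.shape, y.shape)
print(np.shares_memory(X, features))  # False: the reshape had to copy
```

`X` and `y` could then be passed to an estimator's `fit(X, y)`. Note that the overlapping windows cannot be flattened into a 2D array without a copy, so the no-copy property is lost at this step.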