nschaetti / EchoTorch

A Python toolkit for Reservoir Computing and Echo State Network experimentation based on PyTorch. EchoTorch is the only Python module available to easily create Deep Reservoir Computing models.
https://nschaetti.github.io/echotorch.github.io/
GNU General Public License v3.0

How to prepare training data, especially the size ? #2

Open FabianChenDP opened 6 years ago

FabianChenDP commented 6 years ago

I have a batch of time series data for regression analysis. Every timestamp has 30 features. The data start out as NumPy ndarrays. I then convert them into a TensorDataset and set batch_size=15 for the data loader, like this:

data_tensors = TensorDataset(torch.Tensor(x_tr), torch.Tensor(y_tr))

loader_tr = DataLoader(
            data_tensors, batch_size=batch_size, shuffle=False, num_workers=4)

However, I got an error as follows.

~/miniconda3/envs/py36/lib/python3.6/site-packages/echotorch/nn/ESNCell.py in forward(self, u, y, w_out)
    128 
    129                 # Compute input layer
--> 130                 u_win = self.w_in.mv(ut)
    131 
    132                 # Apply W to x

RuntimeError: mv: Expected 1-D argument vec, but got 0-D

It looks like the forward method needs the parameter "u" to be a 3-D tensor, and time_length needs to be set explicitly. Does time_length mean the number of reservoirs? We already have hidden_dim.

I am quite confused about how to prepare the training data for LiESN. Could you please help me?

nschaetti commented 6 years ago

Hi!

The input to the LiESN/ESN should be a 3D tensor of size "batch size" x "time length" x "input dimension". The input dimension is set when the LiESN is created. To use a batch size greater than one, all input time series must have the same length. If that is not the case, I use batch_size = 1. What is the size of your input tensor (what does print(x.size()) show)? It seems that ut has zero dimensions, so your input is probably 1D.
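A minimal sketch of the layout described above, using plain PyTorch only. The sizes (4 series, 200 and 350 time steps) are made up for illustration, and no EchoTorch call is involved.

```python
import torch

input_dim = 30  # features per time step, as in the question

# Several series of the SAME length can be stacked into one batch:
# result is (batch size, time length, input dimension).
series = [torch.rand(200, input_dim) for _ in range(4)]
batch = torch.stack(series)
print(batch.shape)  # (4, 200, 30)

# A single series (or series of differing lengths, fed one at a time)
# still needs a leading batch dimension of size 1.
single = torch.rand(350, input_dim).unsqueeze(0)
print(single.shape)  # (1, 350, 30)
```

Under this layout, time length is the number of time steps in each series; it is unrelated to the reservoir size, which is what hidden_dim sets.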

Hope it will help you.

Nils

FabianChenDP commented 6 years ago

Hi Nils,

Thank you for your reply.

However, I am afraid your guess is not exactly correct. In the example above, x_tr is a 2D training ndarray of shape (10000, 30), while y_tr is a 2D ndarray of shape (10000, 1).

What should I do? Could you please help me find a way to reshape the input data? Thanks a lot.

nschaetti commented 6 years ago

Hi,

So your input data is a 30-dim time series of length 10000, right?

The class TensorDataset will take samples along the first dimension of x_tr. So the tensor you give to the ESN is probably of size "batch_size" x 30.
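The shape can be checked without EchoTorch. The sketch below rebuilds the loader from the question with random data and inspects one batch. It assumes the ESN cell indexes its input as u[b, t]; the thread only shows the failing w_in.mv(ut) line, so that indexing is an assumption.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-ins with the shapes reported in the thread.
x_tr = np.random.rand(10000, 30).astype(np.float32)
y_tr = np.random.rand(10000, 1).astype(np.float32)

dataset = TensorDataset(torch.from_numpy(x_tr), torch.from_numpy(y_tr))
loader = DataLoader(dataset, batch_size=15, shuffle=False)

xb, yb = next(iter(loader))
print(xb.shape)  # (15, 30): 2D, there is no time dimension

# Indexing a 2D batch by (batch, time) yields a 0-D scalar,
# which is not a valid vector argument for a matrix-vector product.
ut = xb[0, 0]
print(ut.dim())  # 0

# With a batch dimension added in front, the same indexing gives a 30-vector.
u3 = torch.from_numpy(x_tr).unsqueeze(0)
print(u3[0, 0].shape)  # (30,)
```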

If x_tr is a single dataset, you can give it directly to the ESN after adding a batch dimension :

u = torch.Tensor(x_tr)
y = torch.Tensor(y_tr)
u = u.view(1, -1, 30)
y = y.view(1, -1, 1)
u, y = Variable(u), Variable(y)
esn(u, y)
esn.finalize()

Can you show me the complete code?

Regards,

Nils

FabianChenDP commented 6 years ago

Dear Nils,

My code is as follows:

class TorchEsnModelTrainer(object):
    def pre_fit(self, dfx, y=None):
        x = torch.Tensor(dfx).view(dfx.shape[0], -1, dfx.shape[1])
        y = torch.Tensor(y).view(y.shape[0], -1, 1)

        return Variable(x), Variable(y)

    def train(self, x_tr, y_tr, hidden_size=60, **kwargs):
        """ train the model """
        num_features = x_tr.shape[1]
        x_tr, y_tr = self.pre_fit(x_tr, y_tr)

        # model
        esn = etnn.LiESN(
            num_features,
            hidden_size,
            1,
            learning_algo='inv',
        )

        esn(x_tr, y_tr)
        esn.finalize()
        self.model = esn

        return self.model

Since the raw x_tr is an ndarray of shape 10000 x 30, I first use pre_fit to transform it into the required format and then feed it to the ESN model. This time, my notebook quit immediately with the following error:

** On entry to SLASWP, parameter number 6 had an illegal value
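For comparison, the shapes produced by pre_fit can be checked against the view(1, -1, 30) call suggested earlier in the thread. This is a sketch with zero-filled stand-in data; whether the difference explains the SLASWP message is not established in the thread.

```python
import numpy as np
import torch

# Zero-filled stand-ins with the shapes reported in the thread.
x_tr = np.zeros((10000, 30), dtype=np.float32)
y_tr = np.zeros((10000, 1), dtype=np.float32)

# What pre_fit does: the first dimension keeps all 10000 rows.
x_prefit = torch.from_numpy(x_tr).view(x_tr.shape[0], -1, x_tr.shape[1])
y_prefit = torch.from_numpy(y_tr).view(y_tr.shape[0], -1, 1)
print(x_prefit.shape)  # (10000, 1, 30): 10000 sequences of length 1

# What was suggested: one sequence of 10000 time steps.
x_single = torch.from_numpy(x_tr).view(1, -1, x_tr.shape[1])
y_single = torch.from_numpy(y_tr).view(1, -1, 1)
print(x_single.shape)  # (1, 10000, 30)
```

Read as "batch size" x "time length" x "input dimension", the first layout treats every row as an independent series with a single time step, while the second treats the whole array as one series.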

Thank you very much for your patience.

Regards,

sebastienwood commented 6 years ago

Hi,

I'm trying to replicate the examples provided (https://github.com/nschaetti/EchoTorch/blob/master/examples/timeserie_prediction/narma10_esn.py). The same kind of issue appears :

RuntimeError: size mismatch, [100 x 10], [1] at /Users/soumith/minicondabuild3/conda-bld/pytorch_1524590658547/work/aten/src/TH/generic/THTensorMath.c:1928

The line that raises this error is: u_win = self.w_in.mv(ut)

It seems related to the issue you ran into @FajunChen.

Also, when trying on a custom dataset, using view(1, -1, input_dim) to conform to PyTorch's RNN format, the issue moves:

RuntimeError: size mismatch, [100 x 1], [5] at /Users/soumith/minicondabuild3/conda-bld/pytorch_1524590658547/work/aten/src/TH/generic/THTensorMath.c:1928

y_wfdb = self.w_fdb.mv(yt)
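One way to read these two messages is as a matrix shape followed by a vector length: a 100 x 10 input matrix applied to a 1-element input step, and a 100 x 1 feedback matrix applied to a 5-element target step. That reading is an interpretation, not something confirmed in the thread. Under it, a pre-flight check like the one below would catch the mismatch before the ESN is called. check_esn_shapes is a hypothetical helper and is not part of EchoTorch.

```python
import torch


def check_esn_shapes(u, y, input_dim, output_dim):
    """Hypothetical pre-flight check (not an EchoTorch function).

    Returns a list of human-readable problems; an empty list means the
    tensors follow the (batch, time, features) layout with matching sizes.
    """
    problems = []
    if u.dim() != 3:
        problems.append(f"input must be 3D, got {u.dim()}D")
    elif u.size(2) != input_dim:
        problems.append(f"input has {u.size(2)} features, model expects {input_dim}")
    if y.dim() != 3:
        problems.append(f"target must be 3D, got {y.dim()}D")
    elif y.size(2) != output_dim:
        problems.append(f"target has {y.size(2)} outputs, model expects {output_dim}")
    if u.dim() == 3 and y.dim() == 3 and u.shape[:2] != y.shape[:2]:
        problems.append("input and target disagree on batch size or time length")
    return problems


# Shapes chosen to mirror the two messages above under the stated reading.
print(check_esn_shapes(torch.rand(1, 50, 1), torch.rand(1, 50, 1), input_dim=10, output_dim=1))
print(check_esn_shapes(torch.rand(1, 50, 10), torch.rand(1, 50, 5), input_dim=10, output_dim=1))
# A consistent pair produces no problems.
print(check_esn_shapes(torch.rand(1, 50, 10), torch.rand(1, 50, 1), input_dim=10, output_dim=1))
```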

@nschaetti maybe one way to address this issue in the future would be to include a utility to import and automatically convert tabular data like .csv? Thanks!
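A rough sketch of what such a utility might look like, assuming a CSV with a header row, one row per time step, numeric columns only, and a single target column. The function name csv_to_esn_tensors and its parameters are hypothetical; nothing like it is confirmed to exist in EchoTorch.

```python
import csv
import io

import torch


def csv_to_esn_tensors(file_obj, target_column):
    """Hypothetical converter: CSV rows -> (1, time, features) tensors.

    Every column except target_column is treated as an input feature.
    """
    reader = csv.DictReader(file_obj)
    feature_names = [name for name in reader.fieldnames if name != target_column]
    inputs, targets = [], []
    for row in reader:
        inputs.append([float(row[name]) for name in feature_names])
        targets.append([float(row[target_column])])
    # unsqueeze(0) adds the batch dimension of size 1 in front.
    u = torch.tensor(inputs, dtype=torch.float32).unsqueeze(0)
    y = torch.tensor(targets, dtype=torch.float32).unsqueeze(0)
    return u, y


# Usage with an in-memory file; a real file opened with open(path, newline="")
# would work the same way.
sample = io.StringIO("a,b,target\n1,2,3\n4,5,6\n7,8,9\n")
u, y = csv_to_esn_tensors(sample, "target")
print(u.shape)  # (1, 3, 2)
print(y.shape)  # (1, 3, 1)
```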

jlousada315 commented 5 years ago

Hi,

I have the same error while trying to run the Switch Attractor example. The input_dim is equal to 1 by default, but if I try to change it I get a size mismatch error. Can you help? Thanks in advance!

FabianChenDP commented 5 years ago

Sorry, it was a long time ago. I cannot remember the details now.


jlousada315 commented 5 years ago

Then how do I fix it ?

matthewewreed commented 5 years ago

I've become interested in solving the problem that gives rise to this bug (having an input dimension greater than 1). Coupled nonlinear systems are, on their own, pretty cool. Having accurate models (over relatively short timescales) would be extraordinarily useful. I've brute-forced my way through every Windows bug and edited Nils' code so it can be called without producing pickling errors. But now I can't even figure out where w_in.mv is defined.

sebastienwood commented 5 years ago

If I'm not mistaken it has been corrected by the fix 6e1ea94 2 months ago : https://github.com/nschaetti/EchoTorch/commit/6e1ea944a180a4c65d5dfbd3426cdb20acc4a1f0

I believe @nschaetti may want to review this issue to decide to close it or not ! :)

matthewewreed commented 5 years ago

Interesting. I'm using the corrected code.