AssertionError: You must pass an X iterable with 3 dimensions [batch_size x n_vars x seq_len]

connormeaton commented 1 year ago

Hello, I keep running into this issue when running get_X_preds on an array:

  File "/home/admin/mambaforge/envs/fastai/lib/python3.8/site-packages/tsai/data/core.py", line 592, in new_dl
    assert X.ndim == 3, "You must pass an X iterable with 3 dimensions [batch_size x n_vars x seq_len]"
AssertionError: You must pass an X iterable with 3 dimensions [batch_size x n_vars x seq_len]

The shape of the array is (1,1,12), which I believe is the correct shape. I am inputting 1 batch that has 1 variable and is 12 elements long.

Perhaps I am creating the array improperly?:

count = 0
temp = []
X = np.load('array.npy') # X has shape (3000,) and I am predicting every 12 elements
for i in X:
    temp.append(i)
    count+=1
    if count == speaker_time_model:
        speaker_pred = model.get_X_preds(np.array([[temp]]))
        count = 0
        temp = []

Should I do something different? Thanks!

Also, this code runs just fine on my Mac. It breaks when I try to run it on my Linux machine (Debian 10).

oguiza commented 1 year ago

Hi @connormeaton, the learner raises an error because the data you pass is not 3D. You can easily check the shape this way:

speaker_time_model = 12
count = 0
temp = []
X = np.random.rand(3000, 1, 12)
for i in X:
    temp.append(i)
    count+=1
    if count == speaker_time_model:
        print(np.array([[temp]]).shape)
        count = 0
        temp = []
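For comparison, here is a minimal sketch of the same check run on both input shapes mentioned in this thread. The helper name `first_window_shape` is hypothetical and only wraps the loop above for the first window. It assumes nothing about tsai and only inspects what `np.array([[temp]])` produces.

```python
import numpy as np

speaker_time_model = 12  # window length used in the thread


def first_window_shape(X, window=speaker_time_model):
    """Return the shape of np.array([[temp]]) built from the first `window` items of X.

    Hypothetical helper: it mirrors the accumulation loop for one window only.
    """
    temp = [item for item in X[:window]]
    return np.array([[temp]]).shape


# 1D input, as in the original question: each item is a scalar.
shape_1d = first_window_shape(np.random.rand(3000))

# 3D input, as in the check above: each item already has shape (1, 12).
shape_3d = first_window_shape(np.random.rand(3000, 1, 12))

print(shape_1d, shape_3d)
```

With a 1D source array the wrapped window is 3D, while with a source array that is already 3D the same wrapping adds extra dimensions, so the result depends on what `X` actually contains at that point.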

tsai can create multiple predictions at the same time. You can pass a large X and it'll split it into batches. If you want to pass every 12th sample, you could just use this code:

X = np.random.rand(3000, 1, 12)[::12]
preds, *_ = model.get_X_preds(X)

connormeaton commented 1 year ago

Thank you @oguiza! This is helpful. However, my starting array does not have shape (3000,1,12) as in your example; its shape is (3000,), so I need to split it into chunks of 12 first. I am trying to predict on sequences of length 12, so chunking the 3000 elements gives 250 sequences of length 12. In other words, the array should be converted from (3000,) to (250,1,12). I am doing this like so:

X = read_npy(path)
# X.shape = `(3000,)`
split_val = len(X)/12 # 12 = desired sequence length # 250 in this case
split_X = np.array(np.split(X, split_val)).reshape((split_val, 1, 12)) # this gives me (250,1,12)
preds = model.get_X_preds(split_X)

and it still gives me this error:

TypeError: only integer scalar arrays can be converted to a scalar index

What am I missing here?

oguiza commented 1 year ago

If you want to split a series of 3,000 time steps into 250 samples of 12 steps each, without overlapping, you could use this:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

X = np.arange(3_000)
X = sliding_window_view(X, 12, 0)[::12, None]
X.shape, X[:3]
# output: 
# ((250, 1, 12),
#  array([[[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11]],

#         [[12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]],

#         [[24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]]]))

I believe that's what you want.
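As a cross-check, a plain reshape should give the same non-overlapping chunks when the series length is an exact multiple of the sequence length. The sketch below assumes that condition (it is asserted in the code), and the names `seq_len`, `n_samples`, `X_reshaped` and `X_windowed` are illustrative only. Note that in Python 3 `len(X) / 12` is the float 250.0, which `reshape` does not accept as a dimension, so integer division is used here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

seq_len = 12
X = np.arange(3_000)

# Assumption: the series divides evenly into windows of seq_len steps.
assert len(X) % seq_len == 0

# Integer division keeps the sample count an int (len(X) / seq_len would be a float).
n_samples = len(X) // seq_len

# [n_samples x n_vars x seq_len], with a single variable.
X_reshaped = X.reshape(n_samples, 1, seq_len)

# The sliding-window version from the comment above, for comparison.
X_windowed = sliding_window_view(X, seq_len, 0)[::seq_len, None]

print(X_reshaped.shape, np.array_equal(X_reshaped, X_windowed))
```

`sliding_window_view` returns a read-only view and needs NumPy 1.20 or later, so the reshape may be the simpler choice when no overlap between windows is wanted.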

connormeaton commented 1 year ago

Thanks, that is nice code, and it runs fast. Unfortunately, I get the same error as before. Again, this works on my Mac but not on my Linux machine, using the same code and the same data.

oguiza commented 1 year ago

What is the error you get? It cannot be the X.shape. When you report an error, it's important to paste the full stack trace between 3 backticks on each side.

connormeaton commented 1 year ago

My apologies, here is the full stack trace:

Could not do one pass in your dataloader, there is something wrong in it. Please see the stack trace below:
Traceback (most recent call last):
  File "inf.py", line 24, in <module>
    predict.call_funcs(data)
  File "/home/admin/v1/src/cv/predict.py", line 61, in call_funcs
    pred = model.get_X_preds(arr)
  File "/home/admin/mambaforge/envs/fastai/lib/python3.8/site-packages/tsai/inference.py", line 17, in get_X_preds
    dl = self.dls.valid.new_dl(X, y=y, bs=bs)
  File "/home/admin/mambaforge/envs/fastai/lib/python3.8/site-packages/tsai/data/core.py", line 597, in new_dl
    return self.new(ds, bs=min(bs, len(X)))
  File "/home/admin/mambaforge/envs/fastai/lib/python3.8/site-packages/fastai/data/core.py", line 97, in new
    self._one_pass()
  File "/home/admin/mambaforge/envs/fastai/lib/python3.8/site-packages/tsai/data/core.py", line 560, in _one_pass
    b = self.do_batch([self.do_item(0)])
  File "/home/admin/mambaforge/envs/fastai/lib/python3.8/site-packages/fastai/data/load.py", line 168, in do_batch
    def do_batch(self, b): return self.retain(self.create_batch(self.before_batch(b)), b)
  File "/home/admin/mambaforge/envs/fastai/lib/python3.8/site-packages/tsai/data/core.py", line 613, in create_batch
    return self.dataset[b]
  File "/home/admin/mambaforge/envs/fastai/lib/python3.8/site-packages/tsai/data/core.py", line 496, in __getitem__
    return tuple([ptl[it] for ptl in self.ptls])
  File "/home/admin/mambaforge/envs/fastai/lib/python3.8/site-packages/tsai/data/core.py", line 496, in <listcomp>
    return tuple([ptl[it] for ptl in self.ptls])
  File "/home/admin/mambaforge/envs/fastai/lib/python3.8/site-packages/tsai/data/core.py", line 328, in __getitem__
    else: return self.items[self._splits[it]]
TypeError: only integer scalar arrays can be converted to a scalar index

connormeaton commented 1 year ago

Hi @oguiza, I was wondering if you had any thoughts on this last error? I'm still unable to make progress on this issue. The code works fine on my Mac M1 but fails with the above error on Debian 10 Linux. The code and data are exactly the same.

EDIT

I noticed that my Mac is running tsai 0.3.1 and my Debian machine is running 0.3.5. I don't want to downgrade and lose all the updates. Could this version difference be causing the issue?

oguiza commented 1 year ago

There's a new pip release: 0.3.6. Could you upgrade to the new release and test your code again?
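To confirm which releases each machine is actually running before and after the upgrade, a small sketch like the following could be run on both. It uses only the standard library `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical, and the list of packages to compare is an assumption about what is relevant here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Packages assumed relevant to this issue; adjust as needed.
for name in ("tsai", "fastai", "numpy"):
    print(name, installed_version(name))
```

Pasting this output from both environments next to the stack trace makes it easier to tell whether the difference in behaviour follows a version difference.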

oguiza commented 1 year ago

Closed due to lack of response.

timeseriesAI / tsai

AssertionError: You must pass an X iterable with 3 dimensions [batch_size x n_vars x seq_len] #733