When I use a sample size of less than 2,000, everything runs normally. But once I increase the sample size, all the problems shown above appear.
I'm sorry @qixing0375, but I cannot debug the code if I cannot reproduce the issue. Can you please provide a code snippet that reproduces it? It doesn't matter if the data is random, as long as it keeps the same shape. I also need you to paste the full stack trace between ```.
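For example, a sketch like this, with random data of the same shape as yours, is all I'd need (everything except the shapes is illustrative):

```python
import numpy as np
from tsai.all import *

# Random data with the same shape as the reported arrays
X = np.random.rand(7970, 33, 36).astype(np.float32)
y = np.random.randint(0, 2, 7970)

splits = get_splits(y, valid_size=0.2, random_state=23, shuffle=True)
tfms = [None, [Categorize()]]
dsets = TSDatasets(X, y, tfms=tfms, splits=splits, inplace=True)
dls = TSDataLoaders.from_dsets(dsets.train, dsets.valid, bs=[64, 128],
                               batch_tfms=[TSStandardize(by_var=True)], num_workers=0)
learn = Learner(dls, InceptionTime(dls.vars, dls.c), metrics=accuracy)
learn.lr_find()
```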
```python
X.shape, y.shape
# ((7970, 33, 36), (7970,))

splits = get_splits(y, valid_size=0.2, random_state=23, shuffle=True)
tfms = [None, [Categorize()]]
dsets = TSDatasets(X, y, tfms=tfms, splits=splits, inplace=True)
dls = TSDataLoaders.from_dsets(dsets.train, dsets.valid, bs=[64, 128],
                               batch_tfms=[TSStandardize(by_var=True)], num_workers=0)
model = InceptionTime(dls.vars, dls.c)
learn = Learner(dls, model, metrics=accuracy)
learn.save('stage0')
learn.load('stage0')
learn.lr_find()
```
```
16.00% [4/25 00:13<01:10]
epoch  train_loss  valid_loss  accuracy  time
0      nan         nan         0.958595  00:03
1      nan         nan         0.958595  00:03
2      nan         nan         0.958595  00:03
3      nan         nan         0.958595  00:03
```
```
KeyboardInterrupt                         Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_13020\4198329076.py in <module>

c:\Users\qixin\anaconda3\lib\site-packages\fastai\callback\schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt, start_epoch)
    117     scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
    118               'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 119     self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd, start_epoch=start_epoch)
    120
    121 # %% ../../nbs/14_callback.schedule.ipynb 50

c:\Users\qixin\anaconda3\lib\site-packages\fastai\learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt, start_epoch)
    254         self.opt.set_hypers(lr=self.lr if lr is None else lr)
    255         self.n_epoch = n_epoch
--> 256         self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
    257
    258     def _end_cleanup(self): self.dl,self.xb,self.yb,self.pred,self.loss = None,(None,),(None,),None,None

c:\Users\qixin\anaconda3\lib\site-packages\fastai\learner.py in _with_events(self, f, event_type, ex, final)
    191
    192     def _with_events(self, f, event_type, ex, final=noop):
--> 193         try: self(f'before_{event_type}'); f()
    194         except ex: self(f'after_cancel_{event_type}')
...
--> 160         sqr_avg.mul_(sqr_mom).addcmul_(p.grad.data, p.grad.data, value=damp)
    161     return {'sqr_avg': sqr_avg}
    162

KeyboardInterrupt:
```

```
   arch    hyperparams  total params  train loss  valid loss  accuracy  time
0  FCN     {}                 292865         NaN         NaN  0.958595     6
1  ResNet  {}                 494721         NaN         NaN  0.958595    12

XResNet
```
```
20.00% [2/10 00:06<00:26]
epoch  train_loss  valid_loss  accuracy  time
0      nan         nan         0.958595  00:03
1      nan         nan         0.958595  00:03
```
```
KeyboardInterrupt                         Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_13020\63561938.py in <module>

c:\Users\qixin\anaconda3\lib\site-packages\fastai\callback\schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt, start_epoch)
    117     scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
    118               'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 119     self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd, start_epoch=start_epoch)
    120
    121 # %% ../../nbs/14_callback.schedule.ipynb 50

c:\Users\qixin\anaconda3\lib\site-packages\fastai\learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt, start_epoch)
    254         self.opt.set_hypers(lr=self.lr if lr is None else lr)
    255         self.n_epoch = n_epoch
--> 256         self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
    257
    258     def _end_cleanup(self): self.dl,self.xb,self.yb,self.pred,self.loss = None,(None,),(None,),None,None

c:\Users\qixin\anaconda3\lib\site-packages\fastai\learner.py in _with_events(self, f, event_type, ex, final)
    191
...
--> 197         Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    198             tensors, grad_tensors_, retain_graph, create_graph, inputs,
    199             allow_unreachable=True, accumulate_grad=True)
```
```
IndexError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_13680\2529580247.py in <module>

c:\Users\qixin\anaconda3\lib\site-packages\fastai\callback\schedule.py in lr_find(self, start_lr, end_lr, num_it, stop_div, show_plot, suggest_funcs)
    302     for func in tuplify(suggest_funcs):
    303         nms.append(func.__name__ if not isinstance(func, partial) else func.func.__name__) # deal with partials
--> 304         _suggestions.append(func(lrs, losses, num_it))
    305
    306 SuggestedLRs = collections.namedtuple('SuggestedLRs', nms)

c:\Users\qixin\anaconda3\lib\site-packages\fastai\callback\schedule.py in valley(lrs, losses, num_it)
    229     idx = max_start + int(sections) + int(sections/2)
    230
--> 231     return float(lrs[idx]), (float(lrs[idx]), losses[idx])
    232
    233 # %% ../../nbs/14_callback.schedule.ipynb 81

IndexError: index 0 is out of bounds for dimension 0 with size 0
```
```python
learn.fit_one_cycle(25, lr_max=1e-3)
learn.save('stage1')

archs = [(FCN, {}), (ResNet, {}), (xresnet1d34, {}), (ResCNN, {}),
         (LSTM, {'n_layers':1, 'bidirectional': False}), (LSTM, {'n_layers':2, 'bidirectional': False}), (LSTM, {'n_layers':3, 'bidirectional': False}),
         (LSTM, {'n_layers':1, 'bidirectional': True}), (LSTM, {'n_layers':2, 'bidirectional': True}), (LSTM, {'n_layers':3, 'bidirectional': True}),
         (LSTM_FCN, {}), (LSTM_FCN, {'shuffle': False}), (InceptionTime, {}), (XceptionTime, {}), (OmniScaleCNN, {}), (mWDN, {'levels': 4})]

results = pd.DataFrame(columns=['arch', 'hyperparams', 'total params', 'train loss', 'valid loss', 'accuracy', 'time'])
for i, (arch, k) in enumerate(archs):
    model = create_model(arch, dls=dls, **k)
    print(model.__class__.__name__)
    learn = Learner(dls, model, metrics=accuracy)
    start = time.time()
    learn.fit_one_cycle(10, 1e-3)
    elapsed = time.time() - start
    vals = learn.recorder.values[-1]
    results.loc[i] = [arch.__name__, k, count_parameters(model), vals[0], vals[1], vals[2], int(elapsed)]
    results.sort_values(by='accuracy', ascending=False, ignore_index=True, inplace=True)
    clear_output()
    display(results)
```
I have also tried other models, but they showed similar results.
I tried with a randomly generated np.array of the same shape and the code worked, but it somehow didn't work with my own array. I have checked that there are no NAs in my array, and the batch visualization seems okay: all variables are standardized. Do you have any idea?
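The check was along these lines (a sketch, with X being the exact array passed to TSDatasets rather than an upstream DataFrame):

```python
import numpy as np

# Check the final array, since NaNs can be introduced during conversion/reshaping
print(np.isnan(X).any())   # True if any NaN is present anywhere in X
print(np.isnan(X).sum())   # total number of NaN entries
```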
Thanks! Qixing
My npy file is about 75 MB, which is a bit large. I tested a little more: with about 6,000 samples the code works, but if I increase the sample size to 9,000, the error appears. Do you still want to try to reproduce the bug? If so, how should I share the npy with you?
Hi @qixing0375, I tried to reproduce the issue based on the information you have provided but failed. Everything seems to be working well, even when I create a dataset with 50k samples (see gist). Can you please run my_setup() and check_data(X, y, splits) and paste the output here?
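Both helpers ship with tsai; a minimal sketch of the requested calls (the exact output depends on your environment):

```python
from tsai.all import *

my_setup()                # prints os/Python/tsai/fastai/torch versions and device info
check_data(X, y, splits)  # prints shapes, dtypes, and basic sanity checks on X, y and the splits
```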
Problem solved. There were NAs mixed into the middle of my samples, which is why the error appeared when I increased the sample size. Thank you so much!
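For anyone hitting the same NaN-in-the-middle issue, a quick numpy sketch for locating and dropping the offending samples, assuming X has the (samples, variables, steps) shape used above:

```python
import numpy as np

# True for every sample that contains at least one NaN anywhere
bad = np.isnan(X).any(axis=(1, 2))
print(f'{bad.sum()} of {len(X)} samples contain NaNs, at indices {np.where(bad)[0]}')

# One option: drop the offending samples before building the datasets
X_clean, y_clean = X[~bad], y[~bad]
```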
dls and model are exactly the same as in the tutorial.
I have checked that there is no NA in my input dataset. Does anyone know the problem here? THANKS!!!