cversek opened this issue 9 months ago
@oguiza @williamsdoug
As I said, I'm fairly motivated to help fix this issue. I went back and ran this notebook with v0.3.7 and v0.3.6 and got the same ValueError: Target size (torch.Size([384])) must be the same as input size (torch.Size([2304]))
But when I checked out v0.3.5 that bit of code ran without error! I will post back soon with any other diagnostic information that I discover. Thanks for creating this awesome package and example code.
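The two sizes in that ValueError are themselves suggestive: the model output has exactly 6 times as many elements as the target, matching the 6 labels in the multi-label example. A quick sanity check (the batch size of 64 is my assumption, chosen because it makes the arithmetic work out):

```python
# Sizes taken from the ValueError message
input_size = 2304   # elements in the model output
target_size = 384   # elements in the target
ratio = input_size // target_size
print(ratio)  # 6 -- the model head emits 6x too many values per sample

# Consistent with an assumed batch size of 64 and 6 labels:
# 64 * 6 = 384 targets, but 64 * 6 * 6 = 2304 outputs
assert 64 * 6 == target_size and 64 * 6 * 6 == input_size
```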
So, moving past that line in v0.3.5, I do eventually run into an error at the end of the training loop. But I consider this great progress (retrogress?). See the truncated screenshot of the notebook:
Full stack trace:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[20], line 2
1 learn = ts_learner(dls, InceptionTimePlus, metrics=[partial(accuracy_multi, by_sample=True), partial(accuracy_multi, by_sample=False)], cbs=ShowGraph())
----> 2 learn.fit_one_cycle(10, lr_max=1e-3)
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/callback/schedule.py:119, in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt, start_epoch)
116 lr_max = np.array([h['lr'] for h in self.opt.hypers])
117 scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
118 'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 119 self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd, start_epoch=start_epoch)
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:264, in Learner.fit(self, n_epoch, lr, wd, cbs, reset_opt, start_epoch)
262 self.opt.set_hypers(lr=self.lr if lr is None else lr)
263 self.n_epoch = n_epoch
--> 264 self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:201, in Learner._with_events(self, f, event_type, ex, final)
199 try: self(f'before_{event_type}'); f()
200 except ex: self(f'after_cancel_{event_type}')
--> 201 self(f'after_{event_type}'); final()
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:172, in Learner.__call__(self, event_name)
--> 172 def __call__(self, event_name): L(event_name).map(self._call_one)
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastcore/foundation.py:156, in L.map(self, f, *args, **kwargs)
--> 156 def map(self, f, *args, **kwargs): return self._new(map_ex(self, f, *args, gen=False, **kwargs))
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastcore/basics.py:840, in map_ex(iterable, f, gen, *args, **kwargs)
838 res = map(g, iterable)
839 if gen: return res
--> 840 return list(res)
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastcore/basics.py:825, in bind.__call__(self, *args, **kwargs)
823 if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
824 fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 825 return self.func(*fargs, **kwargs)
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/learner.py:176, in Learner._call_one(self, event_name)
174 def _call_one(self, event_name):
175 if not hasattr(event, event_name): raise Exception(f'missing {event_name}')
--> 176 for cb in self.cbs.sorted('order'): cb(event_name)
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/callback/core.py:62, in Callback.__call__(self, event_name)
60 try: res = getcallable(self, event_name)()
61 except (CancelBatchException, CancelBackwardException, CancelEpochException, CancelFitException, CancelStepException, CancelTrainException, CancelValidException): raise
---> 62 except Exception as e: raise modify_exception(e, f'Exception occured in `{self.__class__.__name__}` when calling event `{event_name}`:\n\t{e.args[0]}', replace=True)
63 if event_name=='after_fit': self.run=True #Reset self.run to True at each end of fit
64 return res
File /opt/mambaforge/envs/neurovep_data/lib/python3.11/site-packages/fastai/callback/core.py:60, in Callback.__call__(self, event_name)
58 res = None
59 if self.run and _run:
---> 60 try: res = getcallable(self, event_name)()
61 except (CancelBatchException, CancelBackwardException, CancelEpochException, CancelFitException, CancelStepException, CancelTrainException, CancelValidException): raise
62 except Exception as e: raise modify_exception(e, f'Exception occured in `{self.__class__.__name__}` when calling event `{event_name}`:\n\t{e.args[0]}', replace=True)
File ~/gitwork/timeseriesAI/tsai/tsai/callback/core.py:101, in ShowGraph.after_fit(self)
99 plt.close(self.graph_ax.figure)
100 if self.plot_metrics:
--> 101 self.learn.plot_metrics(final_losses=self.final_losses, perc=self.perc)
File ~/gitwork/timeseriesAI/tsai/tsai/learner.py:231, in plot_metrics(self, **kwargs)
228 @patch
229 @delegates(subplots)
230 def plot_metrics(self: Learner, **kwargs):
--> 231 self.recorder.plot_metrics(**kwargs)
File ~/gitwork/timeseriesAI/tsai/tsai/learner.py:218, in plot_metrics(self, nrows, ncols, figsize, final_losses, perc, **kwargs)
216 else:
217 color = '#ff7f0e'
--> 218 label = 'valid' if (m != [None] * len(m)).all() else None
219 axs[ax_idx].grid(color='gainsboro', linewidth=.5)
220 axs[ax_idx].plot(xs, m, color=color, label=label)
AttributeError: Exception occured in `ShowGraph` when calling event `after_fit`:
'bool' object has no attribute 'all'
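That AttributeError is easy to reproduce in isolation: comparing a plain Python list with `!=` yields a single bool, which has no `.all()` method; the expression in plot_metrics presumably expects `m` to be a NumPy array. A minimal sketch:

```python
import numpy as np

m = [0.5, 0.4, 0.3]

# Plain-list comparison returns one bool -- calling .all() on it
# raises exactly the AttributeError seen above
res = m != [None] * len(m)
print(type(res))  # <class 'bool'>

# With NumPy arrays the same expression is elementwise and the
# result does have .all(), as the plot_metrics code seems to assume
res_np = np.array(m) != np.array([None] * len(m))
print(res_np.all())  # True
```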
I am going to guess that some of my dependencies are too new for this older tsai version. Here is my setup for the system I'm currently testing on:
os : Linux-6.2.0-34-generic-x86_64-with-glibc2.35
python : 3.11.0
tsai : 0.3.5
fastai : 2.7.12
fastcore : 1.5.29
torch : 2.0.1
cpu cores : 4
threads per cpu : 2
RAM : 62.58 GB
GPU memory : N/A
For the TSMultiLabelClassification task, comparing the output of learn.model for InceptionTimePlus between v0.3.5 and v0.3.6, everything is the same except for learn.model.head:

v0.3.5:
Sequential(
  (0): create_lin_nd_head(
    (0): fastai.layers.Flatten(full=False)
    (1): Linear(in_features=17920, out_features=6, bias=True)
    (2): Reshape(bs, 6)
  )
)

v0.3.6:
Sequential(
  (0): lin_nd_head(
    (0): Reshape(bs)
    (1): Linear(in_features=17920, out_features=36, bias=True)
    (2): Reshape(bs, 6, 6)
  )
)
Clearly something has changed, and maybe it's not shaping the output properly?
For the multi-class TSClassification task, which works in both versions, the differences in learn.model.head are more subtle:

v0.3.5:
Sequential(
  (0): Sequential(
    (0): GAP1d(
      (gap): AdaptiveAvgPool1d(output_size=1)
      (flatten): fastai.layers.Flatten(full=False)
    )
    (1): LinBnDrop(
      (0): Linear(in_features=128, out_features=5, bias=True)
    )
  )
)

v0.3.6:
Sequential(
  (0): Sequential(
    (0): GAP1d(
      (gap): AdaptiveAvgPool1d(output_size=1)
      (flatten): Reshape(bs)
    )
    (1): LinBnDrop(
      (0): Linear(in_features=128, out_features=5, bias=True)
    )
  )
)
@oguiza I just noticed the similarity with previously fixed issues:
Could be a regression, I will try to study what the fixes were there until someone more qualified can take over ;)
@oguiza
I was eventually able to hunt down the problematic commit 9caff8f by using git bisect
between tags 0.3.6 (bad) and 0.3.5 (good) and checking the notebook example. There is some uncertainty about whether reverting that commit will cause other regressions in other parts of the code base, so I submitted a draft PR to fix the issue: https://github.com/timeseriesAI/tsai/pull/855
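For anyone who wants to reproduce or continue the hunt, the bisect session looks roughly like this (illustrative commands, not a literal transcript of my session):

```shell
# Illustrative git bisect session between the two tags
git bisect start
git bisect bad 0.3.6         # tag where the notebook cell fails
git bisect good 0.3.5        # tag where the notebook cell runs
# git checks out a midpoint commit; rerun the failing notebook cell, then mark it:
git bisect good              # or `git bisect bad`, depending on the result
# repeat until git prints the first bad commit (9caff8f in this case)
git bisect reset             # return to the original branch when done
```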
It would be awesome if the maintainers could help with implementing the fix. I have just about exhausted my ability to understand how the magic model auto-configuration system is supposed to work. Once a fix is decided upon and tested, I will happily close out the issue!
Having the same issue!
@oguiza @Munib5 Sorry, I thought I had a real fix, but I was mistaken again.
Tracing what is going wrong is a bit maddening. The problem starts with the creation of the DataLoaders. At this point the dls object has two relevant attributes:

dls.c == 6
dls.d == 6
When the learner.ts_learner function is invoked as in:

learn = ts_learner(dls, InceptionTimePlus, metrics=accuracy_multi, verbose=True)

it delegates to models.utils.build_ts_model with c_out=None and d=None. Internally, those parameters are obtained from the dls object (see source).
Still inside build_ts_model, a 'custom_head' argument gets tacked onto a kwargs dict (see source):

kwargs['custom_head'] = partial(kwargs['custom_head'], d=d)

where d == 6 here. The kwargs dict is later injected into the model construction (see source):

model = arch(c_in, c_out, seq_len=seq_len, **arch_config, **kwargs).to(device=device)
This is confirmed by the printout you get if you set verbose=True in the ts_learner call:
arch: InceptionTimePlus(c_in=1 c_out=6 seq_len=140 arch_config={} kwargs={'custom_head': functools.partial(<class 'tsai.models.layers.lin_nd_head'>, d=6)})
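The mechanics of that injection can be seen with plain functools.partial. Here `fake_head` is a hypothetical stand-in for tsai's lin_nd_head constructor that just reports what it received:

```python
from functools import partial

# Hypothetical stand-in for tsai's lin_nd_head -- purely illustrative
def fake_head(n_in, n_out, seq_len, d=None):
    return f"head(n_in={n_in}, n_out={n_out}, seq_len={seq_len}, d={d})"

kwargs = {'custom_head': fake_head}
d = 6  # picked up from dls.d, as described above

# Mirrors build_ts_model: the partial bakes d=6 into every later head construction
kwargs['custom_head'] = partial(kwargs['custom_head'], d=d)

print(kwargs['custom_head'](128, 6, 140))
# head(n_in=128, n_out=6, seq_len=140, d=6)
```

So by the time the head is actually built, d=6 is already locked in, regardless of what shape the loss function will later expect.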
Subsequently, when the lin_nd_head constructor is called (see source), the local shape and fd variables take on these values:

shape == [6, 6]
fd == 6
Later in that constructor (see code), this problematically shaped layer is appended to the model:
else:
if seq_len == 1:
layers += [nn.AdaptiveAvgPool1d(1)]
if not flatten and fd == seq_len:
layers += [Transpose(1,2), nn.Linear(n_in, n_out)]
else:
layers += [Reshape(), nn.Linear(n_in * seq_len, n_out * fd)]
layers += [Reshape(*shape)]
where the key variables take on these values:
n_in == 128
seq_len == 140
n_out == 6
fd == 6
shape == [6,6]
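Plugging those values into the else-branch above shows exactly how the original ValueError arises (the batch size of 64 is my assumption, for illustration):

```python
# Reproduce the shape mismatch arithmetically (bs=64 assumed for illustration)
bs, n_out, fd, seq_len, n_in = 64, 6, 6, 140, 128

# v0.3.6 head: Linear(n_in * seq_len, n_out * fd) followed by Reshape(bs, 6, 6)
model_output_elems = bs * n_out * fd   # 64 * 6 * 6 = 2304

# Multi-label targets have one value per class per sample
target_elems = bs * n_out              # 64 * 6 = 384

print(model_output_elems, target_elems)  # 2304 384 -- the sizes in the ValueError
assert model_output_elems == fd * target_elems
```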
That shows there is potentially a conflict between the property c, which is supposed to be the "number of classes/categories", and the property d, which apparently means some sort of extra output dimension (the source isn't very clear on these semantics).
Again, my limited understanding of this library's architecture is a major impediment to finding a fix that will satisfy everyone :) Here's hoping that the maintainers will take over!
@Munib5 With all that said, a temporary workaround might be to do something like:
learn = ts_learner(dls, InceptionTimePlus, metrics=accuracy_multi, verbose=True, d=1)
where we force the d property back to a value that produces an output shape compatible with the binary_cross_entropy_with_logits loss function.
Running the notebook 01a_MultiClass_MultiLabel_TSClassification.ipynb under the MultiLabel section, specifically this cell's code, results in this error (cropped screenshot):

![image](https://github.com/timeseriesAI/tsai/assets/1470227/e5507cf7-1457-4aeb-9d7b-530b89d3aabc)

Here is the full traceback:

I'm running this commit locally (currently the head of the main branch), and the output of my_setup() is:

I would be happy to spend quite a bit more effort on figuring out the right way to run this example. Thanks!