libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

numba related error: Exception ignored in: <finalize object at 0x7f8ee0524e20; dead> #167

Closed by tfriedel 2 years ago

tfriedel commented 2 years ago

I'm currently experimenting with a variation of the code provided in the ffcv-imagenet repo. When I repeatedly instantiate the ImageNetTrainer class and run several trainings one after another, I encounter this error before and during the second training:

Exception ignored in: <finalize object at 0x7f8ee0524e20; dead>
Traceback (most recent call last):
  File "/home/thomas/conda/envs/ffcv/lib/python3.9/weakref.py", line 591, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "/home/thomas/conda/envs/ffcv/lib/python3.9/site-packages/numba/core/dispatcher.py", line 312, in finalizer
    for cres in overloads.values():
KeyError: (array(uint8, 1d, C), array(uint8, 1d, C), uint32, uint32, uint32, uint32, Literal[int](0), Literal[int](0), Literal[int](1), Literal[int](1), Literal[bool](False), Literal[bool](False))
ep=0, iter=11, shape=(64, 3, 160, 160), lrs=['0.001', '0.001']: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:12<00:00,  1.01s/it]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:03<00:00,  6.44it/s]

It doesn't seem to affect training, and everything still works as expected. I suspect it refers to some dead objects that couldn't be cleaned up properly. Unfortunately, I don't have a minimal code example ready for debugging, because preparing one requires quite a bit of work. If there's anything I can do to help you track down the source of this error, let me know.

GuillaumeLeclerc commented 2 years ago

It's a numba issue and there is nothing FFCV can do to fix their cleanup procedure :(
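
For readers wondering why the message is noisy but harmless, here is a minimal sketch of the general mechanism using only the standard library. It is not numba's code: `FakeDispatcher`, `_cleanup`, and `demo` are hypothetical names made up for illustration, and the `KeyError` is raised on purpose. It only shows that an exception raised inside a `weakref.finalize` callback is reported through `sys.unraisablehook` (by default as an "Exception ignored ..." message on stderr) and is not propagated to the running program.

```python
import gc
import sys
import weakref


def _cleanup(overloads):
    # Hypothetical cleanup step that fails on purpose, standing in for
    # a finalizer that hits a KeyError while tearing down its state.
    raise KeyError("missing signature")


class FakeDispatcher:
    """Hypothetical stand-in for an object that owns compiled overloads."""

    def __init__(self):
        self.overloads = {"sig": "compiled code"}
        # Run _cleanup when this instance is garbage collected.
        weakref.finalize(self, _cleanup, self.overloads)


def demo():
    seen = []

    def hook(unraisable):
        # Record only the exception type name to avoid keeping objects alive.
        seen.append(unraisable.exc_type.__name__)

    old_hook = sys.unraisablehook
    sys.unraisablehook = hook  # replace the default stderr reporting
    try:
        d = FakeDispatcher()
        del d          # the finalizer fires once the object is collected
        gc.collect()   # make collection explicit for non-refcounting runtimes
    finally:
        sys.unraisablehook = old_hook
    # Execution continues normally: the KeyError never reached this frame.
    return "still running", seen


if __name__ == "__main__":
    print(demo())
```

With the default hook left in place, the same failure would be printed to stderr instead of being collected in `seen`, which matches the observation above that training keeps running after the message appears.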