nnaisense / evotorch

Advanced evolutionary computation library built directly on top of PyTorch, created at NNAISENSE.
https://evotorch.ai
Apache License 2.0

SupervisedNE get_minibatch stack overflow #73

Closed jalane76 closed 1 year ago

jalane76 commented 1 year ago

Hello, I'm trying out evotorch to eventually do some black box optimization on the YOLOv8 model. However, I've run into the following error when trying to run SNES on a SupervisedNE problem on the COCO dataset.

Here is the console output:

(praeception-py3.8) jesse@caliban:~/git/praeception$ python scripts/try_evotorch_on_yolo.py 

                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.Conv                  [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.Conv                  [16, 32, 3, 2]                
  2                  -1  1      7360  ultralytics.nn.modules.C2f                   [32, 32, 1, True]             
  3                  -1  1     18560  ultralytics.nn.modules.Conv                  [32, 64, 3, 2]                
  4                  -1  2     49664  ultralytics.nn.modules.C2f                   [64, 64, 2, True]             
  5                  -1  1     73984  ultralytics.nn.modules.Conv                  [64, 128, 3, 2]               
  6                  -1  2    197632  ultralytics.nn.modules.C2f                   [128, 128, 2, True]           
  7                  -1  1    295424  ultralytics.nn.modules.Conv                  [128, 256, 3, 2]              
  8                  -1  1    460288  ultralytics.nn.modules.C2f                   [256, 256, 1, True]           
  9                  -1  1    164608  ultralytics.nn.modules.SPPF                  [256, 256, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.Concat                [1]                           
 12                  -1  1    148224  ultralytics.nn.modules.C2f                   [384, 128, 1]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.Concat                [1]                           
 15                  -1  1     37248  ultralytics.nn.modules.C2f                   [192, 64, 1]                  
 16                  -1  1     36992  ultralytics.nn.modules.Conv                  [64, 64, 3, 2]                
 17            [-1, 12]  1         0  ultralytics.nn.modules.Concat                [1]                           
 18                  -1  1    123648  ultralytics.nn.modules.C2f                   [192, 128, 1]                 
 19                  -1  1    147712  ultralytics.nn.modules.Conv                  [128, 128, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.Concat                [1]                           
 21                  -1  1    493056  ultralytics.nn.modules.C2f                   [384, 256, 1]                 
 22        [15, 18, 21]  1    897664  ultralytics.nn.modules.Detect                [80, [64, 128, 256]]          
YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs

loading annotations into memory...
Done (t=9.30s)
creating index...
index created!
[2023-03-11 18:42:04] INFO     <499881> evotorch.core: Instance of `SupervisedNE` (id:140117468367792) -- The `dtype` for the problem's decision variables is set as torch.float32
[2023-03-11 18:42:04] INFO     <499881> evotorch.core: Instance of `SupervisedNE` (id:140117468367792) -- `eval_dtype` (the dtype of the fitnesses and evaluation data) is set as torch.float32
[2023-03-11 18:42:04] INFO     <499881> evotorch.core: Instance of `SupervisedNE` (id:140117468367792) -- The `device` of the problem is set as cuda:0
[2023-03-11 18:42:05] INFO     <499881> evotorch.core: Instance of `SupervisedNE` (id:140117468367792) -- The number of actors that will be allocated for parallelized evaluation is 0
Fatal Python error: Cannot recover from stack overflow.
Python runtime state: initialized

Current thread 0x00007f7039a8d740 (most recent call first):
  File "/home/jesse/.cache/pypoetry/virtualenvs/praeception--NwHCx9X-py3.8/lib/python3.8/site-packages/PIL/ImageFile.py", line 571 in _safe_read
  File "/home/jesse/.cache/pypoetry/virtualenvs/praeception--NwHCx9X-py3.8/lib/python3.8/site-packages/PIL/JpegImagePlugin.py", line 67 in APP
  File "/home/jesse/.cache/pypoetry/virtualenvs/praeception--NwHCx9X-py3.8/lib/python3.8/site-packages/PIL/JpegImagePlugin.py", line 386 in _open
  File "/home/jesse/.cache/pypoetry/virtualenvs/praeception--NwHCx9X-py3.8/lib/python3.8/site-packages/PIL/ImageFile.py", line 117 in __init__
  File "/home/jesse/.cache/pypoetry/virtualenvs/praeception--NwHCx9X-py3.8/lib/python3.8/site-packages/PIL/JpegImagePlugin.py", line 822 in jpeg_factory
  File "/home/jesse/.cache/pypoetry/virtualenvs/praeception--NwHCx9X-py3.8/lib/python3.8/site-packages/PIL/Image.py", line 3254 in _open_core
  File "/home/jesse/.cache/pypoetry/virtualenvs/praeception--NwHCx9X-py3.8/lib/python3.8/site-packages/PIL/Image.py", line 3268 in open
  File "/home/jesse/.cache/pypoetry/virtualenvs/praeception--NwHCx9X-py3.8/lib/python3.8/site-packages/torchvision/datasets/coco.py", line 41 in _load_image
  File "/home/jesse/.cache/pypoetry/virtualenvs/praeception--NwHCx9X-py3.8/lib/python3.8/site-packages/torchvision/datasets/coco.py", line 48 in __getitem__
  File "/home/jesse/.cache/pypoetry/virtualenvs/praeception--NwHCx9X-py3.8/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 58 in <listcomp>
  File "/home/jesse/.cache/pypoetry/virtualenvs/praeception--NwHCx9X-py3.8/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 58 in fetch
  File "/home/jesse/.cache/pypoetry/virtualenvs/praeception--NwHCx9X-py3.8/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 671 in _next_data
  File "/home/jesse/.cache/pypoetry/virtualenvs/praeception--NwHCx9X-py3.8/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 628 in __next__
  File "/home/jesse/.cache/pypoetry/virtualenvs/praeception--NwHCx9X-py3.8/lib/python3.8/site-packages/evotorch/neuroevolution/supervisedne.py", line 318 in get_minibatch
  File "/home/jesse/.cache/pypoetry/virtualenvs/praeception--NwHCx9X-py3.8/lib/python3.8/site-packages/evotorch/neuroevolution/supervisedne.py", line 326 in get_minibatch
  [the frame above repeats 85 more times in the original output]
  ...
Aborted (core dumped)

And here is the code I'm using:

import click
import torch
from evotorch.algorithms import SNES
from evotorch.logging import StdOutLogger
from evotorch.neuroevolution import SupervisedNE
from torchvision.datasets import CocoDetection
from ultralytics import YOLO

@click.command()
@click.option("--model", default="yolov8n.yaml", help="Model config file or model path")
@click.option("--device", default="cuda:0", help="Device to use")
def main(model, device):

    yolo_model = YOLO(model=model)

    coco_path = "/data/school/coco/train2017"
    ann_path = "/data/school/coco/annotations/instances_train2017.json"

    dataset = CocoDetection(coco_path, ann_path)

    problem = SupervisedNE(
        dataset=dataset,
        network=yolo_model.model,
        minibatch_size=16,
        loss_func=torch.nn.MSELoss(),
        device=device,
        num_actors=4,
        num_gpus_per_actor=1.0 / 4.0,
    )
    searcher = SNES(problem, popsize=3, radius_init=2.25)
    _ = StdOutLogger(searcher)
    searcher.run(1)

if __name__ == "__main__":
    main()

Looking at the source code for get_minibatch in SupervisedNE, it appears that if the batch is None or an exception occurs, get_minibatch is called again. This seems to recurse without bound until the stack overflows, and no underlying error is raised to tell me what the actual problem is. I tried a few other datasets as well and got the same result.
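
The failure mode described above can be reproduced without EvoTorch. The following is a minimal sketch, not the library's code: `AlwaysNone` and `get_minibatch_recursive` are hypothetical names, and only the "batch is None" branch is modelled. Because the retry is a recursive call, a source that never yields a usable batch never lets the call return, and the interpreter's recursion limit is the first thing that stops it.

```python
class AlwaysNone:
    """Hypothetical stand-in for a data source that never yields a usable batch."""

    def __iter__(self):
        return self

    def __next__(self):
        return None


def get_minibatch_recursive(state):
    # Mirrors only the retry-by-recursion pattern described above.
    batch = next(state["iterator"])
    if batch is None:
        # Restart the iterator and try again by calling ourselves.
        state["iterator"] = iter(state["source"])
        return get_minibatch_recursive(state)
    return batch


source = AlwaysNone()
state = {"source": source, "iterator": iter(source)}

try:
    get_minibatch_recursive(state)
    outcome = "returned"
except RecursionError:
    # The root cause (the source yields nothing usable) is never reported.
    outcome = "RecursionError"

print(outcome)
```

In this sketch the recursion error at least surfaces. If the retry also happens inside a broad exception handler, as described above, the recursion error itself would be caught and retried, which would be consistent with the fatal overflow in the console output.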

Thank you!

maulberto3 commented 1 year ago

Hi @jalane76, SupervisedNE expects a dataset that is fully compatible with PyTorch's DataLoader. Have you tried something like this:

sample = next(iter(torch.utils.data.DataLoader(dataset, batch_size=32)))

Does that work?

If it doesn't, you may end up in the recursive call shown in the source code:

def get_minibatch(self) -> Any:
    ...
    try:
        batch = next(self.dataloader_iterator)
        if batch is None:
            self.dataloader_iterator = iter(self.dataloader)
            batch = self.get_minibatch()
    ...

jalane76 commented 1 year ago

Thank you for your reply. I've just now gotten back to this issue.

It turns out that I was indeed having difficulties with the first dataloader. I have a fixed dataloader that is producing sensible output when I get the next batch. The same dataloader is being used by the YOLO trainer to train the model using their built-in scripts.

Unfortunately, I still get a similar overflow error when running SNES on a SupervisedNE problem with this dataloader. After some additional testing, it looks like the end of my dataset is reached and a StopIteration is raised (which appears to be expected behavior for a PyTorch DataLoader). I'm not entirely sure this is the exact error occurring when I use EvoTorch, because I don't actually see an error, most likely because the stack overflow makes the stack trace unreliable.
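
The end-of-data behavior mentioned above is ordinary Python iterator protocol. Here is a small sketch, using a plain list as a stand-in for a DataLoader (an assumption made so that it runs without PyTorch): once an iterator is exhausted, every further `next()` raises StopIteration, and a fresh `iter()` call is needed to start another pass.

```python
batches = [[0, 1], [2, 3]]  # stand-in for a DataLoader with two minibatches

iterator = iter(batches)
first_pass = [next(iterator), next(iterator)]

try:
    next(iterator)
    exhausted = False
except StopIteration:
    # Raised on every call from now on; the iterator does not rewind itself.
    exhausted = True

# A new pass requires a new iterator over the same source.
iterator = iter(batches)
restarted_batch = next(iterator)
```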

So, I'm still having trouble using SupervisedNE. In addition, I'm having a hard time debugging because it appears that overflowing the stack upon an error is how the get_minibatch function has been designed. I question the wisdom of this choice since

  1. It overflows the stack causing a core dump.
  2. It makes it more difficult to debug.

Maybe there is a good reason you've made this design choice, but I'm having trouble understanding it. It seems like you could do a simple check for a null dataloader_iterator and allow the exception catching to propagate or raise whatever errors you deem appropriate immediately instead of recursively calling get_minibatch.
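
One way to express the suggestion above, as a sketch only (this is not the EvoTorch implementation, and `MinibatchSource` and its attributes are hypothetical names): restart the iterator at most once per call inside a loop, raise a clear error if a fresh iterator still yields nothing, and let any other exception from the dataset propagate unchanged.

```python
from typing import Any, Iterable, Iterator, Optional


class MinibatchSource:
    """Hypothetical helper that cycles over a re-iterable source of batches."""

    def __init__(self, dataloader: Iterable[Any]):
        self.dataloader = dataloader
        self.iterator: Optional[Iterator[Any]] = None

    def get_minibatch(self) -> Any:
        # At most two attempts: the current iterator, then one fresh iterator.
        for _ in range(2):
            if self.iterator is None:
                self.iterator = iter(self.dataloader)
            try:
                # Only StopIteration is handled; dataset errors propagate as-is.
                return next(self.iterator)
            except StopIteration:
                self.iterator = None
        raise RuntimeError("The data loader produced no minibatches.")


source = MinibatchSource([[0, 1], [2, 3]])
drawn = [source.get_minibatch() for _ in range(5)]
print(drawn)
```

With this shape, an empty loader fails with an explicit RuntimeError instead of recursing, and a broken sample raises its own exception from the dataset code, so the traceback points at the real cause.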

I am not very experienced with PyTorch DataLoaders, so maybe I am still making a mistake somewhere. Any further suggestions you may have would be great. Thanks.

engintoklu commented 1 year ago

Hello @jalane76, and thank you for raising this issue! Also, thank you @maulberto3 for your helpful remarks!

We just made a new branch named fix/supervisedne (https://github.com/nnaisense/evotorch/tree/fix/supervisedne), where the get_minibatch() method no longer uses recursion to handle the end of the data loader's minibatches. Would you like to install EvoTorch from that branch and try your example script again?

Also, in this new branch, you might want to take a look at the updated Training_MNIST30K.ipynb example where we changed the algorithm to PGPE and adopted hyperparameters from what we reported in the technical report. Perhaps you might want to start configuring your algorithm from there.

A few comments regarding your example code:

from torchvision import transforms

...

dataset = CocoDetection(coco_path, ann_path, transform=transforms.ToTensor())

Would you like to try with these suggestions, after switching to this new branch of EvoTorch? Feel free to let me know if something is not clear.

engintoklu commented 1 year ago

Hello @jalane76!

The pull request addressing this issue just got merged. The latest state of EvoTorch with the mentioned fix can now be installed from the repository via:

pip install git+https://github.com/nnaisense/evotorch

jalane76 commented 1 year ago

Thank you so much! I had meant to get back and try out the new branch, but I've been furiously writing a prelim, so I haven't had time. I'll get back to this soon.