libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

Segmentation fault when running the data loader #109

Closed YuhengHuang42 closed 2 years ago

YuhengHuang42 commented 2 years ago

Hi, I'm using FFCV with my own dataset. The DatasetWriter works fine, but I get a segmentation fault when iterating over ffcv.loader.

The loader is set up like this:

import faulthandler

from ffcv.loader import Loader, OrderOption
from ffcv.transforms import ToTensor, ToDevice, ToTorchImage, Cutout
from ffcv.fields.decoders import IntDecoder, RandomResizedCropRGBImageDecoder

# Dump the Python traceback of every thread if the process crashes
faulthandler.enable()

# Data decoding and augmentation
image_pipeline = [ToTensor()]

# Pipeline for each data field
pipelines = {
    'image': image_pipeline,
}

bs = 2
num_workers = 1
# write_path: path of the dataset file produced by DatasetWriter (defined elsewhere)
loader = Loader(write_path, batch_size=bs, num_workers=num_workers, os_cache=False,
                order=OrderOption.SEQUENTIAL, pipelines=pipelines)

And the trace of the error:

Current thread 0x00007f421affd700 (most recent call first):
  File "/lib/python3.9/site-packages/ffcv/loader/epoch_iterator.py", line 134 in run_pipeline
  File "/lib/python3.9/site-packages/ffcv/loader/epoch_iterator.py", line 80 in run
  File "/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f4248ff8700 (most recent call first):
  File "/lib/python3.9/threading.py", line 312 in wait
  File "/lib/python3.9/queue.py", line 171 in get
  File "/lib/python3.9/site-packages/ffcv/memory_managers/process_cache/page_reader.py", line 26 in run
  File "/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f42497f9700 (most recent call first):
  File "/lib/python3.9/threading.py", line 312 in wait
  File "/lib/python3.9/queue.py", line 171 in get
  File "/lib/python3.9/site-packages/ffcv/memory_managers/process_cache/page_reader.py", line 26 in run
  File "/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f4249ffa700 (most recent call first):
  File "/lib/python3.9/threading.py", line 312 in wait
  File "/lib/python3.9/queue.py", line 171 in get
  File "/lib/python3.9/site-packages/ffcv/memory_managers/process_cache/page_reader.py", line 26 in run
  File "/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f424a7fb700 (most recent call first):
  File "/lib/python3.9/threading.py", line 312 in wait
  File "/lib/python3.9/queue.py", line 171 in get
  File "/lib/python3.9/site-packages/ffcv/memory_managers/process_cache/page_reader.py", line 26 in run
  File "/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f424affc700 (most recent call first):
  File "/lib/python3.9/threading.py", line 312 in wait
  File "/lib/python3.9/queue.py", line 171 in get
  File "/lib/python3.9/site-packages/ffcv/memory_managers/process_cache/page_reader.py", line 26 in run
  File "/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f424b7fd700 (most recent call first):
  File "/lib/python3.9/threading.py", line 312 in wait
  File "/lib/python3.9/queue.py", line 171 in get
  File "/lib/python3.9/site-packages/ffcv/memory_managers/process_cache/page_reader.py", line 26 in run
  File "/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f424bffe700 (most recent call first):
  File "/lib/python3.9/threading.py", line 312 in wait
  File "/lib/python3.9/queue.py", line 171 in get
  File "/lib/python3.9/site-packages/ffcv/memory_managers/process_cache/page_reader.py", line 26 in run
  File "/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f4260c78700 (most recent call first):
  File "/lib/python3.9/threading.py", line 312 in wait
  File "/lib/python3.9/queue.py", line 171 in get
  File "/lib/python3.9/site-packages/ffcv/memory_managers/process_cache/page_reader.py", line 26 in run
  File "/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f4261479700 (most recent call first):
  File "/lib/python3.9/site-packages/ffcv/libffcv.py", line 14 in read
  File "/lib/python3.9/site-packages/ffcv/memory_managers/process_cache/page_reader.py", line 33 in run
  File "/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f4261c7a700 (most recent call first):
  File "/lib/python3.9/threading.py", line 312 in wait
  File "/lib/python3.9/queue.py", line 171 in get
  File "/lib/python3.9/site-packages/ffcv/memory_managers/process_cache/page_reader.py", line 26 in run
  File "/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f426247b700 (most recent call first):
  File "/lib/python3.9/threading.py", line 312 in wait
  File "/lib/python3.9/queue.py", line 171 in get
  File "/lib/python3.9/site-packages/ffcv/memory_managers/process_cache/page_reader.py", line 26 in run
  File "/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f4262c7c700 (most recent call first):
  File "/lib/python3.9/threading.py", line 312 in wait
  File "/lib/python3.9/queue.py", line 171 in get
  File "/lib/python3.9/site-packages/ffcv/memory_managers/process_cache/page_reader.py", line 26 in run
  File "/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f4427476740 (most recent call first):
  File "/lib/python3.9/threading.py", line 312 in wait
  File "/lib/python3.9/queue.py", line 171 in get
  File "/lib/python3.9/site-packages/ffcv/loader/epoch_iterator.py", line 142 in __next__
  File "debug.py", line 22 in <module>
[1]    197770 segmentation fault (core dumped)  python3 debug.py

I also noticed that the error may be related to OrderOption. With OrderOption.QUASI_RANDOM the loader fails after 3 batches have been returned, while with OrderOption.RANDOM it fails after 50+ batches.
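A note on reading a dump like the one above: the thread marked "Current thread" is the one that was running when the fault was raised, and most of the other threads are idle, waiting on a queue. Below is a minimal sketch of a helper that extracts those frames, assuming the trace text has been pasted into a string. The names `parse_dump`, `crashing_frame`, and `busy_threads` are hypothetical and are not part of FFCV or `faulthandler`; the sample is an excerpt of the trace in this report.

```python
import re

# Header lines look like "Current thread 0x... (most recent call first):"
# or "Thread 0x... (most recent call first):".
HEADER = re.compile(
    r"^(Current thread|Thread) (0x[0-9a-fA-F]+)\b.*\(most recent call first\):$"
)
FRAME = re.compile(r'^\s*File "(?P<file>.+)", line (?P<line>\d+) in (?P<func>.+)$')


def parse_dump(text):
    """Return a list of (is_current, thread_id, frames) tuples."""
    threads = []
    for raw in text.splitlines():
        line = raw.rstrip()
        header = HEADER.match(line)
        if header:
            is_current = header.group(1) == "Current thread"
            threads.append((is_current, header.group(2), []))
            continue
        frame = FRAME.match(line)
        if frame and threads:
            threads[-1][2].append(
                (frame["file"], int(frame["line"]), frame["func"])
            )
    return threads


def crashing_frame(text):
    """Top frame of the thread marked 'Current thread', or None."""
    for is_current, _tid, frames in parse_dump(text):
        if is_current and frames:
            return frames[0]
    return None


def busy_threads(text):
    """Other threads whose top frame is not an idle threading wait."""
    return [
        (tid, frames[0])
        for is_current, tid, frames in parse_dump(text)
        if not is_current and frames and frames[0][2] != "wait"
    ]


# Excerpt of the trace from this report.
SAMPLE = '''\
Current thread 0x00007f421affd700 (most recent call first):
  File "/lib/python3.9/site-packages/ffcv/loader/epoch_iterator.py", line 134 in run_pipeline
  File "/lib/python3.9/site-packages/ffcv/loader/epoch_iterator.py", line 80 in run

Thread 0x00007f4248ff8700 (most recent call first):
  File "/lib/python3.9/threading.py", line 312 in wait
  File "/lib/python3.9/queue.py", line 171 in get

Thread 0x00007f4261479700 (most recent call first):
  File "/lib/python3.9/site-packages/ffcv/libffcv.py", line 14 in read
  File "/lib/python3.9/site-packages/ffcv/memory_managers/process_cache/page_reader.py", line 33 in run
'''

print(crashing_frame(SAMPLE))
print(busy_threads(SAMPLE))
```

On this trace the helper points at `run_pipeline` in `epoch_iterator.py` as the faulting frame and flags the one page reader thread that was inside `libffcv.py` `read` rather than waiting on a queue. It only narrows down where to look; it does not identify the cause.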

GuillaumeLeclerc commented 2 years ago

Hello!

Thank you for the report. What happens when you use os_cache=True? Does it also fail?

Could you share a reduced version of your dataset that I could experiment with?

One way to debug this kind of error is to disable compilation, which will show the origin of the segmentation fault:

from ffcv.compiler import Compiler
Compiler.set_enable(False)

(the code will obviously run much slower)
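When comparing several runs (for example os_cache=True against os_cache=False, or with compilation disabled), it can help to run the script as a child process and record whether it was killed by a signal. The following is a rough sketch using only the standard library, assuming a POSIX system where a negative return code means the child died from a signal. `run_and_classify` is a hypothetical helper, and `debug.py` is the script name taken from the trace above.

```python
import signal
import subprocess
import sys


def run_and_classify(argv, timeout=600):
    """Run a command and return (how it ended, captured stderr)."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    code = proc.returncode
    if code < 0:
        # On POSIX, a negative return code means the child was killed
        # by the signal with that number (SIGSEGV for a segfault).
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = f"signal {-code}"
        return f"killed by {name}", proc.stderr
    return f"exited with code {code}", proc.stderr


# Example (assumes debug.py exists in the working directory):
#   outcome, stderr = run_and_classify([sys.executable, "debug.py"])
#   print(outcome)
#   print(stderr)  # contains the faulthandler dump if it was enabled
```

Running each configuration this way keeps a crash in one variant from taking down the comparison itself, and the captured stderr retains the faulthandler output for each run.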

GuillaumeLeclerc commented 2 years ago

Closing due to inactivity. Feel free to reopen once you have more information.

daixiangzi commented 12 months ago

+1, I'm running into the same problem.