Example from doc hangs on enumerate on CPU only machines

alcinos commented 2 years ago

Hello,

I am trying to run the simple example provided in the documentation. I have created a fresh conda env as advised. Note that I'm testing on a host without a GPU. Relevant info:

Environment info

```
PyTorch version: 1.10.2
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (GCC) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:24:11) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-4.18.0-305.28.1.el8_4.x86_64-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: 11.3.58
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.5
[pip3] pytorch-pfn-extras==0.5.6
[pip3] torch==1.10.2
[pip3] torchvision==0.11.3
[conda] blas 2.113 mkl conda-forge
[conda] blas-devel 3.9.0 13_linux64_mkl conda-forge
[conda] cudatoolkit 11.3.1 ha36c431_10 conda-forge
[conda] libblas 3.9.0 13_linux64_mkl conda-forge
[conda] libcblas 3.9.0 13_linux64_mkl conda-forge
[conda] liblapack 3.9.0 13_linux64_mkl conda-forge
[conda] liblapacke 3.9.0 13_linux64_mkl conda-forge
[conda] mkl 2022.0.1 h8d4b97c_803 conda-forge
[conda] mkl-devel 2022.0.1 ha770c72_804 conda-forge
[conda] mkl-include 2022.0.1 h8d4b97c_803 conda-forge
[conda] mypy-extensions 0.4.3 pypi_0 pypi
[conda] numpy 1.21.5 py39haac66dc_0 conda-forge
[conda] pytorch 1.10.2 py3.9_cuda11.3_cudnn8.2.0_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] pytorch-pfn-extras 0.5.6 pypi_0 pypi
[conda] torchvision 0.11.3 py39_cu113 pytorch
```

For reference, here is the full code I'm running (taken from the documentation)

Demo code

```python
from ffcv.writer import DatasetWriter
import numpy as np
from ffcv.fields import NDArrayField, FloatField
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import NDArrayDecoder, FloatDecoder
from ffcv.transforms import ToTensor


class LinearRegressionDataset:
    def __init__(self, N, d):
        self.X = np.random.randn(N, d)
        self.Y = np.random.randn(N)

    def __getitem__(self, idx):
        return (self.X[idx].astype('float32'), self.Y[idx])

    def __len__(self):
        return len(self.X)


N, d = (10, 6)
dataset = LinearRegressionDataset(N, d)
writer = DatasetWriter("/tmp/new.beton", {
    'covariate': NDArrayField(shape=(d,), dtype=np.dtype('float32')),
    'label': FloatField(),
}, num_workers=16)

writer.from_indexed_dataset(dataset)

loader = Loader('/tmp/new.beton',
                batch_size=2,
                num_workers=1,
                order=OrderOption.RANDOM,
                pipelines={
                    'covariate': [NDArrayDecoder(), ToTensor()],
                    'label': [FloatDecoder(), ToTensor()]
                })

print(len(loader))
for l in loader:
    print(l)
```

The printed length is correct (5); however, the code freezes completely when it reaches the `for` loop. Ctrl+C doesn't work, suggesting that the issue is a multi-process one.

`top` shows 0% CPU usage, along with a process launched with `python -c from multiprocessing.resource_tracker import main;main(7)`.

Let me know if you need additional debugging information (I'm not sure how to obtain a traceback in this case, ideas welcome...)

Best
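On the traceback question above: one option that needs no extra dependency is the standard-library `faulthandler` module, which can dump every thread's Python stack after a timeout even when Ctrl+C is ignored. The sketch below is only an illustration under stated assumptions: the helper name `dump_stacks_if_stuck` is hypothetical, and `time.sleep` stands in for the hanging `for l in loader:` loop, since the real hang is not reproduced here. It shows the Python frame that was executing when the timer fired; if the process is stuck inside native code, that frame is where Python called into it.

```python
import faulthandler
import tempfile
import time


def dump_stacks_if_stuck(timeout_s, log_file):
    """Arm a watchdog that writes all thread stacks to log_file after timeout_s.

    Hypothetical helper name; it only wraps the stdlib call below.
    """
    faulthandler.dump_traceback_later(
        timeout_s, repeat=False, file=log_file, exit=False
    )


# Stand-in demonstration: time.sleep plays the role of the frozen loop.
with tempfile.TemporaryFile(mode="w+") as log_file:
    dump_stacks_if_stuck(0.2, log_file)
    time.sleep(0.6)  # in the real script: `for l in loader: print(l)`
    faulthandler.cancel_dump_traceback_later()  # disarm once the loop returns
    log_file.seek(0)
    report = log_file.read()

print(report)
```

In the actual script, one would presumably call `faulthandler.dump_traceback_later(30)` just before the loop and read the dump from stderr; the 30-second value is an arbitrary choice.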

alcinos commented 2 years ago

Update: I ran the exact same test, with the exact same environment, on a host with access to a gpu, and it worked as expected.

So it seems that the issue occurs only on CPU-only machines. This is confusing to me, since nothing in the pipeline is related to CUDA, as far as I can tell. Is this a bug or a feature?

GuillaumeLeclerc commented 2 years ago

FFCV doesn't spawn sub-processes so it can't be it.

What is the output of `torch.cuda.is_available()` in your environment? I fear that it might return `True` even though you don't have a GPU. I didn't think that scenario would happen. Why is CUDA installed on your machine if you don't have a GPU?

alcinos commented 2 years ago

`torch.cuda.is_available()` returns `False` on the CPU-only machine, as it should.

As for the why, we have a cluster with some test machines that are CPU only and easier to access than the GPU ones. My debugging process is to get the code working on these machines first, using the same environment that is going to be used for actual training (with the caveat that for pytorch training, the device must be set to cpu), then test on a GPU machine, and finally submit the job.

GuillaumeLeclerc commented 2 years ago

I can confirm the bug is reproducible. I will fix and incorporate this in the next version.

Thank you for the report @alcinos!

GuillaumeLeclerc commented 2 years ago

It's a PyTorch issue. I'll file a ticket.

GuillaumeLeclerc commented 2 years ago

@alcinos PyTorch issues usually take a long time to reach a release, so I don't think this will be fixed in the next FFCV release. As soon as PyTorch fixes it, though, it will be resolved immediately.

libffcv / ffcv

Example from doc hangs on enumerate on CPU only machines #148