I am using the deepquestai/deepstack:gpu-2022.01.1 container for custom training. It ships with torch built for CUDA 11.3, but train.py fails shortly after startup (see the error below). The failure goes away when I downgrade to torch built for CUDA 11.0, as in the Colab notebook:
pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
The training command and the error it produces with the bundled torch:
docker run --gpus all -it --rm -v /home/eouser/deepstack:/deepstack/code -w /deepstack/code/deepstack-trainer deepquestai/deepstack_updated:gpu python3 train.py --dataset-path /deepstack/code/data
Traceback (most recent call last):
  File "train.py", line 530, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 90, in train
    model = Model(opt.cfg or ckpt['model'].yaml, ch=3, nc=nc).to(device)  # create
  File "/deepstack/code/deepstack-trainer/models/yolo.py", line 96, in __init__
    self._initialize_biases()  # only run once
  File "/deepstack/code/deepstack-trainer/models/yolo.py", line 151, in _initialize_biases
    b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.
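A possible alternative to downgrading torch, not tested against the trainer itself: the failing line updates a view of the bias parameter in place while autograd is tracking it, so doing the same update under torch.no_grad() may avoid the error. The sketch below is only an illustration; the function name init_objectness_bias and the example layer are hypothetical and are not taken from yolo.py.

```python
import math

import torch
import torch.nn as nn


def init_objectness_bias(conv: nn.Conv2d, num_anchors: int, stride: float) -> None:
    """Add the objectness prior to a detection conv's bias without autograd tracking."""
    with torch.no_grad():
        # View the flat bias as (anchors, outputs per anchor); it shares storage with conv.bias.
        b = conv.bias.view(num_anchors, -1)
        # Same prior as in the traceback: about 8 objects per 640-pixel image.
        b[:, 4] += math.log(8 / (640 / stride) ** 2)


# Hypothetical detection layer: 3 anchors, 5 + 80 outputs per anchor.
conv = nn.Conv2d(16, 3 * 85, kernel_size=1)
before = conv.bias.detach().clone()
init_objectness_bias(conv, num_anchors=3, stride=8.0)

# Only column 4 (objectness) of each anchor's slice should have moved.
delta = (conv.bias.detach() - before).view(3, -1)
print(delta[:, 4])
```

Because the view shares memory with the parameter, conv.bias itself is updated and still requires grad afterwards, so training can proceed as usual.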
I first need to downgrade setuptools inside the container, btw, because otherwise it throws:
Traceback (most recent call last):
  File "train.py", line 21, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/tensorboard/__init__.py", line 4, in <module>
    LooseVersion = distutils.version.LooseVersion
AttributeError: module 'setuptools._distutils' has no attribute 'version'
(resolved with: pip install setuptools==59.5.0)
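For anyone scripting the container setup, here is a minimal sketch of a check that reports whether the installed setuptools is newer than the 59.5.0 pin above. The helper names are hypothetical, and treating every release newer than 59.5.0 as affected is an assumption based only on the pin that worked here.

```python
import re

# Assumption: 59.5.0 is the pin reported to work above; anything newer is
# treated as potentially affected.
KNOWN_GOOD_SETUPTOOLS = "59.5.0"


def version_tuple(version: str) -> tuple:
    """Turn '59.5.0' into (59, 5, 0), stopping at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_setuptools_pin(installed: str, known_good: str = KNOWN_GOOD_SETUPTOOLS) -> bool:
    """Return True when the installed version is newer than the known-good pin."""
    return version_tuple(installed) > version_tuple(known_good)


if __name__ == "__main__":
    try:
        import setuptools
    except ImportError:
        print("setuptools is not installed")
    else:
        if needs_setuptools_pin(setuptools.__version__):
            print("consider: pip install setuptools==" + KNOWN_GOOD_SETUPTOOLS)
        else:
            print("setuptools " + setuptools.__version__ + " is at or below the pin")
```

Running it inside the container before train.py gives a quick hint about whether the downgrade is needed.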
I am now happily training with the revised setup, so nothing too urgent, but maybe worth checking out.
Thx for this wonderful framework!
Guido