jrzaurin / pytorch-widedeep

A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch
Apache License 2.0
1.3k stars 190 forks

Cuda Version #206

Closed ECUST-Zhang closed 7 months ago

ECUST-Zhang commented 7 months ago

When I installed pytorch-widedeep directly, CUDA 12 and torch 2.2.1 were installed by default, but my device only supports CUDA 11. After I reinstalled with CUDA 11 and torch 2.0.1, the following error was reported:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

Can I install pytorch-widedeep for CUDA 11 directly?

jrzaurin commented 7 months ago

Hey @ECUST-Zhang

Thanks for opening the issue

You could look at this link or this one.

You could:

  1. Install torch and torchvision with cuda 11.XX support in a clean environment.
  2. Install pytorch-widedeep via pip. In principle the requirements are:
    torch >= 2.0.0
    torchvision >= 0.15.0

    and there is CUDA 11 support for these versions, so these libraries should not be re-installed, and this should work.
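To check that the CUDA 11 build from step 1 is still in place after installing pytorch-widedeep, one can inspect the installed torch build. The following is a minimal sketch; the helper name `describe_torch_build` is hypothetical and is not part of pytorch-widedeep.

```python
import importlib.util


def describe_torch_build():
    """Return basic facts about the installed torch build (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version torch was built against; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are visible at runtime
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` reports a 12.x version after step 2, pip most likely replaced the torch installed in step 1, and reinstalling the CUDA 11 build would be the next thing to try.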

Otherwise you can clone this repo, edit the requirements to the torch and torchvision version you need, and install from there:

# Clone the repository
git clone https://github.com/jrzaurin/pytorch-widedeep
cd pytorch-widedeep

# Edit the requirements, then install from the local clone
pip install .

Let me know if you have more issues :)

ECUST-Zhang commented 7 months ago

Hey @jrzaurin, thanks for your reply, but I still have some questions. After first installing CUDA 11 and then installing pytorch-widedeep from the cloned GitHub repository, the installation succeeded, but errors are still reported during model training or prediction. [image]

ECUST-Zhang commented 7 months ago

image image


jrzaurin commented 7 months ago

We always have lots of issues with Conda... @5uperpalo can you have a look? I don't use conda, although this might just be CUDA...

In the meantime, can you look here and see if it helps?

Maybe this is all you need:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
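One detail worth noting about the snippet above: the variable is read when CUDA is initialized, so it is safest to set it at the very top of the script, before torch is imported. A small sketch of that ordering follows; the helper name `enable_blocking_launches` is hypothetical.

```python
import os
import sys


def enable_blocking_launches():
    """Set CUDA_LAUNCH_BLOCKING=1 and report whether torch was still unimported.

    Hypothetical helper: a False return value means torch was already
    imported, so CUDA may have been initialized before the variable was set.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return "torch" not in sys.modules


if __name__ == "__main__":
    set_early = enable_blocking_launches()
    print("set before torch import:", set_early)
    # import torch and build / train the model only after this point
```

With blocking launches enabled, the stack trace should point at the operation that actually failed, which is what the "might be asynchronously reported" warning in the error message is about.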

It might be something related to TabNet. I have not touched the code for that model in ages.
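Whichever model is involved, a "device-side assert triggered" error is often raised when an index passed to an embedding lookup (or a class label passed to a loss) falls outside the valid range. This is a general PyTorch observation and an assumption here; the thread does not confirm it as the cause. A minimal, library-free sketch of such a check, with the hypothetical helper `find_out_of_range`:

```python
def find_out_of_range(indices, num_embeddings):
    """Return the sorted distinct values that a table of size `num_embeddings` cannot look up.

    Valid indices are 0 .. num_embeddings - 1 (hypothetical helper, plain Python).
    """
    return sorted({i for i in indices if i < 0 or i >= num_embeddings})


if __name__ == "__main__":
    # Illustrative values only: an encoded categorical column and a table of size 5
    encoded_column = [0, 3, 7, 2]
    print(find_out_of_range(encoded_column, num_embeddings=5))  # prints [7]
```

Running the same training step on the CPU is another way to narrow this down, since the CPU code path usually reports an out-of-range index with a readable error message.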