Open jamespinkerton opened 5 days ago
Thanks for using LightGBM.
Would it be possible to deploy the CUDA version to conda forge?
I'm assuming you mean the Python package (because you mentioned pip), and not other components like the CLI.
As described in https://github.com/microsoft/LightGBM/blob/master/python-package/README.rst#install-from-conda-forge-channel, on Linux systems with CUDA, the following will select a CUDA-enabled build of lightgbm.
conda install -c conda-forge 'lightgbm>=4.4.0'
The CUDA build isn't supported on Windows or macOS.
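One way to confirm that the installed package can actually train on the GPU is to attempt a tiny training run with "device": "cuda". The following is a minimal sketch, not an official check: the function name cuda_training_works is hypothetical, the data is synthetic, and it assumes that a failure to use CUDA surfaces as an import error or a LightGBMError.

```python
def cuda_training_works() -> bool:
    """Return True if a tiny LightGBM training run with device="cuda" succeeds.

    Hypothetical helper for illustration; not part of LightGBM.
    """
    try:
        import numpy as np
        import lightgbm as lgb
    except (ImportError, OSError):
        # lightgbm (or a shared library it needs) is not available here.
        return False

    # Small synthetic regression problem, just enough to exercise training.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = X[:, 0] + rng.normal(scale=0.1, size=500)

    try:
        lgb.train(
            params={
                "device": "cuda",
                "objective": "regression",
                "num_leaves": 7,
                "verbose": -1,
            },
            train_set=lgb.Dataset(X, label=y),
            num_boost_round=2,
        )
    except lgb.basic.LightGBMError:
        # Assumption: a CPU-only build or a CUDA failure raises this error.
        return False
    return True


if __name__ == "__main__":
    print("CUDA training works:", cuda_training_works())
```

A result of False only says that this particular run failed; it does not distinguish a CPU-only build from a driver or GPU problem.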
If this doesn't answer your question, please provide more specifics about what you'd like to do and what "can be difficult" means, and we will try to help.
I think this does answer my question. I believe the conda package automatically selecting a CUDA-enabled build is new as of a recent version? The last time I tried this, I thought the conda package did not come with a CUDA-enabled build. Thanks for the answer!
Hi. I just tried running on an A100. I'm using the Python package, version 4.5.0, on a Debian Linux machine. The A100 is on Google Cloud, CUDA version 12.1.
I got the following errors:
[LightGBM] [Warning] Using sparse features with CUDA is currently not supported.
[LightGBM] [Warning] Defaulting to malloc in CHAllocator!!!
[LightGBM] [Fatal] [CUDA] initialization error /home/conda/feedstock_root/build_artifacts/lightgbm_1722621077976/work/src/io/cuda/cuda_column_data.cpp 16
You mentioned that you'd previously tried to build LightGBM from source... are you certain you've uninstalled it?
pip uninstall lightgbm
Can you share the output of the following commands?
nvidia-smi
conda info
conda env export
python --version
Are you able to provide a minimal, reproducible example that demonstrates this error? For example, is this sufficient to reproduce it?
from sklearn.datasets import make_regression
import lightgbm as lgb

X, y = make_regression(n_samples=10_000)
dtrain = lgb.Dataset(X, label=y)
bst = lgb.train(
    train_set=dtrain,
    params={
        "device": "cuda",
        "objective": "regression",
        "num_leaves": 7
    },
    num_boost_round=5
)
Yes, I'm quite confident it's been uninstalled. This was several months ago, and I have since re-installed the conda environment.
(base) james_c_pinkerton@workergpu:~$ nvidia-smi
Mon Sep 30 04:08:01 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          Off | 00000000:00:05.0 Off |                    0 |
| N/A   46C    P0             161W / 400W |   6503MiB / 81920MiB |     56%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          Off | 00000000:00:06.0 Off |                    0 |
| N/A   47C    P0             233W / 400W |   1731MiB / 81920MiB |     43%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    373258      C   python                                     6490MiB |
|    1   N/A  N/A    373258      C   python                                     1718MiB |
+---------------------------------------------------------------------------------------+
(base) james_c_pinkerton@workergpu:~$ /mnt/disks/condaman/mamba/bin/conda info
active environment : /opt/conda
active env location : /opt/conda
shell level : 1
user config file : /home/james_c_pinkerton/.condarc
populated config files : /mnt/disks/condaman/mamba/.condarc
/opt/conda/.condarc
conda version : 24.7.1
conda-build version : not installed
python version : 3.12.0.final.0
solver : libmamba (default)
virtual packages : __archspec=1=cascadelake
__conda=24.7.1=0
__cuda=12.2=0
__glibc=2.31=0
__linux=5.10.0=0
__unix=0=0
base environment : /mnt/disks/condaman/mamba (read only)
conda av data dir : /mnt/disks/condaman/mamba/etc/conda
conda av metadata url : None
channel URLs : https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
package cache : /mnt/disks/condaman/mamba/pkgs
/home/james_c_pinkerton/.conda/pkgs
envs directories : /home/james_c_pinkerton/.conda/envs
/mnt/disks/condaman/mamba/envs
platform : linux-64
user-agent : conda/24.7.1 requests/2.32.3 CPython/3.12.0 Linux/5.10.0-27-cloud-amd64 debian/11.8 glibc/2.31 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.8
UID:GID : 1008:1009
netrc file : None
offline mode : False
(base) james_c_pinkerton@workergpu:~$ /mnt/disks/condaman/mamba/bin/conda env export
name: /opt/conda
channels:
- file:///tmp/conda-pkgs
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=2_gnu
- anyio=4.2.0=pyhd8ed1ab_0
- archspec=0.2.2=pyhd8ed1ab_0
- argon2-cffi=23.1.0=pyhd8ed1ab_0
- argon2-cffi-bindings=21.2.0=py310h2372a71_4
- arrow=1.3.0=pyhd8ed1ab_0
- asttokens=2.4.1=pyhd8ed1ab_0
- attrs=23.2.0=pyh71513ae_0
- beautifulsoup4=4.12.2=pyha770c72_0
- bleach=6.1.0=pyhd8ed1ab_0
- boltons=23.1.1=pyhd8ed1ab_0
- brotli-python=1.1.0=py310hc6cd4ac_1
- bzip2=1.0.8=hd590300_5
- c-ares=1.25.0=hd590300_0
- ca-certificates=2023.11.17=hbcca054_0
- cached-property=1.5.2=hd8ed1ab_1
- cached_property=1.5.2=pyha770c72_1
- certifi=2023.11.17=pyhd8ed1ab_0
- cffi=1.16.0=py310h2fee648_0
- charset-normalizer=3.3.2=pyhd8ed1ab_0
- colorama=0.4.6=pyhd8ed1ab_0
- comm=0.2.1=pyhd8ed1ab_0
- conda=23.11.0=py310hff52083_1
- conda-libmamba-solver=23.12.0=pyhd8ed1ab_0
- conda-package-handling=2.2.0=pyh38be061_0
- conda-package-streaming=0.9.0=pyhd8ed1ab_0
- debugpy=1.8.0=py310hc6cd4ac_1
- decorator=5.1.1=pyhd8ed1ab_0
- defusedxml=0.7.1=pyhd8ed1ab_0
- distro=1.9.0=pyhd8ed1ab_0
- dlenv-base=1.0.20240112=py310_0
- entrypoints=0.4=pyhd8ed1ab_0
- exceptiongroup=1.2.0=pyhd8ed1ab_2
- executing=2.0.1=pyhd8ed1ab_0
- fmt=10.1.1=h00ab1b0_1
- fqdn=1.5.1=pyhd8ed1ab_0
- gmp=6.3.0=h59595ed_0
- icu=73.2=h59595ed_0
- idna=3.6=pyhd8ed1ab_0
- importlib_metadata=7.0.1=hd8ed1ab_0
- importlib_resources=6.1.1=pyhd8ed1ab_0
- ipykernel=6.28.0=pyhd33586a_0
- ipython=8.20.0=pyh707e725_0
- ipython_genutils=0.2.0=py_1
- isoduration=20.11.0=pyhd8ed1ab_0
- jedi=0.19.1=pyhd8ed1ab_0
- jinja2=3.1.3=pyhd8ed1ab_0
- jsonpatch=1.33=pyhd8ed1ab_0
- jsonpointer=2.4=py310hff52083_3
- jsonschema=4.20.0=pyhd8ed1ab_0
- jsonschema-specifications=2023.12.1=pyhd8ed1ab_0
- jsonschema-with-format-nongpl=4.20.0=pyhd8ed1ab_0
- jupyter_client=8.6.0=pyhd8ed1ab_0
- jupyter_core=5.7.1=py310hff52083_0
- jupyter_events=0.9.0=pyhd8ed1ab_0
- jupyter_server=2.12.4=pyhd8ed1ab_0
- jupyter_server_terminals=0.5.1=pyhd8ed1ab_0
- jupyterlab_pygments=0.3.0=pyhd8ed1ab_0
- keyutils=1.6.1=h166bdaf_0
- krb5=1.21.2=h659d440_0
- ld_impl_linux-64=2.40=h41732ed_0
- libarchive=3.7.2=h2aa1ff5_1
- libcurl=8.5.0=hca28451_0
- libedit=3.1.20191231=he28a2e2_2
- libev=4.33=hd590300_2
- libffi=3.4.2=h7f98852_5
- libgcc-ng=13.2.0=h807b86a_3
- libgomp=13.2.0=h807b86a_3
- libiconv=1.17=hd590300_2
- libmamba=1.5.6=had39da4_0
- libmambapy=1.5.6=py310h39ff949_0
- libnghttp2=1.58.0=h47da74e_1
- libnsl=2.0.1=hd590300_0
- libsodium=1.0.18=h36c2ea0_1
- libsolv=0.7.27=hfc55251_0
- libsqlite=3.44.2=h2797004_0
- libssh2=1.11.0=h0841786_0
- libstdcxx-ng=13.2.0=h7e041cc_3
- libuuid=2.38.1=h0b41bf4_0
- libuv=1.46.0=hd590300_0
- libxcrypt=4.4.36=hd590300_1
- libxml2=2.12.3=h232c23b_0
- libzlib=1.2.13=hd590300_5
- lz4-c=1.9.4=hcb278e6_0
- lzo=2.10=h516909a_1000
- markupsafe=2.1.3=py310h2372a71_1
- matplotlib-inline=0.1.6=pyhd8ed1ab_0
- menuinst=2.0.1=py310hff52083_0
- mistune=3.0.2=pyhd8ed1ab_0
- nb_conda=2.2.1=unix_7
- nb_conda_kernels=2.3.1=pyhd8ed1ab_3
- nbclassic=1.0.0=pyhb4ecaf3_1
- nbconvert=7.14.1=pyhd8ed1ab_0
- nbconvert-core=7.14.1=pyhd8ed1ab_0
- nbconvert-pandoc=7.14.1=pyhd8ed1ab_0
- nbformat=5.9.2=pyhd8ed1ab_0
- ncurses=6.4=h59595ed_2
- nest-asyncio=1.5.8=pyhd8ed1ab_0
- nodejs=20.9.0=hb753e55_0
- notebook-shim=0.2.3=pyhd8ed1ab_0
- openssl=3.2.0=hd590300_1
- overrides=7.4.0=pyhd8ed1ab_0
- packaging=23.2=pyhd8ed1ab_0
- pandoc=3.1.3=h32600fe_0
- pandocfilters=1.5.0=pyhd8ed1ab_0
- parso=0.8.3=pyhd8ed1ab_0
- pickleshare=0.7.5=py_1003
- pip=23.3.2=pyhd8ed1ab_0
- pkgutil-resolve-name=1.3.10=pyhd8ed1ab_1
- pluggy=1.3.0=pyhd8ed1ab_0
- prometheus_client=0.19.0=pyhd8ed1ab_0
- ptyprocess=0.7.0=pyhd3deb0d_0
- pure_eval=0.2.2=pyhd8ed1ab_0
- pybind11-abi=4=hd8ed1ab_3
- pycosat=0.6.6=py310h2372a71_0
- pycparser=2.21=pyhd8ed1ab_0
- pygments=2.17.2=pyhd8ed1ab_0
- pysocks=1.7.1=pyha2e5f31_6
- python=3.10.13=hd12c33a_1_cpython
- python-dateutil=2.8.2=pyhd8ed1ab_0
- python-fastjsonschema=2.19.1=pyhd8ed1ab_0
- python-json-logger=2.0.7=pyhd8ed1ab_0
- python_abi=3.10=4_cp310
- pyyaml=6.0.1=py310h2372a71_1
- readline=8.2=h8228510_1
- referencing=0.32.1=pyhd8ed1ab_0
- reproc=14.2.4.post0=hd590300_1
- reproc-cpp=14.2.4.post0=h59595ed_1
- requests=2.31.0=pyhd8ed1ab_0
- rfc3339-validator=0.1.4=pyhd8ed1ab_0
- rfc3986-validator=0.1.1=pyh9f0ad1d_0
- rpds-py=0.16.2=py310hcb5633a_0
- ruamel.yaml=0.18.5=py310h2372a71_0
- ruamel.yaml.clib=0.2.7=py310h2372a71_2
- send2trash=1.8.2=pyh41d4057_0
- setuptools=69.0.3=pyhd8ed1ab_0
- six=1.16.0=pyh6c4a22f_0
- sniffio=1.3.0=pyhd8ed1ab_0
- soupsieve=2.5=pyhd8ed1ab_1
- stack_data=0.6.2=pyhd8ed1ab_0
- terminado=0.18.0=pyh0d859eb_0
- tinycss2=1.2.1=pyhd8ed1ab_0
- tk=8.6.13=noxft_h4845f30_101
- tornado=6.3.3=py310h2372a71_1
- tqdm=4.66.1=pyhd8ed1ab_0
- traitlets=5.9.0=pyhd8ed1ab_0
- truststore=0.8.0=pyhd8ed1ab_0
- types-python-dateutil=2.8.19.20240106=pyhd8ed1ab_0
- typing-extensions=4.9.0=hd8ed1ab_0
- typing_extensions=4.9.0=pyha770c72_0
- typing_utils=0.1.0=pyhd8ed1ab_0
- uri-template=1.3.0=pyhd8ed1ab_0
- wcwidth=0.2.13=pyhd8ed1ab_0
- webcolors=1.13=pyhd8ed1ab_0
- webencodings=0.5.1=pyhd8ed1ab_2
- websocket-client=1.7.0=pyhd8ed1ab_0
- wheel=0.42.0=pyhd8ed1ab_0
- xz=5.2.6=h166bdaf_0
- yaml=0.2.5=h7f98852_2
- yaml-cpp=0.8.0=h59595ed_0
- zeromq=4.3.5=h59595ed_0
- zipp=3.17.0=pyhd8ed1ab_0
- zlib=1.2.13=hd590300_5
- zstandard=0.22.0=py310h1275a96_0
- zstd=1.5.5=hfc55251_0
- pip:
- absl-py==2.0.0
- aiofiles==22.1.0
- aiohttp==3.9.1
- aiohttp-cors==0.7.0
- aiorwlock==1.3.0
- aiosignal==1.3.1
- aiosqlite==0.19.0
- annotated-types==0.6.0
- async-timeout==4.0.3
- babel==2.14.0
- backoff==2.2.1
- beatrix-jupyterlab==2023.128.151533
- blessed==1.20.0
- cachetools==5.3.2
- click==8.1.7
- cloud-tpu-client==0.10
- cloudpickle==3.0.0
- colorful==0.5.6
- contourpy==1.2.0
- cryptography==41.0.7
- cycler==0.12.1
- cython==3.0.8
- dacite==1.8.1
- dataproc-jupyter-plugin==0.1.66
- db-dtypes==1.2.0
- deprecated==1.2.14
- distlib==0.3.8
- dm-tree==0.1.8
- docker==7.0.0
- docstring-parser==0.15
- farama-notifications==0.0.4
- fastapi==0.109.0
- filelock==3.13.1
- fonttools==4.47.2
- frozenlist==1.4.1
- fsspec==2023.12.2
- gcsfs==2023.12.2.post1
- gitdb==4.0.11
- gitpython==3.1.41
- google-api-core==1.34.0
- google-api-python-client==1.8.0
- google-auth==2.26.2
- google-auth-httplib2==0.2.0
- google-auth-oauthlib==1.2.0
- google-cloud-aiplatform==1.39.0
- google-cloud-artifact-registry==1.10.0
- google-cloud-bigquery==3.15.0
- google-cloud-bigquery-storage==2.24.0
- google-cloud-core==2.4.1
- google-cloud-datastore==1.15.5
- google-cloud-jupyter-config==0.0.5
- google-cloud-language==2.12.0
- google-cloud-monitoring==2.18.0
- google-cloud-resource-manager==1.11.0
- google-cloud-storage==2.14.0
- google-crc32c==1.5.0
- google-resumable-media==2.7.0
- googleapis-common-protos==1.62.0
- gpustat==1.0.0
- greenlet==3.0.3
- grpc-google-iam-v1==0.13.0
- grpcio==1.60.0
- grpcio-status==1.48.2
- gymnasium==0.28.1
- h11==0.14.0
- htmlmin==0.1.12
- httplib2==0.22.0
- httptools==0.6.1
- imagehash==4.3.1
- imageio==2.33.1
- importlib-metadata==6.11.0
- ipython-genutils==0.2.0
- ipython-sql==0.5.0
- ipywidgets==8.1.1
- jaraco-classes==3.3.0
- jax-jumpy==1.0.0
- jeepney==0.8.0
- joblib==1.3.2
- json5==0.9.14
- jupyter-client==7.4.9
- jupyter-http-over-ws==0.0.8
- jupyter-server-fileid==0.9.1
- jupyter-server-mathjax==0.2.6
- jupyter-server-proxy==4.1.0
- jupyter-server-ydoc==0.8.0
- jupyter-ydoc==0.2.5
- jupyterlab==3.6.6
- jupyterlab-git==0.44.0
- jupyterlab-server==2.25.2
- jupyterlab-widgets==3.0.9
- jupytext==1.16.0
- kernels-mixer==0.0.7
- keyring==24.3.0
- keyrings-google-artifactregistry-auth==1.1.2
- kfp==2.5.0
- kfp-pipeline-spec==0.2.2
- kfp-server-api==2.0.5
- kiwisolver==1.4.5
- kubernetes==26.1.0
- lazy-loader==0.3
- llvmlite==0.41.1
- lz4==4.3.3
- markdown-it-py==3.0.0
- matplotlib==3.8.2
- mdit-py-plugins==0.4.0
- mdurl==0.1.2
- more-itertools==10.2.0
- msgpack==1.0.7
- multidict==6.0.4
- multimethod==1.10
- nbclient==0.9.0
- nbdime==3.2.0
- networkx==3.2.1
- notebook==6.5.6
- notebook-executor==0.2
- numba==0.58.1
- numpy==1.25.2
- nvidia-ml-py==11.495.46
- oauth2client==4.1.3
- oauthlib==3.2.2
- opencensus==0.11.4
- opencensus-context==0.1.3
- opentelemetry-api==1.22.0
- opentelemetry-exporter-otlp==1.22.0
- opentelemetry-exporter-otlp-proto-common==1.22.0
- opentelemetry-exporter-otlp-proto-grpc==1.22.0
- opentelemetry-exporter-otlp-proto-http==1.22.0
- opentelemetry-proto==1.22.0
- opentelemetry-sdk==1.22.0
- opentelemetry-semantic-conventions==0.43b0
- pandas==2.1.4
- pandas-profiling==3.6.6
- papermill==2.5.0
- patsy==0.5.6
- pexpect==4.9.0
- phik==0.12.4
- pillow==10.2.0
- platformdirs==3.11.0
- plotly==5.18.0
- prettytable==3.9.0
- prompt-toolkit==3.0.43
- proto-plus==1.23.0
- protobuf==3.20.3
- psutil==5.9.3
- py-spy==0.3.14
- pyarrow==14.0.2
- pyasn1==0.5.1
- pyasn1-modules==0.3.0
- pydantic==2.5.3
- pydantic-core==2.14.6
- pyjwt==2.8.0
- pyparsing==3.1.1
- python-dotenv==1.0.0
- pytz==2023.3.post1
- pywavelets==1.5.0
- pyzmq==24.0.1
- ray==2.9.0
- ray-cpp==2.9.0
- requests-oauthlib==1.3.1
- requests-toolbelt==0.10.1
- retrying==1.3.4
- rich==13.7.0
- scikit-image==0.22.0
- scikit-learn==1.3.2
- scipy==1.11.4
- seaborn==0.12.2
- secretstorage==3.3.3
- shapely==2.0.2
- simpervisor==1.0.0
- smart-open==6.4.0
- smmap==5.0.1
- sqlalchemy==2.0.25
- sqlparse==0.4.4
- stack-data==0.6.3
- starlette==0.35.1
- statsmodels==0.14.1
- tabulate==0.9.0
- tangled-up-in-unicode==0.2.0
- tenacity==8.2.3
- tensorboardx==2.6.2.2
- threadpoolctl==3.2.0
- tifffile==2023.12.9
- toml==0.10.2
- tomli==2.0.1
- typeguard==4.1.5
- typer==0.9.0
- tzdata==2023.4
- uritemplate==3.0.1
- urllib3==1.26.18
- uvicorn==0.25.0
- uvloop==0.19.0
- virtualenv==20.21.0
- visions==0.7.5
- watchfiles==0.21.0
- websockets==12.0
- widgetsnbextension==4.0.9
- wordcloud==1.9.3
- wrapt==1.16.0
- y-py==0.6.2
- yarl==1.9.4
- ydata-profiling==4.6.4
- ypy-websocket==0.8.4
prefix: /opt/conda
(base) james_c_pinkerton@workergpu:~$ /mnt/disks/condaman/mamba/bin/python --version
Python 3.12.0
Your program did not produce those errors.
Ah, I've figured it out! This is really embarrassing. I use Python multiprocessing. When you launch LightGBM from a spawned process in Python, it crashes CUDA. I should have realized this, because PyTorch causes the same issue.
I have one last question for this thread: the parameter "data_sample_strategy": "goss" crashes the run in CUDA mode. Is this known?
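The thread does not say which multiprocessing start method was in use. A commonly reported cause of CUDA initialization errors in child processes is a forked child inheriting CUDA state that the parent had already initialized. The sketch below assumes that is the situation here: it requests the "spawn" start method explicitly and imports lightgbm only inside the worker, so the parent process never touches CUDA. The function names train_on_gpu and run_in_spawned_worker are hypothetical, and the data is synthetic.

```python
import multiprocessing as mp


def train_on_gpu(seed: int) -> int:
    """Worker body: train a small model on the GPU and return its tree count."""
    # Imported here, inside the worker, so the parent never initializes CUDA.
    import numpy as np
    import lightgbm as lgb

    rng = np.random.default_rng(seed)
    X = rng.normal(size=(1_000, 10))
    y = X[:, 0] + rng.normal(scale=0.1, size=1_000)
    bst = lgb.train(
        params={"device": "cuda", "objective": "regression", "num_leaves": 7},
        train_set=lgb.Dataset(X, label=y),
        num_boost_round=5,
    )
    return bst.num_trees()


def run_in_spawned_worker(func, *args):
    """Run func(*args) in a fresh interpreter started with the "spawn" method."""
    ctx = mp.get_context("spawn")  # avoid inheriting parent state via fork
    with ctx.Pool(processes=1) as pool:
        return pool.apply(func, args)
```

With "spawn", the child re-imports the main module, so a call such as run_in_spawned_worker(train_on_gpu, 0) belongs under an if __name__ == "__main__": guard in a script.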
I get this error:
[LightGBM] [Fatal] [CUDA] invalid argument /home/conda/feedstock_root/build_artifacts/lightgbm_1722621077976/work/src/boosting/goss.hpp 63
Hi. I’m having a lot of trouble building and deploying a CUDA version of LGBM in my conda environment. When you source everything from conda forge, it can be difficult to try to integrate a pip version of LGBM.
Would it be possible to deploy the CUDA version to conda forge?
Thanks