Open NisuSan opened 4 months ago
Thanks for using LightGBM.
The repo you've linked shows how you installed LightGBM, but not the code you're using to run LightGBM. Can you please share the exact code you're using? Are you able to provide a reproducible example (exact code that we could run which replicates the error)?
Although you didn't say it here, I know you're using the Python package specifically because of your comments on #6325. See these links for some examples of good reproducible examples in Python: https://github.com/microsoft/LightGBM/issues/6321#issuecomment-1948512259.
It's going to be very difficult to help you given only the details you've provided so far.
Can you please share the exact code you're using?
The repo I provided has a code snippet in its README.
Thanks for reporting this issue. I think it should be quick to fix. I'm trying with your example.
Updating the progress here.
I've built the Docker image and tried to reproduce the error, but the code runs successfully within the Docker container. Here's the output. I modified the code to get the training loss.
import lightgbm as lgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=10_000)
dtrain = lgb.Dataset(X, label=y)
dval = lgb.Dataset(X, label=y)
bst = lgb.train(
    params={
        "objective": "regression",
        "device": "cuda",
        "verbose": 1,
        "metric": "l2"
    },
    train_set=dtrain,
    valid_sets=[dval],
    callbacks=[lgb.log_evaluation(period=1)],
    num_boost_round=5
)
[LightGBM] [Warning] Using sparse features with CUDA is currently not supported.
[LightGBM] [Info] Total Bins 25500
[LightGBM] [Info] Number of data points in the train set: 10000, number of used features: 100
[LightGBM] [Info] Start training from score -0.370042
[1] valid_0's l2: 22190.7
[2] valid_0's l2: 19590.2
[3] valid_0's l2: 17420.6
[4] valid_0's l2: 15575
[5] valid_0's l2: 13972.9
My GPU is a V100, which is different from yours. One more thing that I would like to confirm is that the Dockerfile does not seem to specify a LightGBM version, so it builds the latest version from source. In your container where the error can be reproduced, can you provide the commit head of the LightGBM repo? That would be helpful for me to further identify the root cause.
Thanks.
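For anyone following along, here is a minimal sketch of one way to collect that information inside the container. It uses only the Python standard library. The helper names `installed_version` and `git_head` are made up for illustration, and `/LightGBM` is an assumed clone location that may not match where the Dockerfile actually checks out the source.

```python
import subprocess
from importlib import metadata
from typing import Optional


def installed_version(package: str = "lightgbm") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def git_head(repo_dir: str) -> Optional[str]:
    """Return the HEAD commit hash of the clone at `repo_dir`, or None on failure."""
    try:
        result = subprocess.run(
            ["git", "-C", repo_dir, "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or repo_dir is not a git repository
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    # "/LightGBM" is a hypothetical path; adjust it to the real clone location.
    print("lightgbm version:", installed_version())
    print("LightGBM HEAD:", git_head("/LightGBM"))
```

Both helpers return None instead of raising, so the script can be run as-is even if the clone was deleted after the build. In that case only the package version is available.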
@shiyu1994, Thanks for your response!
can you provide the commit head of the LightGBM repo? That would be helpful for me to further identify the root cause.
Sure, it's https://github.com/microsoft/LightGBM/commit/252828fd86627d7405021c3377534d6a8239dd69
@shiyu1994, Do you have any information about the problem?
Description
Execution of code failed with error
Reproducible example
Please, use this repo
Environment info
Docker image, based on
nvidia/cuda:12.2.0-devel-ubuntu22.04
GPU: GeForce GTX 1060
CPU: AMD Ryzen 5 1600
Additional Comments
Same message for classification and regression models