Closed. hedongyan closed this issue 1 year ago.
The solution is to downgrade the pynvml package, but I am not sure to which version exactly.
Could you try pip install pynvml==11.0.0 and share whether it solves the issue?
I tried the following in the base environment:
pip install pynvml==11.0.0
python bin/tune.py exp/mlp/california/1_tuning
which failed with: ModuleNotFoundError: No module named 'optuna'
pip uninstall pynvml==11.0.0
Then I activated the num-embeddings environment:
conda activate num-embeddings
which printed: WARNING: overwriting environment variables set in the machine, overwriting variable LD_LIBRARY_PATH
export CUDA_VISIBLE_DEVICES="0"
cp exp/mlp/california/0_tuning.toml exp/mlp/california/1_tuning.toml
python bin/tune.py exp/mlp/california/1_tuning.toml
which printed:
[output] exp/mlp/california/1_tuning The output directory already exists. Done!
Actually, I am not sure which step really worked.
Thank you!
The command pip install pynvml==11.0.0 should be executed in the num-embeddings environment.
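To confirm which environment a command actually affected, a small check like the following may help. It is only a sketch: the helper name installed_version is hypothetical, and it simply reports what the currently running interpreter can see, so it should be run with the python of the num-embeddings environment.

```python
import sys
from importlib import metadata


def installed_version(package: str):
    """Return the version of `package` visible to this interpreter, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # sys.prefix shows which environment this interpreter belongs to.
    print("interpreter prefix:", sys.prefix)
    for name in ("pynvml", "optuna"):
        print(name, "->", installed_version(name))
```

If optuna shows up as None, the interpreter is most likely not the one from the num-embeddings environment, which would match the ModuleNotFoundError seen in the base environment.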
The reason your command did not fail this time is given in the output: "The output directory already exists. Done!". In other words, the script exits immediately after you run it, because it sees the result directory left by the previous unsuccessful run and concludes that there is no work to do.
You have to either remove the directory before the run:
rm -r exp/mlp/california/1_tuning
or use the --force flag:
python bin/tune.py exp/mlp/california/1_tuning.toml --force
Please let us know whether it works, that is, whether you can successfully run the bin/tune.py script and see that it is doing something.
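The skip-if-exists behaviour described above can be pictured roughly as follows. This is an illustrative sketch only, not the actual code of bin/tune.py: the function prepare_output_dir and its force parameter are hypothetical names chosen to mirror the --force flag.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_output_dir(output: Path, force: bool = False) -> bool:
    """Return True if the job should run, False if it should be skipped."""
    if output.exists():
        if not force:
            # A leftover directory is treated as "already done", even if the
            # previous run crashed before producing any results.
            print(f"[output] {output} The output directory already exists. Done!")
            return False
        # With force=True the stale directory is discarded and the job reruns.
        shutil.rmtree(output)
    output.mkdir(parents=True)
    return True


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp()) / "1_tuning"
    print(prepare_output_dir(out))              # directory is created, job runs
    print(prepare_output_dir(out))              # directory exists, job is skipped
    print(prepare_output_dir(out, force=True))  # directory is recreated, job runs
```

Under this reading, a failed run leaves the directory behind, so the next run exits at once unless the directory is removed or the run is forced.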
Yes, it worked this time. The code ran. Thank you!
Great! Closing the issue then :)
Following the readme.txt, I ran the program and got the error below.
Here is the output of python bin/tune.py exp/mlp/california/1_tuning.toml:
Creating the output...
Traceback (most recent call last):
  File "bin/tune.py", line 37, in <module>
    C, output, report = lib.start(Config)
  File "embedding/lib/util.py", line 271, in start
    'gpus': zero.hardware.get_gpus_info(),
  File ".conda/envs/num-embeddings/lib/python3.9/site-packages/zero/hardware.py", line 88, in get_gpus_info
    'name': str(pynvml.nvmlDeviceGetName(handle), 'utf-8'),
TypeError: decoding str is not supported
I could not find the lib.start function. Is this a bug? How can I fix it?
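For context on the TypeError itself: Python raises it whenever str(value, 'utf-8') is called with a value that is already a str rather than bytes. A plausible explanation, consistent with the traceback and with downgrading pynvml as the fix, is that the installed pynvml returns a str from nvmlDeviceGetName where the calling code expects bytes; this is an assumption, not something confirmed in this thread. The sketch below reproduces the error without a GPU and shows a tolerant conversion. The helper to_text is hypothetical and is not part of zero or pynvml.

```python
def to_text(value) -> str:
    """Decode bytes as UTF-8; pass values that are already text through unchanged."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


# str() with an encoding argument only accepts bytes-like input,
# so passing a str reproduces the error from the traceback.
try:
    str("some GPU name", "utf-8")
except TypeError as err:
    print(err)  # decoding str is not supported

# The helper accepts either return type.
print(to_text(b"some GPU name"))
print(to_text("some GPU name"))
```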