maitrix-org / llm-reasoners

A library for advanced large language model reasoning
https://www.llm-reasoners.net/
Apache License 2.0

Too few parameters for <class 'reasoners.algorithm.mcts.MCTSNode'>; actual 2, expected 3 #61

Open · nico1995lee opened this issue 5 months ago

nico1995lee commented 5 months ago
TypeError: Too few parameters for <class 'reasoners.algorithm.mcts.MCTSNode'>; actual 2, expected 3
[2024-04-12 07:26:48,924] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1263) of binary: /mlsteam/data/LLM/llama/venv/bin/python
Traceback (most recent call last):
  File "/mlsteam/data/LLM/llama/venv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
examples/rap_gsm8k/inference.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-12_07:26:48
  host      : 8dede9e2fb55
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1263)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
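
For context, this TypeError is raised by Python's typing module when a Generic class is subscripted with fewer type parameters than it declares (the exact wording varies slightly across Python versions). A minimal, self-contained sketch, with hypothetical names rather than the actual MCTSNode definition, that reproduces the same message:

    from typing import Generic, TypeVar

    State = TypeVar("State")
    Action = TypeVar("Action")
    Example = TypeVar("Example")

    class Node(Generic[State, Action, Example]):
        # A class declaring three type parameters, like MCTSNode in the error.
        pass

    # Subscripting with only two parameters raises immediately:
    # TypeError: Too few parameters for <class '__main__.Node'>; actual 2, expected 3
    NodeAlias = Node[int, str]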
Ber666 commented 5 months ago

Hi, I tried fixing this error. Could you try again? Thanks.

nico1995lee commented 5 months ago

Hi, thanks for your reply. That error has been resolved, but there is a new error:

  File "/mlsteam/data/LLM/llm-reasoners/reasoners/lm/llama_2_model.py", line 146, in generate
    assert max_prompt_size <= params.max_seq_len, f"prompt length exceeds limit: {max_prompt_size} > {params.max_seq_len}"
AssertionError: prompt length exceeds limit: 2054 > 2048
[2024-04-15 04:49:16,875] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 2659) of binary: /mlsteam/data/LLM/llama/venv/bin/python
Traceback (most recent call last):
  File "/mlsteam/data/LLM/llama/venv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
examples/rap_gsm8k/inference.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-15_04:49:16
  host      : 8dede9e2fb55
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2659)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
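
For context, the assertion in reasoners/lm/llama_2_model.py fires when the longest tokenized prompt exceeds the model's 2048-token context window (here by 6 tokens). One common workaround is to drop few-shot examples until the prompt fits; a minimal sketch of such a guard, using a hypothetical helper that is not part of llm-reasoners:

    def fit_prompt(examples, question, encode, max_seq_len=2048):
        """Drop leading few-shot examples until the full prompt fits.

        `encode` is any callable mapping a string to a list of token ids,
        e.g. the Llama tokenizer's encode method.
        """
        examples = list(examples)
        while examples:
            prompt = "\n\n".join(examples + [question])
            if len(encode(prompt)) <= max_seq_len:
                return prompt
            examples.pop(0)  # drop the oldest example and retry
        return question  # fall back to the bare question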
Ber666 commented 5 months ago

Could you specify the script you were running, so that I can reproduce the error?

nico1995lee commented 5 months ago

I'm trying to run RAP on the GSM8K dataset, so I executed the following command:

torchrun --nproc-per-node 1 --master-port 6676 examples/rap_gsm8k/inference.py --base_lm llama-2 --llama_2_ckpts /mlsteam/data/LLM/llama/ --llama_size 7B
Ber666 commented 5 months ago

Hi, I tried running this command but couldn't reproduce the error... It seems to be caused by improper handling of an edge case. It might be easier to debug by printing out the inputs and outputs, as in the sketch below.
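
A minimal sketch of such a debugging wrapper (hypothetical; it assumes a generate(prompts, ...) interface like the one in reasoners/lm/llama_2_model.py, so adjust to the actual signature):

    def debug_generate(model, prompts, **kwargs):
        # Log each prompt and its length before calling the model, so the
        # edge-case input that triggers the failure is visible.
        for i, prompt in enumerate(prompts):
            print(f"[debug] prompt {i} ({len(prompt)} chars):")
            print(prompt)
        outputs = model.generate(prompts, **kwargs)
        print(f"[debug] outputs: {outputs!r}")
        return outputs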

Also, I noticed that you are using Llama-2 7B, which is a relatively weak model and may not follow the demonstration format; this could also cause unexpected errors. We now support Llama-3, so you may try whether a stronger model resolves the problem.

Thanks!