zhengzangw / Sequence-Scheduling

PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".

Type casting error when using the demo commands #3

Open ds-ssj opened 1 week ago

ds-ssj commented 1 week ago

Hi. When I run the following command from the README:

CUDA_VISIBLE_DEVICES=0 python -m src.benchmark --num-data 1024 --strategy seqsch --vbs --fcr --lora-path ./ckpts/vicuna-response-length-perception-module

an error occurs:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/workspace/ssj/Sequence-Scheduling/src/benchmark.py", line 109, in <module>
    result = benchmark(
  File "/mnt/workspace/ssj/Sequence-Scheduling/src/benchmark.py", line 34, in benchmark
    out = model(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/workspace/ssj/Sequence-Scheduling/src/generate.py", line 254, in __call__
    out = self.generate_group(prompt, **kwargs)
  File "/mnt/workspace/ssj/Sequence-Scheduling/src/generate.py", line 185, in generate_group
    length = predictor.predict_length(
  File "/mnt/workspace/ssj/Sequence-Scheduling/src/utils.py", line 231, in predict_length
    ret = [int(s.strip()) for s in outputs]
  File "/mnt/workspace/ssj/Sequence-Scheduling/src/utils.py", line 231, in <listcomp>
    ret = [int(s.strip()) for s in outputs]
ValueError: invalid literal for int() with base 10: '100 tokens.'

I inspected the values of outputs; they look like this:

['100', '1', '100', '100', '4', '5', '5', '100', '4', '4', '100', '100', '100', '100', '100', '1', '100', '100', '10', '100', '100', '100', '1', '3', '100', '100', '100', '100', '100', '4', '4', '10', '1', '1', '100 tokens.', '10', '100', '100', '200', '100', '10', '100', '5', '150', '140', '100', '1', '100', '4', '1', '100', '100', '100', '100', '1000', '150', '100', '100', '5', '4', '100', '10', '10', '100', '100', '1', '100', '4', '1', '100', '1', '4', '10', '10', '100', '3', '100', '100', '100', '100', '100', '100', '4', '10', '100', '1', '1', '140', '4', '5', '1', '4', '100', '500', '10', '1', '10', '5', '1', '100', '100', '1', '100', '100', '10,000 images', '5', '12', '4', '4', '4', '10', '200', '100', '4', '3', '5', '10', '1', '100', '100', '100', '4', '10', '100', '150', '10', '100', '10']

Is the outputs array correct? I used all of the config files provided by the repo, and the lora-path was downloaded from HF as mentioned in the README.

Thanks!

zhengzangw commented 1 week ago

The output looks correct. Your problem is a corner case that we did not encounter in our experiments: the model is supposed to generate only a number, but it sometimes appends additional tokens.

For a quick fix, since this happens only rarely, you can wrap the conversion in a try/except like:

ret = []
for s in outputs:
    try:
        # Normal case: the prediction is a bare number, e.g. '100'
        ret.append(int(s.strip()))
    except ValueError:
        # Corner case: extra tokens were appended, e.g. '100 tokens.'
        ret.append(100)  # fall back to a default length
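
A slightly more robust variant (just a sketch, not the repo's implementation) is to extract the leading integer from the prediction and only fall back to a default when no digits are present at all:

import re

DEFAULT_LEN = 100  # assumed fallback length when no number can be parsed

ret = []
for s in outputs:
    # Grab a leading (possibly comma-grouped) integer, e.g. '100 tokens.' or '10,000 images'
    m = re.match(r"\s*(\d[\d,]*)", s)
    ret.append(int(m.group(1).replace(",", "")) if m else DEFAULT_LEN)

This keeps the predicted length even when the model appends trailing words, instead of discarding it.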
zhengzangw commented 1 week ago

One possible reason this happens is that different library versions can lead to instability.

ds-ssj commented 1 week ago

One possible reason this happens is that different library versions can lead to instability.

Thank you for your assistance. I understand that a legitimate output might sometimes be followed by additional tokens. May I ask which libraries could cause this difference in output behavior?

zhengzangw commented 1 week ago

Not sure. The most likely ones are transformers and torch.
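
If you want to compare environments, a quick way (a generic check, not specific to this repo) is to print the installed versions of the likely culprits:

import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
# If you use peft for the LoRA module (an assumption), its version may matter too:
# import peft; print("peft:", peft.__version__)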