[rank0]: RemoteTraceback:
[rank0]: """
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/lib/python3/dist-packages/multiprocess/pool.py", line 125, in worker
[rank0]: result = (True, func(*args, **kwds))
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/lib/python3/dist-packages/datasets/utils/py_utils.py", line 678, in _write_generator_to_queue
[rank0]: for i, result in enumerate(func(**kwargs)):
[rank0]: File "/usr/lib/python3/dist-packages/datasets/arrow_dataset.py", line 3517, in _map_single
[rank0]: example = apply_function_on_filtered_inputs(example, i, offset=offset)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/lib/python3/dist-packages/datasets/arrow_dataset.py", line 3416, in apply_function_on_filtered_inputs
[rank0]: processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/llm-foundry/llmfoundry/data/finetuning/tasks.py", line 889, in dataset_mapper
[rank0]: return tokenize_formatted_example(example, tokenizer)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/llm-foundry/llmfoundry/data/finetuning/tasks.py", line 408, in tokenize_formatted_example
[rank0]: example_format = _get_example_type(example)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/llm-foundry/llmfoundry/data/finetuning/tasks.py", line 150, in _get_example_type
[rank0]: raise UnknownExampleTypeError(str(example.keys()))
[rank0]: llmfoundry.utils.exceptions.UnknownExampleTypeError: "Found keys KeysView({'prompt': 'hello, ', 'response': 'world!', 'random_extra_key': 'sup'}) in dataset. Unknown example type. For prompt and response finetuning, the valid prompt keys are {'prompt'} and the valid response keys are {'completion', 'response'}. For chat finetuning, the allowed keys are {'messages'}"
[rank0]: """
This failure, with the offending keys listed in the message, is what we want.
We have been getting this error:
Traceback (most recent call last):
File "/usr/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
File "/usr/lib/python3.11/threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3/dist-packages/multiprocess/pool.py", line 579, in _handle_results
task = get()
^^^^^
File "/usr/lib/python3/dist-packages/multiprocess/connection.py", line 254, in recv
return _ForkingPickler.loads(buf.getbuffer())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dill/_dill.py", line 303, in loads
return load(file, ignore, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dill/_dill.py", line 289, in load
return Unpickler(file, ignore=ignore, **kwds).load()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dill/_dill.py", line 444, in load
obj = StockUnpickler.load(self)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/llmfoundry/utils/exceptions.py", line 85, in __init__
f'Found keys {example.keys()} in dataset. Unknown example type. For prompt and response '
^^^^^^^^^^^^
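The traceback shows the failure happening while the pool's result-handler thread unpickles the exception raised in a worker: __init__ runs again and calls example.keys() on a value that is not a mapping.
This PR fixes this by checking whether example is a string before calling keys().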
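To illustrate the failure mode and the kind of guard described here, a minimal sketch follows. It assumes the truncated traceback ends in an error caused by calling keys() on a string. BrokenError, GuardedError and rebuild are hypothetical names made up for this example; they are not the llmfoundry classes and not the PR's actual diff. rebuild re-creates an exception from the (class, args) pair that __reduce__ reports, which is the same step unpickling performs.

```python
from typing import Any, Mapping, Union


class BrokenError(Exception):
    """Hypothetical stand-in that assumes `example` is always a mapping."""

    def __init__(self, example: Mapping[str, Any]) -> None:
        super().__init__(f'Found keys {example.keys()} in dataset.')


class GuardedError(Exception):
    """Hypothetical stand-in that also accepts an already-built message."""

    def __init__(self, example: Union[Mapping[str, Any], str]) -> None:
        if isinstance(example, str):
            # Rebuilt from stored args: `example` is already the message.
            message = example
        else:
            message = f'Found keys {example.keys()} in dataset.'
        super().__init__(message)


def rebuild(exc: Exception) -> Exception:
    """Re-create `exc` as unpickling would: call cls(*args) from __reduce__."""
    cls, args = exc.__reduce__()[:2]
    return cls(*args)


example = {'prompt': 'hello, ', 'response': 'world!', 'random_extra_key': 'sup'}

# After super().__init__(message), exc.args holds the message string, so the
# rebuild step passes a str where BrokenError expects a mapping.
try:
    rebuild(BrokenError(example))
    broken_outcome = 'rebuilt'
except AttributeError as err:
    broken_outcome = f'AttributeError: {err}'

# The guarded variant survives the same round trip with its message intact.
rebuilt = rebuild(GuardedError(example))

print(broken_outcome)
print(rebuilt)
```

Treating a string argument as the finished message is a design choice of this sketch; it keeps the round trip idempotent, and the real fix may format the message differently.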
Manual Tests:
ift-mpt-7b-lrhex4-hsukuh
Fails with the UnknownExampleTypeError shown in the [rank0] RemoteTraceback at the top.