Closed: l3utterfly closed this issue 1 week ago
When you want to set any YAML config field to None, you have to use `null` instead of `None` (as in JavaScript). Otherwise it will be interpreted as an object named `None`. For example, the `seed` field in your config is set to `null` instead of `None`.
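This is standard YAML parsing behavior; a quick check with PyYAML (assumed installed) shows the difference:

```python
import yaml

# The YAML keyword `null` parses to Python's None...
print(yaml.safe_load("seed: null"))  # -> {'seed': None}

# ...while a bare `None` is just the string "None", which the config
# loader then tries to resolve as a named object and fails.
print(yaml.safe_load("seed: None"))  # -> {'seed': 'None'}
```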
Alternatively, you can just remove the `chat_format` field entirely and it will default to `None` correctly.
@RdoubleA thank you for your response!
I tried removing it, but I get this error: `TypeError: chat_dataset() missing 1 required positional argument: 'chat_format'`. It seems the field is required?
I also tried setting it to `null`, and I got this error:

```
  File "/home/layla/src/Layla-datasets/.venv/lib/python3.10/site-packages/torchtune/config/_utils.py", line 223, in _get_chat_format
    return _try_get_component("torchtune.data._chat_formats", chat_format, "ChatFormat")
  File "/home/layla/src/Layla-datasets/.venv/lib/python3.10/site-packages/torchtune/config/_utils.py", line 193, in _try_get_component
    return _get_component_from_path(module_path + "." + component_name)
TypeError: can only concatenate str (not "NoneType") to str
```
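The failing line in that traceback builds a dotted import path by string concatenation, so when `chat_format` resolves to `None` the concatenation itself raises, which is why the error surfaces deep inside `_try_get_component` rather than as a clearer config error. A minimal reproduction of just that failure:

```python
module_path = "torchtune.data._chat_formats"
component_name = None  # what chat_format resolves to in this config

try:
    # mirrors `module_path + "." + component_name` from the traceback
    path = module_path + "." + component_name
except TypeError as err:
    print(err)  # can only concatenate str (not "NoneType") to str
```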
@l3utterfly How did you install torchtune, via pip? I would install the nightlies or build from source; we've made `chat_format` optional for llama3, and that change isn't in stable yet :)
@RdoubleA yes, I installed from pip. Thanks, I will try the nightly now
Thank you, installing directly from source worked
My dataset config:
From the tutorial here: https://pytorch.org/torchtune/main/tutorials/chat.html, I see Llama3 requires `chat_format = None`, since the tokenizer takes care of the special tokens.
However, running:

```
tune run --nproc_per_node 6 full_finetune_distributed --config ./llama3.yaml
```
I get this error:
Are there any example configs for using a custom dataset for finetuning llama3?
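For anyone landing here with the same question, a rough sketch of what such a dataset entry might look like. The field names and values below are assumptions based on the torchtune chat tutorial linked above, not a verified config; check them against the API of your installed version:

```yaml
dataset:
  _component_: torchtune.datasets.chat_dataset
  source: my-org/my-chat-data    # hypothetical Hugging Face dataset id
  conversation_style: sharegpt   # assumed; depends on your data layout
  chat_format: null              # Llama3: the tokenizer adds special tokens
  max_seq_len: 2048
```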