(base) ➜ jupyter git:(master) ✗ ./train.sh
PyTorch version 2.0.0+cpu available.
generated new fontManager
/usr/local/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
ludwig v0.10.2 - Experiment
Setting generation max_new_tokens to 1024 to correspond with the max sequence length assigned to the output feature or the global max sequence length. This will ensure that the correct number of tokens are generated at inference time. To override this behavior, set `generation.max_new_tokens` to a different value in your Ludwig config.
╒════════════════════════╕
│ EXPERIMENT DESCRIPTION │
╘════════════════════════╛
╒══════════════════╤══════════════════════════════════════════════════════════════════════════╕
│ Experiment name │ experiment │
├──────────────────┼──────────────────────────────────────────────────────────────────────────┤
│ Model name │ run │
├──────────────────┼──────────────────────────────────────────────────────────────────────────┤
│ Output directory │ /src/results/experiment_run_2 │
├──────────────────┼──────────────────────────────────────────────────────────────────────────┤
│ ludwig_version │ '0.10.2' │
├──────────────────┼──────────────────────────────────────────────────────────────────────────┤
│ command │ ('/usr/local/bin/ludwig experiment --config /src/config.yaml --dataset ' │
│ │ '/data/train-00000-of-00001.parquet --output_directory /src/results') │
├──────────────────┼──────────────────────────────────────────────────────────────────────────┤
│ random_seed │ 42 │
├──────────────────┼──────────────────────────────────────────────────────────────────────────┤
│ dataset │ '/data/train-00000-of-00001.parquet' │
├──────────────────┼──────────────────────────────────────────────────────────────────────────┤
│ data_format │ 'parquet' │
├──────────────────┼──────────────────────────────────────────────────────────────────────────┤
│ torch_version │ '2.0.0+cpu' │
├──────────────────┼──────────────────────────────────────────────────────────────────────────┤
│ compute │ {'num_nodes': 1} │
╘══════════════════╧══════════════════════════════════════════════════════════════════════════╛
╒═══════════════╕
│ LUDWIG CONFIG │
╘═══════════════╛
User-specified config (with upgrades):
{ 'adapter': {'type': 'lora'},
'base_model': 'facebook/opt-1.3b',
'input_features': [{'name': 'prompt', 'type': 'text'}],
'ludwig_version': '0.10.2',
'model_type': 'llm',
'output_features': [{'name': 'response', 'type': 'text'}],
'preprocessing': {'sample_ratio': 0.1},
'prompt': { 'template': '### Instruction:\n'
'{instruction}\n'
'\n'
'### Response:\n'},
'trainer': { 'batch_size': 'auto',
'compile': True,
'epochs': 3,
'gradient_accumulation_steps': 16,
'learning_rate': 'auto',
'learning_rate_scaling': 'sqrt',
'learning_rate_scheduler': {'warmup_fraction': 0.01},
'optimizer': {'type': 'adamw'},
'type': 'finetune',
'use_mixed_precision': True}}
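The printed config corresponds to the /src/config.yaml passed on the command line. As a rough sketch (values copied from the printed config, paths taken from this log, and assuming Ludwig's standard Python API rather than the CLI), the same experiment could be launched programmatically:

from ludwig.api import LudwigModel

# Config reconstructed from the "User-specified config" block above.
config = {
    "model_type": "llm",
    "base_model": "facebook/opt-1.3b",
    "adapter": {"type": "lora"},
    "input_features": [{"name": "prompt", "type": "text"}],
    "output_features": [{"name": "response", "type": "text"}],
    "preprocessing": {"sample_ratio": 0.1},
    "prompt": {"template": "### Instruction:\n{instruction}\n\n### Response:\n"},
    "trainer": {
        "type": "finetune",
        "epochs": 3,
        "batch_size": "auto",
        "learning_rate": "auto",
        "learning_rate_scaling": "sqrt",
        "learning_rate_scheduler": {"warmup_fraction": 0.01},
        "optimizer": {"type": "adamw"},
        "gradient_accumulation_steps": 16,
        "compile": True,  # enables the torchdynamo path that fails later in this log
        "use_mixed_precision": True,
    },
}

model = LudwigModel(config)
model.experiment(
    dataset="/data/train-00000-of-00001.parquet",
    output_directory="/src/results",
)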
Full config saved to:
/src/results/experiment_run_2/experiment/model/model_hyperparameters.json
╒═══════════════╕
│ PREPROCESSING │
╘═══════════════╛
Found cached dataset and meta.json with the same filename of the dataset, but checksums don't match, if saving of processed input is not skipped they will be overridden
Using full raw dataset, no hdf5 and json file with the same name have been found
Building dataset (it may take a while)
Loaded HuggingFace implementation of facebook/opt-1.3b tokenizer
/usr/local/lib/python3.8/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Max length of feature 'None': 2713 (without start and stop symbols)
Max sequence length is 2713 for feature 'None'
Loaded HuggingFace implementation of facebook/opt-1.3b tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Max length of feature 'response': 3109 (without start and stop symbols)
Max sequence length is 3109 for feature 'response'
Loaded HuggingFace implementation of facebook/opt-1.3b tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Loaded HuggingFace implementation of facebook/opt-1.3b tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Building dataset: DONE
Writing preprocessed training set cache to /data/train-00000-of-00001.training.hdf5
Writing preprocessed validation set cache to /data/train-00000-of-00001.validation.hdf5
Writing preprocessed test set cache to /data/train-00000-of-00001.test.hdf5
Writing train set metadata to /data/train-00000-of-00001.meta.json
Dataset Statistics
╒════════════╤═══════════════╤════════════════════╕
│ Dataset │ Size (Rows) │ Size (In Memory) │
╞════════════╪═══════════════╪════════════════════╡
│ Training │ 4900 │ 1.12 Mb │
├────────────┼───────────────┼────────────────────┤
│ Validation │ 700 │ 164.19 Kb │
├────────────┼───────────────┼────────────────────┤
│ Test │ 1400 │ 328.25 Kb │
╘════════════╧═══════════════╧════════════════════╛
╒═══════╕
│ MODEL │
╘═══════╛
Warnings and other logs:
Loading large language model...
Done.
Loaded HuggingFace implementation of facebook/opt-1.3b tokenizer
==================================================
Trainable Parameter Summary For Fine-Tuning
Fine-tuning with adapter: lora
trainable params: 1,572,864 || all params: 1,317,330,944 || trainable%: 0.11939778740975206
==================================================
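The trainable% figure above is simply the ratio of LoRA adapter parameters to all model parameters:

# trainable% = 100 * trainable params / all params
print(100 * 1_572_864 / 1_317_330_944)  # ~= 0.1194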
/usr/local/lib/python3.8/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
Training with torchdynamo compiled model
`trainer.use_mixed_precision=True`, but no GPU device found. Setting to `False`
Tuning batch size...
Exploring batch_size=1
[2024-11-24 03:19:41,470] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing forward
[2024-11-24 03:19:41,543] torch._dynamo.output_graph: [INFO] Step 2: calling compiler function debug_wrapper
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
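This tokenizers warning is emitted repeatedly while batch-size tuning forks worker processes. As the message itself suggests, it can be silenced by setting TOKENIZERS_PARALLELISM before any tokenizer is used, e.g. at the top of the training entrypoint (a sketch, not part of the original run):

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # must be set before tokenizers are first used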
[2024-11-24 03:19:45,054] torch._dynamo.output_graph: [INFO] Step 2: done compiler function debug_wrapper
[2024-11-24 03:19:45,845] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing generate_merged_ids
[2024-11-24 03:19:45,878] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing remove_left_padding
[2024-11-24 03:19:45,909] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing add_left_padding
[2024-11-24 03:19:45,933] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo done tracing add_left_padding (RETURN_VALUE)
[2024-11-24 03:19:45,935] torch._dynamo.output_graph: [INFO] Step 2: calling compiler function debug_wrapper
[2024-11-24 03:19:45,963] torch._inductor.compile_fx: [INFO] Step 3: torchinductor compiling FORWARDS graph 1
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[2024-11-24 03:19:47,941] torch._inductor.compile_fx: [INFO] Step 3: torchinductor done compiling FORWARDS graph 1
[2024-11-24 03:19:47,943] torch._dynamo.output_graph: [INFO] Step 2: done compiler function debug_wrapper
[2024-11-24 03:19:47,950] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing create_attention_mask
[2024-11-24 03:19:47,967] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing <graph break in forward>
[2024-11-24 03:19:47,971] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing <graph break in forward>
Successfully loaded model weights from /tmp/tmpvrba4sm0/latest.ckpt.
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 324, in _compile
out_code = transform_code_object(code, transform)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/bytecode_transformation.py", line 445, in transform_code_object
transformations(instructions, code_options)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 311, in transform
tracer.run()
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 1726, in run
super().run()
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
and self.step()
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 540, in step
getattr(self, inst.opname)(inst)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 342, in wrapper
return inner_fn(self, inst)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 1014, in CALL_FUNCTION_KW
self.call_function(fn, args, kwargs)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 474, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/variables/nn_module.py", line 244, in call_function
return tx.inline_user_function_return(
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 510, in inline_user_function_return
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 1806, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 1862, in inline_call_
tracer.run()
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
and self.step()
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 540, in step
getattr(self, inst.opname)(inst)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 1030, in LOAD_ATTR
result = BuiltinVariable(getattr).call_function(
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/variables/builtin.py", line 566, in call_function
result = handler(tx, *args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/variables/builtin.py", line 930, in call_getattr
return obj.var_getattr(tx, name).add_options(options)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/variables/nn_module.py", line 124, in var_getattr
subobj = inspect.getattr_static(base, name)
File "/usr/local/lib/python3.8/inspect.py", line 1622, in getattr_static
raise AttributeError(attr)
AttributeError: config
from user code:
File "/usr/local/lib/python3.8/site-packages/ludwig/models/llm.py", line 276, in <graph break in forward>
model_outputs = self.model(input_ids=self.model_inputs, attention_mask=self.attention_masks).get(LOGITS)
File "/usr/local/lib/python3.8/site-packages/peft/peft_model.py", line 1111, in forward
if self.base_model.config.model_type == "mpt":
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/ludwig", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/ludwig/cli.py", line 197, in main
CLI()
File "/usr/local/lib/python3.8/site-packages/ludwig/cli.py", line 72, in __init__
getattr(self, args.command)()
File "/usr/local/lib/python3.8/site-packages/ludwig/cli.py", line 97, in experiment
experiment.cli(sys.argv[2:])
File "/usr/local/lib/python3.8/site-packages/ludwig/experiment.py", line 528, in cli
experiment_cli(**vars(args))
File "/usr/local/lib/python3.8/site-packages/ludwig/experiment.py", line 217, in experiment_cli
(eval_stats, train_stats, preprocessed_data, output_directory) = model.experiment(
File "/usr/local/lib/python3.8/site-packages/ludwig/api.py", line 1539, in experiment
(train_stats, preprocessed_data, output_directory) = self.train(
File "/usr/local/lib/python3.8/site-packages/ludwig/api.py", line 655, in train
self._tune_batch_size(trainer, training_set, random_seed=random_seed)
File "/usr/local/lib/python3.8/site-packages/ludwig/api.py", line 883, in _tune_batch_size
tuned_batch_size = trainer.tune_batch_size(
File "/usr/local/lib/python3.8/site-packages/ludwig/trainers/trainer_llm.py", line 493, in tune_batch_size
return super().tune_batch_size(
File "/usr/local/lib/python3.8/site-packages/ludwig/trainers/trainer.py", line 597, in tune_batch_size
best_batch_size = evaluator.select_best_batch_size(
File "/usr/local/lib/python3.8/site-packages/ludwig/utils/batch_size_tuner.py", line 57, in select_best_batch_size
samples_per_sec = self.evaluate(
File "/usr/local/lib/python3.8/site-packages/ludwig/utils/batch_size_tuner.py", line 108, in evaluate
self.step(batch_size, global_max_sequence_length=global_max_sequence_length)
File "/usr/local/lib/python3.8/site-packages/ludwig/utils/batch_size_tuner.py", line 170, in step
self.perform_step(inputs, targets)
File "/usr/local/lib/python3.8/site-packages/ludwig/utils/batch_size_tuner.py", line 180, in perform_step
self.trainer.train_step(inputs, targets)
File "/usr/local/lib/python3.8/site-packages/ludwig/trainers/trainer.py", line 339, in train_step
model_outputs = self.dist_model((inputs, targets))
File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 82, in forward
return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 209, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/ludwig/models/llm.py", line 265, in forward
self.model_inputs, self.attention_masks = generate_merged_ids(
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 337, in catch_errors
return callback(frame, cache_size, hooks)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 404, in _convert_frame
result = inner_convert(frame, cache_size, hooks)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 104, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 262, in _convert_frame_assert
return _compile(
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 394, in _compile
raise InternalTorchDynamoError() from e
torch._dynamo.exc.InternalTorchDynamoError
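The crash happens while torch.compile (enabled by `compile: True` in the trainer config) traces into peft's forward (peft/peft_model.py line 1111): inspect.getattr_static cannot resolve `self.base_model.config` on the traced module, so dynamo raises InternalTorchDynamoError. Two possible workarounds, as sketches only (neither verified against this exact setup):

# Option 1: follow the log's own suggestion and fall back to eager execution
# whenever dynamo fails to compile a frame.
import torch._dynamo
torch._dynamo.config.suppress_errors = True

# Option 2: avoid the torchdynamo path entirely by disabling compilation in the
# Ludwig config (set `compile: false` under `trainer` in /src/config.yaml) and
# re-running train.sh.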