h2oai / h2o-llmstudio

H2O LLM Studio - a framework and no-code GUI for fine-tuning LLMs. Documentation: https://docs.h2o.ai/h2o-llmstudio/
https://h2o.ai
Apache License 2.0

[BUG] Happy Path is failing #258

Closed pligor closed 1 year ago

pligor commented 1 year ago

🐛 Bug

[screenshot: error dialog shown in the UI]

Using the current (July 6th, 2023) "main" branch or checking out the git tag v0.0.4 both yield the error shown above.

Note that the OpenAI API key has been set and saved in settings; all other settings are the defaults. Appropriate error messages are needed here.

Also note that we are using the default training dataset.

Could you help?
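For example, a guard in the config loader along these lines would surface a clearer message. This is only a sketch: the real load_config_yaml lives in llm_studio/src/utils/config_utils.py and does the actual YAML parsing, and the suggested error text is mine.

```python
import os


def load_config_yaml_checked(path: str) -> str:
    """Load a dataset/experiment config, failing with a descriptive message.

    Sketch of the suggested guard; the body here is a placeholder for the
    real YAML parsing done by load_config_yaml in the repo.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Dataset config '{path}' does not exist. The dataset may have "
            "been imported before its config was written; try re-importing "
            "the dataset."
        )
    with open(path, "r") as fp:
        return fp.read()  # placeholder for the real YAML parsing
```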

REPORT:

q.app
script_sources: ['/_f/b18ecc74-848e-40a0-a50b-487bebb5f5c9/tmpb6h22j8u.min.js']
initialized: True
wave_utils_stack_trace_str: ### stacktrace
Traceback (most recent call last):

  File "/home/chester/Downloads/h2o-llmstudio/./app_utils/handlers.py", line 190, in handle
    await experiment_start(q)

  File "/home/chester/Downloads/h2o-llmstudio/./app_utils/sections/experiment.py", line 450, in experiment_start
    option_items = get_ui_elements(cfg=q.client["experiment/start/cfg"], q=q)

  File "/home/chester/Downloads/h2o-llmstudio/./app_utils/utils.py", line 1070, in get_ui_elements
    elements_group = get_ui_elements(cfg=v, q=q, limit=limit, pre=pre)

  File "/home/chester/Downloads/h2o-llmstudio/./app_utils/utils.py", line 1010, in get_ui_elements
    poss_values, v = cfg._get_possible_values(

  File "/home/chester/Downloads/h2o-llmstudio/./llm_studio/python_configs/base.py", line 81, in _get_possible_values
    dataset, value = dataset_fn(field, value)

  File "/home/chester/Downloads/h2o-llmstudio/./app_utils/utils.py", line 612, in get_dataset
    dataset_cfg = load_config_yaml(dataset["config_file"]).dataset.__dict__

  File "/home/chester/Downloads/h2o-llmstudio/./llm_studio/src/utils/config_utils.py", line 208, in load_config_yaml
    with open(path, "r") as fp:

FileNotFoundError: [Errno 2] No such file or directory: 'data/user/oasst/text_causal_language_modeling_config.yaml'

q.user
q.client
app_db: <app_utils.db.Database object at 0x7f0af919a290>
client_initialized: True
mode_curr: error
theme_dark: True
default_aws_bucket_name: bucket_name
default_kaggle_username: 
set_max_epochs: 50
set_max_batch_size: 256
set_max_gradient_clip: 10
set_max_lora_r: 256
set_max_lora_alpha: 256
gpu_used_for_chat: 1
default_number_of_workers: 4
default_logger: None
default_neptune_project: 
default_openai_azure: False
default_openai_api_base: https://example-endpoint.openai.azure.com
default_openai_api_deployment_id: deployment-name
default_openai_api_version: 2023-05-15
default_gpt_eval_max: 100
delete_dialogs: True
chart_plot_max_points: 1000
init_interface: True
notification_bar: None
nav/active: experiment/start
experiment/list/mode: train
dataset/list/df_datasets:    id          name  ... validation rows  labels
2   3  train_full.1  ...            None  output

[1 rows x 10 columns]
experiment/list/df_experiments: Empty DataFrame
Columns: [id, name, mode, dataset, config_file, path, seed, process_id, gpu_list, status, info]
Index: []
expander: True
dataset/list: False
dataset/list/table: []
experiment/list: False
experiment/list/table: []
dataset/import: False
dataset/import/source: Upload
dataset/import/id: None
dataset/import/cfg_file: text_causal_language_modeling_config
dataset/list/delete: False
dataset/import/local_upload: ['/_f/5cf4714b-86d7-45df-a16b-780d4dbe0c7f/train_full.csv']
dataset/import/local_path: /home/chester/Downloads/h2o-llmstudio/data/user/train_full.csv
dataset/import/path: data/user/tmp
dataset/import/name: train_full.1
dataset/import/edit: False
dataset/import/cfg_category: text
dataset/import/cfg: ConfigProblemBase(output_directory='output/text_causal_language_modeling_config', experiment_name='impartial-millipede', _parent_experiment='', llm_backbone='EleutherAI/pythia-2.8b-deduped', dataset=ConfigNLPCausalLMDataset(dataset_class=<class 'llm_studio.src.datasets.text_causal_language_modeling_ds.CustomDataset'>, personalize=False, chatbot_name='h2oGPT', chatbot_author='H2O.ai', train_dataframe='data/user/train_full.1/train_full.csv', validation_strategy='automatic', validation_dataframe='None', validation_size=0.01, data_sample=1.0, data_sample_choice=('Train', 'Validation'), prompt_column=('instruction',), answer_column='output', parent_id_column='None', text_prompt_start='<|prompt|>', text_answer_separator='<|answer|>', limit_chained_samples=False, add_eos_token_to_prompt=True, add_eos_token_to_answer=True, mask_prompt_labels=True, _allowed_file_extensions=('csv', 'pq')), tokenizer=ConfigNLPCausalLMTokenizer(max_length_prompt=256, max_length_answer=256, max_length=512, add_prompt_answer_tokens=False, padding_quantile=1.0, use_fast=True, add_prefix_space=False), architecture=ConfigNLPCausalLMArchitecture(model_class=<class 'llm_studio.src.models.text_causal_language_modeling_model.Model'>, reward_model_class=<class 'llm_studio.src.models.text_reward_model.RewardModel'>, pretrained=True, backbone_dtype='float16', gradient_checkpointing=True, force_embedding_gradients=False, intermediate_dropout=0, pretrained_weights=''), training=ConfigNLPCausalLMTraining(loss_class=<class 'llm_studio.src.losses.text_causal_language_modeling_losses.Losses'>, loss_function='TokenAveragedCrossEntropy', optimizer='AdamW', learning_rate=0.0001, differential_learning_rate_layers=(), differential_learning_rate=1e-05, batch_size=2, drop_last_batch=True, epochs=1, schedule='Cosine', warmup_epochs=0.0, weight_decay=0.0, gradient_clip=0.0, grad_accumulation=1, lora=True, lora_r=4, lora_alpha=16, lora_dropout=0.05, lora_target_modules='', save_best_checkpoint=False, 
evaluation_epochs=1.0, evaluate_before_training=False, train_validation_data=False, use_rlhf=False, reward_model='OpenAssistant/reward-model-deberta-v3-large-v2', adaptive_kl_control=True, initial_kl_coefficient=0.2, kl_target=6.0, kl_horizon=10000, advantages_gamma=0.99, advantages_lambda=0.95, ppo_clip_policy=0.2, ppo_clip_value=0.2, scaling_factor_value_loss=0.1, ppo_epochs=4, ppo_batch_size=1, ppo_generate_temperature=1.0, offload_reward_model=False), augmentation=ConfigNLPAugmentation(nlp_augmentations_class=<class 'llm_studio.src.augmentations.nlp_aug.BaseNLPAug'>, token_mask_probability=0, skip_parent_probability=0, random_parent_probability=0), prediction=ConfigNLPCausalLMPrediction(metric_class=<class 'llm_studio.src.metrics.text_causal_language_modeling_metrics.Metrics'>, metric='GPT', metric_gpt_model='gpt-3.5-turbo-0301', min_length_inference=2, max_length_inference=256, batch_size_inference=0, do_sample=False, num_beams=1, temperature=0.3, repetition_penalty=1.2, stop_tokens='', top_k=0, top_p=1.0, num_history=4), environment=ConfigNLPCausalLMEnvironment(gpus=('0',), mixed_precision=True, compile_model=False, use_fsdp=False, find_unused_parameters=False, trust_remote_code=True, huggingface_branch='main', number_of_workers=4, seed=-1, _seed=0, _distributed=False, _distributed_inference=True, _local_rank=0, _world_size=1, _curr_step=0, _curr_val_step=0, _rank=0, _device='cuda', _cpu_comm=None), logging=ConfigNLPCausalLMLogging(logger='None', neptune_project='', _neptune_debug=False, plots_class=<class 'llm_studio.src.plots.text_causal_language_modeling_plots.Plots'>, number_of_texts=10, _logger=None))
dataset/import/cfg/dataframe:                                             instruction  ...                             parent_id
0     Can you write a short introduction about the r...  ...                                   NaN
1     What can be done at a regulatory level to ensu...  ...  636dd191-50df-4894-ba9a-cd7f00767258
2     Can you explain contrastive learning in machin...  ...                                   NaN
3     I didn't understand how pulling and pushing wo...  ...  e8ca4e06-a584-4001-8594-5f633e06fa91
4     I want to start doing astrophotography as a ho...  ...                                   NaN
...                                                 ...  ...                                   ...
8269  I just wanted to see how you responded to uncl...  ...  cdbdb1a7-c09c-4c68-97ff-804f96c62e6e
8270  Are you saying a kilogram of feathers is unlik...  ...  0e5846d0-c978-417f-b15a-5d98b18d1dbf
8271  I've recently started playing the turn-based s...  ...                                   NaN
8272  Is into the breach a game with perfect informa...  ...  c7e3cdc6-62a1-4616-9e96-0304ca394ca4
8273  Does this mean that Into the Breach is a game ...  ...  da29e067-280f-4914-b837-d9685448d9e5

[8274 rows x 4 columns]
dataset/import/cfg/train_dataframe: data/user/tmp/train_full.csv
dataset/import/cfg/validation_dataframe: None
dataset/import/cfg/prompt_column: ['instruction']
dataset/import/cfg/answer_column: output
dataset/import/cfg/parent_id_column: None
dataset/import/4: True
dataset/merge: False
dataset/import/6: True
dataset/import/3/edit: False
dataset/newexperiment: False
dataset/edit: False
dataset/delete/dialog/single: False
experiment/start: True
experiment/start/cfg_category: text
experiment/start/cfg_file: text_causal_language_modeling_config
experiment/start/cfg_experiment_prev: None
experiment/start/cfg_file_prev: text_causal_language_modeling_config
experiment/start/prev_dataset: None
experiment/start/cfg_sub: 
experiment/start/dataset: 1
experiment/start/cfg_mode/mode: train
experiment/start/cfg_mode/from_dataset: True
experiment/start/cfg_mode/from_cfg: True
experiment/start/cfg_mode/from_default: True
experiment/start/cfg_mode/from_dataset_args: False
experiment/start/cfg: ConfigProblemBase(output_directory='output/text_causal_language_modeling_config', experiment_name='quiet-platypus', _parent_experiment='', llm_backbone='EleutherAI/pythia-2.8b-deduped', dataset=ConfigNLPCausalLMDataset(dataset_class=<class 'llm_studio.src.datasets.text_causal_language_modeling_ds.CustomDataset'>, personalize=False, chatbot_name='h2oGPT', chatbot_author='H2O.ai', train_dataframe='/path/to/train.csv', validation_strategy='automatic', validation_dataframe='', validation_size=0.01, data_sample=1.0, data_sample_choice=('Train', 'Validation'), prompt_column=('instruction', 'input'), answer_column='output', parent_id_column='None', text_prompt_start='<|prompt|>', text_answer_separator='<|answer|>', limit_chained_samples=False, add_eos_token_to_prompt=True, add_eos_token_to_answer=True, mask_prompt_labels=True, _allowed_file_extensions=('csv', 'pq')), tokenizer=ConfigNLPCausalLMTokenizer(max_length_prompt=256, max_length_answer=256, max_length=512, add_prompt_answer_tokens=False, padding_quantile=1.0, use_fast=True, add_prefix_space=False), architecture=ConfigNLPCausalLMArchitecture(model_class=<class 'llm_studio.src.models.text_causal_language_modeling_model.Model'>, reward_model_class=<class 'llm_studio.src.models.text_reward_model.RewardModel'>, pretrained=True, backbone_dtype='float16', gradient_checkpointing=True, force_embedding_gradients=False, intermediate_dropout=0, pretrained_weights=''), training=ConfigNLPCausalLMTraining(loss_class=<class 'llm_studio.src.losses.text_causal_language_modeling_losses.Losses'>, loss_function='TokenAveragedCrossEntropy', optimizer='AdamW', learning_rate=0.0001, differential_learning_rate_layers=(), differential_learning_rate=1e-05, batch_size=2, drop_last_batch=True, epochs=1, schedule='Cosine', warmup_epochs=0.0, weight_decay=0.0, gradient_clip=0.0, grad_accumulation=1, lora=True, lora_r=4, lora_alpha=16, lora_dropout=0.05, lora_target_modules='', save_best_checkpoint=False, 
evaluation_epochs=1.0, evaluate_before_training=False, train_validation_data=False, use_rlhf=False, reward_model='OpenAssistant/reward-model-deberta-v3-large-v2', adaptive_kl_control=True, initial_kl_coefficient=0.2, kl_target=6.0, kl_horizon=10000, advantages_gamma=0.99, advantages_lambda=0.95, ppo_clip_policy=0.2, ppo_clip_value=0.2, scaling_factor_value_loss=0.1, ppo_epochs=4, ppo_batch_size=1, ppo_generate_temperature=1.0, offload_reward_model=False), augmentation=ConfigNLPAugmentation(nlp_augmentations_class=<class 'llm_studio.src.augmentations.nlp_aug.BaseNLPAug'>, token_mask_probability=0, skip_parent_probability=0, random_parent_probability=0), prediction=ConfigNLPCausalLMPrediction(metric_class=<class 'llm_studio.src.metrics.text_causal_language_modeling_metrics.Metrics'>, metric='GPT', metric_gpt_model='gpt-3.5-turbo-0301', min_length_inference=2, max_length_inference=256, batch_size_inference=0, do_sample=False, num_beams=1, temperature=0.3, repetition_penalty=1.2, stop_tokens='', top_k=0, top_p=1.0, num_history=4), environment=ConfigNLPCausalLMEnvironment(gpus=('0',), mixed_precision=True, compile_model=False, use_fsdp=False, find_unused_parameters=False, trust_remote_code=True, huggingface_branch='main', number_of_workers=4, seed=-1, _seed=0, _distributed=False, _distributed_inference=True, _local_rank=0, _world_size=1, _curr_step=0, _curr_val_step=0, _rank=0, _device='cuda', _cpu_comm=None), logging=ConfigNLPCausalLMLogging(logger='None', neptune_project='', _neptune_debug=False, plots_class=<class 'llm_studio.src.plots.text_causal_language_modeling_plots.Plots'>, number_of_texts=10, _logger=None))
experiment/start/dataset_prev: 1
experiment/start/cfg/output_directory: output/text_causal_language_modeling_config
experiment/start/cfg/experiment_name: quiet-platypus
experiment/start/trigger_ks: ['train_dataframe', 'validation_strategy', 'data_sample', 'parent_id_column', 'personalize']
experiment/start/cfg/_parent_experiment: 
experiment/start/cfg/llm_backbone: EleutherAI/pythia-2.8b-deduped
experiment/start/cfg/dataset: ConfigNLPCausalLMDataset(dataset_class=<class 'llm_studio.src.datasets.text_causal_language_modeling_ds.CustomDataset'>, personalize=False, chatbot_name='h2oGPT', chatbot_author='H2O.ai', train_dataframe='/path/to/train.csv', validation_strategy='automatic', validation_dataframe='', validation_size=0.01, data_sample=1.0, data_sample_choice=('Train', 'Validation'), prompt_column=('instruction', 'input'), answer_column='output', parent_id_column='None', text_prompt_start='<|prompt|>', text_answer_separator='<|answer|>', limit_chained_samples=False, add_eos_token_to_prompt=True, add_eos_token_to_answer=True, mask_prompt_labels=True, _allowed_file_extensions=('csv', 'pq'))
experiment/start/cfg/dataset_class: <class 'llm_studio.src.datasets.text_causal_language_modeling_ds.CustomDataset'>
experiment/start/cfg/personalize: False
experiment/start/cfg/chatbot_name: h2oGPT
experiment/start/cfg/chatbot_author: H2O.ai
home: False
report_error: True
q.events
q.args
report_error: True
pascal-pfeiffer commented 1 year ago

Hi @pligor, thank you for reporting this issue. Could you explain in detail the steps you took and which machine you are using (hardware and OS)?

I just tried the current main branch with a fresh installation:

make setup
make wave

and successfully ran an experiment with default parameters on a 3090 and Ubuntu 20.04.

This file path looks odd: 'data/user/oasst/text_causal_language_modeling_config.yaml'. It should be something like output/user/EXPERIMENT-NAME/cfg.yaml.

pligor commented 1 year ago

Hello @pascal-pfeiffer. Many thanks for the swift reply.

The machine is a desktop PC running Ubuntu 22.04.

I initially avoided make setup because I was not sure whether it would create an environment with CUDA enabled, and using the GPU is essential for faster training. However, it turned out to be fine: I repeated make setup on version v0.0.4 of H2O LLM Studio, and torch.cuda.is_available() returned True. I then ran make wave, but it failed with the same error.
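For reference, the check I ran boils down to this (wrapped here so it degrades gracefully when torch is not installed in the environment):

```python
import importlib.util


def cuda_available() -> bool:
    """True if PyTorch is installed and can see at least one CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


print("CUDA available:", cuda_available())
```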

I see that @maxjeblick has a fix for this. Which version should I use to get that fix? Has it been merged into main by now?

maxjeblick commented 1 year ago

The issue should be fixed in main now. Please come back if you encounter any subsequent issues.

pligor commented 1 year ago

It's ok @maxjeblick; as a QA engineer I run into these things more often than most :) This fixed it. Many thanks.