mistralai / mistral-finetune

Apache License 2.0

data validation issue #66

Closed noviljohnson closed 3 weeks ago

noviljohnson commented 4 weeks ago

Hi, I updated the YAML file as shown below.

# data
data:
  instruct_data: "E:/mistral-finetune/dataset/electrical_eng_train.jsonl"  # Fill
  eval_instruct_data: "E:/mistral-finetune/dataset/electrical_eng_test.jsonl"  # Optionally fill

# model
model_id_or_path: "E:/mistral-finetune/mistral_models/7B-v0.3"  # Change to downloaded path

run_dir: "E:/mistral-finetune"  # Fill

wandb:
  project: "mistral_finetune" # your wandb project name
  run_name: "finetune1" # your wandb run name
  key: "******************************************" # your wandb api key
  offline: False

Then I ran the following command:

python -m utils.validate_data --train_yaml example/7B.yaml

and got the following error:

Traceback (most recent call last):
  File "C:\Users\Novilsaikumar.A\.conda\envs\mstrlFT\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Novilsaikumar.A\.conda\envs\mstrlFT\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\mistral-finetune\utils\validate_data.py", line 366, in <module>
    main(args)
  File "E:\mistral-finetune\utils\validate_data.py", line 173, in main
    datasets, weights = parse_data_sources(pretrain_file, instruct_file)
  File "E:\mistral-finetune\finetune\data\dataset.py", line 128, in parse_data_sources
    weight = float(weight_)
ValueError: could not convert string to float: '/mistral-finetune/dataset/electrical_eng_train.jsonl'

Any idea how to solve this?

Thanks.
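
The string in the ValueError above is the configured path with its leading "E:" removed, which suggests the path was split on ":" and the remainder was read as a sampling weight. Below is a minimal sketch of that failure mode, assuming a "path:weight" convention. `split_source` is a hypothetical stand-in written for illustration; it is not the code in finetune/data/dataset.py.

```python
from typing import Tuple


def split_source(source: str) -> Tuple[str, float]:
    """Hypothetical "path:weight" parser (an assumption, not the repository's code)."""
    items = source.strip().split(":")
    if len(items) == 1:
        # No colon: the whole string is the path, with a default weight.
        return items[0], 1.0
    if len(items) == 2:
        # One colon: the text after it is interpreted as the weight.
        path, weight = items
        return path, float(weight)
    raise ValueError(f"expected 'path' or 'path:weight', got {source!r}")


# A Windows drive letter contributes a colon, so "E" becomes the path
# and the rest of the string is passed to float(), which raises.
try:
    split_source("E:/mistral-finetune/dataset/electrical_eng_train.jsonl")
except ValueError as err:
    print(err)

# A path without a drive-letter colon parses cleanly under the same rule.
print(split_source("dataset/electrical_eng_train.jsonl"))
```

If that assumption holds, a path that contains no colon (for example, a relative path used from the E: drive) would not trigger the split.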

noviljohnson commented 3 weeks ago

"