microsoft / vscode-ai-toolkit


Generated JSON should use InvariantCulture #40

Open Serg2DFX opened 5 months ago

Serg2DFX commented 5 months ago

I created a project with microsoft/phi-2, and the generated content is not valid (broken JSON files). See screenshot: image

Please use InvariantCulture in the code generation.
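A minimal sketch of the failure mode and the requested fix, assuming the generator formats the numeric values with a plain ToString() before substituting them into the JSON template (the variable name is illustrative, taken from the <lora_dropout> token in the logs later in this thread):

```csharp
using System;
using System.Globalization;

// Hypothetical numeric value entered in the wizard.
double loraDropout = 0.1;

// Culture-sensitive formatting: under ru-RU or fr-FR this prints "0,1",
// which is not a valid JSON number.
Console.WriteLine(loraDropout.ToString());

// Culture-invariant formatting: always "0.1", regardless of the OS locale.
Console.WriteLine(loraDropout.ToString(CultureInfo.InvariantCulture));
```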

ningx-ms commented 5 months ago

@Serg2DFX thanks for reporting this bug!

For us to repro the issue and test the fix, could you provide more information on the repro steps?

  1. What is your Windows system's locale and region?
  2. What value did you put in the wizard window for those variables?
Serg2DFX commented 5 months ago

To reproduce, you can use the ru-RU or fr-FR locale, or any other locale that uses a comma as the decimal separator.
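For example (a small sketch, not the extension's code), you can see the separator change without touching the OS settings by forcing the thread culture:

```csharp
using System;
using System.Globalization;

// Simulate a comma-separator locale in code.
CultureInfo.CurrentCulture = new CultureInfo("ru-RU");

Console.WriteLine((0.1).ToString());                             // "0,1"  -> breaks the generated JSON
Console.WriteLine((0.1).ToString(CultureInfo.InvariantCulture)); // "0.1"  -> valid JSON number
```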

Sample repro steps: image

Screens in the wizard: image. The values in the UI use the dot format: image image

And the logs show the comma format: image

Debug: generate-project[0]
   03:04:44.23 0 ExecuteAsync Started
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<conda_env_name> jsonTokenValue:phi-2-env
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<prompt_template> jsonTokenValue:### Text: {}\n### The tone is:\n
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<data_configs_data_files> jsonTokenValue:dataset/dataset-classification.json
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<data_configs_split> jsonTokenValue:train
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<dataset_type> jsonTokenValue:corpus
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<text_cols> jsonTokenValue:[
  "phrase",
  "tone"
]
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<text_template> jsonTokenValue:### Text: {phrase}\n### The tone is:\n{tone}
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<line_by_line> jsonTokenValue:join
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<source_max_len> jsonTokenValue:1024
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<pad_to_max_len> jsonTokenValue:false
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<compute_dtype> jsonTokenValue:bfloat16
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<quant_type> jsonTokenValue:nf4
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<double_quant> jsonTokenValue:true
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<lora_r> jsonTokenValue:64
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<lora_alpha> jsonTokenValue:64
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<lora_dropout> jsonTokenValue:0,1
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<eval_dataset_size> jsonTokenValue:0,3
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<training_args_seed> jsonTokenValue:0
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<training_args_data_seed> jsonTokenValue:42
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<per_device_train_batch_size> jsonTokenValue:1
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<per_device_eval_batch_size> jsonTokenValue:1
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<gradient_accumulation_steps> jsonTokenValue:4
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<gradient_checkpointing> jsonTokenValue:false
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<learning_rate> jsonTokenValue:0,0001
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<num_train_epochs> jsonTokenValue:3
Trace: generate-project[0]
   03:04:44.24 0 ReplaceToken:<max_steps> jsonTokenValue:1200
Trace: generate-project[0]

You can check the formatting behavior by analogy with: https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_tolocalestring_num_all
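As an aside, just a sketch of an alternative rather than the extension's actual implementation: serializing the values through System.Text.Json instead of raw token substitution would also produce culture-invariant numbers, since the serializer always writes them in the invariant format.

```csharp
using System;
using System.Globalization;
using System.Text.Json;

// Even under a comma-separator locale...
CultureInfo.CurrentCulture = new CultureInfo("ru-RU");

// ...a hypothetical subset of the generated config, using values from the logs above...
var config = new { lora_dropout = 0.1, eval_dataset_size = 0.3 };

// ...serializes with dot-separated numbers, i.e. valid JSON:
Console.WriteLine(JsonSerializer.Serialize(config)); // {"lora_dropout":0.1,"eval_dataset_size":0.3}
```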