Audience For This Repo

hamelsmu commented 7 months ago

Carrying over discussion with @mwaskom from this thread

I think this repo is pretty difficult to reason about if you aren't familiar with axolotl, IMO. What are these configs? How does it work? How exactly are my prompts assembled? What does the dataset format need to be? Are there other dataset formats? How do I check the prompt construction? I was actually assuming that the user is indeed familiar with axolotl.
Even if you are very familiar with axolotl, this `--data` flag is really confusing. It was to me, because a key parameter in my config that I am used to using is being completely ignored with an extra layer of indirection. I actually got stuck on this personally as an experienced axolotl user, so I found the need to provide these two caveats.

cc: @charlesfrye @winglian curious what you think

_Originally posted by @hamelsmu in https://github.com/modal-labs/llm-finetuning/pull/48#discussion_r1575225245_

JUNIORCO commented 7 months ago

To add to this:

Would be great to add a Llama 3 example config. Here's mine


```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

# note I have my own dataset here that isn't part of the examples
datasets:
  - path: train.jsonl
    type: sharegpt
dataset_prepared_path:
val_set_size: 0
output_dir: ./out/qlora-llama3-70b

adapter: qlora
lora_model_dir:

sequence_len: 512
sample_packing: false
pad_to_sequence_len: true

lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.00001

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|end_of_text|>"
tokens:
  - "<|im_start|>"
```

A conversational dataset example would be nice for folks coming from the OpenAI fine-tuning world, which forces a dataset of this format. What was a bit confusing is that there's no Axolotl dataset format that matches the OpenAI format, so I had to modify my dataset slightly to fit the sharegpt type.
The inference side could use a bit more effort: make it a POST request that exposes an OpenAI-compatible endpoint like this. This is what a lot of folks are interested in doing, imo.
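
For anyone hitting the same dataset mismatch, here is a minimal sketch of the conversion described above. It assumes the OpenAI-style records look like `{"messages": [{"role": ..., "content": ...}]}` and that the sharegpt type expects `{"conversations": [{"from": ..., "value": ...}]}` with `human`/`gpt` speaker labels. The function names and file paths are hypothetical, not part of this repo or of axolotl.

```python
import json

# Assumed mapping from OpenAI chat roles to ShareGPT-style speaker labels.
ROLE_TO_FROM = {"system": "system", "user": "human", "assistant": "gpt"}


def openai_to_sharegpt(record):
    """Convert one {"messages": [...]} record to {"conversations": [...]}."""
    conversations = []
    for message in record["messages"]:
        role = message["role"]
        if role not in ROLE_TO_FROM:
            # Fail loudly rather than silently dropping turns.
            raise ValueError(f"unsupported role: {role!r}")
        conversations.append(
            {"from": ROLE_TO_FROM[role], "value": message["content"]}
        )
    return {"conversations": conversations}


def convert_file(src_path, dst_path):
    """Rewrite a JSONL file of OpenAI-style records as ShareGPT-style JSONL."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                converted = openai_to_sharegpt(json.loads(line))
                dst.write(json.dumps(converted) + "\n")
```

Something like `convert_file("openai_train.jsonl", "train.jsonl")` would then produce a file that the `datasets` entry in the config above could point at, assuming the labels match what your axolotl version expects.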

Happy to make a PR
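
To make the endpoint suggestion above concrete, here is a rough sketch of what an OpenAI-compatible POST route could return. It uses only the Python standard library as a stand-in and is not how this repo or Modal serves models. `generate` is a hypothetical placeholder for the fine-tuned model call, and `my-finetune` is a made-up model name.

```python
import json
import time
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer


def generate(messages):
    """Hypothetical placeholder for the fine-tuned model call."""
    return "echo: " + messages[-1]["content"]


def chat_completion(request_body, model_name="my-finetune"):
    """Build a response shaped like an OpenAI chat completion object."""
    text = generate(request_body["messages"])
    return {
        "id": "chatcmpl-" + uuid.uuid4().hex,
        "object": "chat.completion",
        "created": int(time.time()),
        "model": request_body.get("model", model_name),
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": "stop",
            }
        ],
    }


class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the chat completions route is sketched here.
        if self.path != "/v1/chat/completions":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        payload = json.dumps(chat_completion(body)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Blocking local server; call this manually to try the endpoint."""
    HTTPServer(("127.0.0.1", port), ChatHandler).serve_forever()
```

Calling `serve()` and POSTing `{"messages": [{"role": "user", "content": "hi"}]}` to `/v1/chat/completions` should return a payload that OpenAI-style clients can parse. Streaming and token usage are left out.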

shamikbose commented 5 months ago

> Even if you are very familiar with axolotl, this `--data` flag is really confusing. It was to me, because a key parameter in my config that I am used to using is being completely ignored with an extra layer of indirection. I actually got stuck on this personally as an experienced axolotl user, so I found the need to provide these two caveats.

@hamelsmu Even as a newcomer to axolotl, I find the discrepancy between the data flags in the two frameworks really confusing. As a start, it would be helpful to have a guide describing how each framework uses these flags.

devanshrj commented 5 months ago

Agree with @JUNIORCO. It would be great to have a conversational dataset example that works with a model like Llama3-8B-Instruct. I made a few attempts based on axolotl's example config and the example configs provided in this repo, but none seem to work with Llama3-8B-Instruct's format.
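
For reference when checking prompt construction, here is a small sketch of how a conversation is laid out in the Llama 3 Instruct chat format, as I understand it from the published template. The helper name is hypothetical, and the result should be compared against your tokenizer's own chat template (for example via `tokenizer.apply_chat_template` in transformers) before relying on it. Note that the config earlier in this thread adds ChatML-style tokens (`<|im_start|>`, `<|im_end|>`), which differ from the header tokens below.

```python
def llama3_instruct_prompt(messages, add_generation_prompt=True):
    """Assemble a prompt string in the Llama 3 Instruct layout.

    `messages` is a list of {"role": ..., "content": ...} dicts with
    roles such as "system", "user", and "assistant".
    """
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    print(llama3_instruct_prompt([{"role": "user", "content": "Hello"}]))
```

Printing the assembled string next to what the training pipeline actually feeds the model is a quick way to spot a format mismatch.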

It would also be great to have more details about the Docker container and axolotl version used by Modal.

modal-labs / llm-finetuning

Audience For This Repo #51