rmihaylov / mpttune

Tune MPTs
Apache License 2.0

Few questions on fine tuning #4

Open sidharthiimc opened 1 year ago

sidharthiimc commented 1 year ago
  1. What is the maximum context window a 7B model can handle? I am working on a business problem whose inputs range from a minimum of 4K to a maximum of 32K tokens.
  2. For the above task, would it be better to fine-tune your 4-bit GPTQ-quantized model or the base model from scratch?
  3. Will a single A100 GPU be enough for this task?
  4. Roughly how long would training take for, say, 10K samples and 10 epochs?
  5. I want to run prediction in batches. I can see that evaluation already happens in batches. Could you add, or point me to, code that would let me use the fine-tuned model to predict on a batch of, say, size 8 or 10?
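
For question 5, something like the following is what I have in mind: a minimal batched-generation sketch assuming the standard Hugging Face `generate()` API (the model/tokenizer objects and any paths are placeholders, not mpttune's actual interface):

```python
def batch_generate(model, tokenizer, prompts, batch_size=8, max_new_tokens=128):
    """Generate completions for `prompts` in chunks of `batch_size`.

    Assumes a Hugging Face-style tokenizer and `model.generate()`; in real
    use, wrap the generate call in `torch.inference_mode()` to save memory.
    """
    results = []
    for start in range(0, len(prompts), batch_size):
        chunk = prompts[start:start + batch_size]
        # Left-padding keeps each prompt adjacent to its generated tokens.
        tokenizer.padding_side = "left"
        enc = tokenizer(chunk, return_tensors="pt", padding=True)
        out = model.generate(
            input_ids=enc["input_ids"],
            attention_mask=enc["attention_mask"],
            max_new_tokens=max_new_tokens,
        )
        results.extend(tokenizer.batch_decode(out, skip_special_tokens=True))
    return results


# Hypothetical usage with a fine-tuned checkpoint (paths are placeholders):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")
# model = AutoModelForCausalLM.from_pretrained("path/to/finetuned-mpt")
# answers = batch_generate(model, tokenizer, list_of_prompts, batch_size=8)
```

Is this roughly how the existing evaluation loop batches inputs, or does the quantized model need different handling?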