rmihaylov / mpttune

Tune MPTs
Apache License 2.0

Few questions on fine tuning #4

Open sidharthiimc opened 1 year ago

sidharthiimc commented 1 year ago
  1. What is the maximum context window a 7B model can handle? I am working on a business problem whose inputs range from a minimum of 4K to a maximum of 32K tokens.
  2. For the above task, would it be better to fine-tune your 4-bit GPTQ-quantized model or the base model from scratch?
  3. Will a single A100 GPU be enough for this task?
  4. Roughly how long would training take for, say, 10K samples and 10 epochs?
  5. I want to run prediction in batches. I can see that evaluation already happens in batches. Could you add, or point me to, code that would let me use the fine-tuned model to predict on a batch of, say, size 8 or 10?
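
For question 5, something like the following is what I have in mind: a minimal batched-generation sketch assuming the standard Hugging Face `generate()` API (the model/tokenizer objects and any paths are placeholders, not mpttune's actual interface):

```python
def batch_generate(model, tokenizer, prompts, batch_size=8, max_new_tokens=128):
    """Generate completions for `prompts` in chunks of `batch_size`.

    Assumes a Hugging Face-style tokenizer and `model.generate()`; in real
    use, wrap the generate call in `torch.inference_mode()` to save memory.
    """
    results = []
    for start in range(0, len(prompts), batch_size):
        chunk = prompts[start:start + batch_size]
        # Left-padding keeps each prompt adjacent to its generated tokens.
        tokenizer.padding_side = "left"
        enc = tokenizer(chunk, return_tensors="pt", padding=True)
        out = model.generate(
            input_ids=enc["input_ids"],
            attention_mask=enc["attention_mask"],
            max_new_tokens=max_new_tokens,
        )
        results.extend(tokenizer.batch_decode(out, skip_special_tokens=True))
    return results


# Hypothetical usage with a fine-tuned checkpoint (paths are placeholders):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")
# model = AutoModelForCausalLM.from_pretrained("path/to/finetuned-mpt")
# answers = batch_generate(model, tokenizer, list_of_prompts, batch_size=8)
```

Is this roughly how the existing evaluation loop batches inputs, or does the quantized model need different handling?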