the-crypt-keeper / can-ai-code

Self-evaluating interview for AI coders
https://huggingface.co/spaces/mike-ravkine/can-ai-code-results
MIT License

Eval for ajibawa-2023/OpenHermes-2.5-Code-290k-13B #169

Closed ajinkya123-robo closed 4 months ago

ajinkya123-robo commented 4 months ago

Hello @the-crypt-keeper, kindly evaluate my model as time permits. OpenHermes-2.5-Code-290k-13B is a state-of-the-art Llama-2 fine-tune, trained on my dataset OpenHermes-2.5-Code-290k, which includes additional code data. That dataset is an amalgamation of two datasets: OpenHermes-2.5, a high-quality dataset made available by teknium, and my own Code-290k-ShareGPT. The dataset is in Vicuna/ShareGPT format and contains around 1.29 million conversations. I cleaned the dataset provided by teknium, removing metadata such as "source" and "category". It consists primarily of synthetically generated instruction and chat samples.

Besides enhanced coding capabilities, the model handles other tasks such as blogging, story generation, Q&A, and more.

https://huggingface.co/ajibawa-2023/OpenHermes-2.5-Code-290k-13B

I have used ShareGPT/Vicuna format.
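For context, a ShareGPT-style record is a list of alternating human/assistant turns. This is a minimal sketch of the conventional field layout; the exact keys in Code-290k-ShareGPT may differ in detail:

```python
import json

# Sketch of a ShareGPT-style conversation record (conventional field
# names "conversations", "from", "value"; an assumption, not verified
# against the actual dataset files).
record = {
    "conversations": [
        {"from": "human", "value": "Write a Python function that reverses a string."},
        {"from": "gpt", "value": "def reverse(s):\n    return s[::-1]"},
    ]
}

print(json.dumps(record, indent=2))
```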

Thanks

the-crypt-keeper commented 4 months ago

@ajinkya123-robo Nearly perfect score on junior-v2 as usual, but again not much luck on senior. Vicuna-1p3-v2a was the best-performing prompt format at ~25%.
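For readers unfamiliar with the prompt-format names: Vicuna-1.3-style templates wrap the question in a fixed system line with `USER:`/`ASSISTANT:` separators. A rough sketch of that shape (the exact system text and separators used by the can-ai-code harness live in its template files, so this is an approximation):

```python
# Approximate Vicuna-1.3-style prompt construction; the harness's actual
# Vicuna-1p3-v2a template may differ in wording and separators.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_prompt(user_message: str) -> str:
    # Single-turn prompt: system line, user turn, then an open assistant turn
    # for the model to complete.
    return f"{SYSTEM} USER: {user_message} ASSISTANT:"

print(build_prompt("Write a function to add two numbers."))
```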

ajinkya123-robo commented 4 months ago

Thank you very much @the-crypt-keeper. I suspect the Llama-2 base is the limiting factor on the senior results, since this model performs very well in other areas.