salesforce / jaxformer

Minimal library to train LLMs on TPU in JAX with pjit().
BSD 3-Clause "New" or "Revised" License

[Suggestion]: Code Notes #14

Open · Librechain opened this issue 1 year ago

Librechain commented 1 year ago

Hello! I want to preface this 'issue' by sincerely thanking the owners of this repo, and everyone responsible for creating the CodeGen model, for taking the time to document your process, open-source the model, create the accompanying repos, and publish the models on HuggingFace. Your contributions to the community are invaluable.

I wanted to ask about adding code notes to some (or all) of the .py files included in this repo. I know this is a big ask, but I was curious whether this might be a good place to put your fine-tuned codegen_6b_mono model to work, since it was trained specifically on a large corpus of Python code. While program synthesis via NLP is a remarkable breakthrough, I believe making published code legible to general observers is equally important. A sketch of what this could look like follows below.
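For concreteness, here is a minimal sketch of what such an annotation pass could look like. It assumes the Salesforce/codegen-6B-mono checkpoint on HuggingFace and the transformers library; since CodeGen is a left-to-right causal LM, one simple trick (my assumption, not a documented workflow) is to feed it a function with an opened docstring and let the model complete the comment:

```python
# A minimal sketch of the idea, NOT an official workflow: prompt the
# Salesforce/codegen-6B-mono checkpoint (HuggingFace) to draft a docstring.
# The prompt format, example function, and decoding settings here are all
# illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Salesforce/codegen-6B-mono"  # smaller -mono variants exist too
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# The function we want annotated, with an opened docstring as the cue;
# a left-to-right causal LM will try to complete it.
prompt = '''def top_k_logits(logits, k):
    """'''

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,  # greedy decoding keeps the draft deterministic
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Anything the model generates would of course need human review before being merged back into the repo's .py files, but it could give the maintainers a cheap first draft to work from.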

Given everything you have done in creating the relevant models, fine-tuning and benchmarking them, documenting the process, and publishing your findings, results, and code publicly, it would be borderline rapacious to also demand that the published code be annotated. That's why I'm proposing it instead as an additional "real-world" use case for the fine-tuned model you created.

Please let me know what you think. I believe such a task is in line with this project's core ethos, which seems to be lowering the barriers to entry for programming and development, whether through an expedited workflow for experienced programmers or a 'bridge' for those who have a strong semantic understanding of a programming task but lack the technical knowledge to write the necessary code from scratch. The latter scenario, in particular, represents a democratization of the coding and programming process.

In that same vein, I believe leveraging this model to annotate published code would promote a more comprehensive understanding among everyone who interacts with it, which can ultimately lead to fewer mistakes, errors, and misconfigurations, while also saving developers the time otherwise spent answering questions or clearing up misunderstandings.

22Mukesh22 commented 1 year ago

Hey @Librechain, it's kind of what ChatGPT is able to do. It's really a good suggestion to have some briefing of the code. If this can be achieved, that would be great.

glicerico commented 1 year ago

Great idea. Despite all the effort the team has put in, it's still hard to fine-tune one of their models, mostly because the code is not annotated and the README instructions are cryptic.