openai / gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"
https://openai.com/blog/better-language-models/

"Enhancing GPT-2: AI-Driven Visualizations, Code Optimization, and Parameter Refinement": we seek to build on the foundation of GPT-2. #350

Open RahulVadisetty91 opened 2 months ago

RahulVadisetty91 commented 2 months ago

1. Summary:

This pull request makes three kinds of changes to the GPT-2 script: it adds AI-driven visualizations, optimizes the code, and removes unused parameters. The additions include model architecture visualizations, performance metrics, a numerically stabilized softmax, attention-mechanism visualizations, and an adaptive learning-rate schedule. The cleanup removes unused function parameters such as `hparams` and other unneeded operations, which improves both readability and performance.
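The PR text does not show the softmax change itself; the standard stabilization it describes is to subtract the row-wise maximum from the logits before exponentiating, which leaves the result unchanged mathematically while preventing overflow. A minimal NumPy sketch of that trick (the function name and example values are illustrative, not code from this PR):

```python
import numpy as np

def stable_softmax(logits, axis=-1):
    # Subtracting the per-row max before exponentiating keeps exp() from
    # overflowing; the output is identical to the naive softmax.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

# Logits this large would overflow float64 in the naive formulation.
print(stable_softmax(np.array([1000.0, 1001.0, 1002.0])))
```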

2. Related Issues:

These changes address problems with unused parameters that cluttered the code, suboptimal handling of long input sequences, and numerically unstable softmax computations. The code was analyzed with SonarLint, which flagged several unused function parameters; those have been removed to make the code cleaner and more efficient. In addition, attention-head visualization and layer-wise analysis were proposed to improve the model's interpretability.
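The diff for the visualization additions is not included here; as a rough illustration of what attention-head visualization typically looks like, the following matplotlib sketch renders one head's post-softmax attention matrix as a heatmap (the `plot_attention_head` helper and the toy data are hypothetical, not taken from this PR):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_attention_head(weights, tokens, title="Attention head"):
    """Render one head's (seq_len, seq_len) attention matrix as a heatmap.

    `weights` holds post-softmax attention probabilities; `tokens` are the
    decoded input tokens used as axis labels.
    """
    fig, ax = plt.subplots(figsize=(6, 6))
    im = ax.imshow(weights, cmap="viridis")
    ax.set_xticks(range(len(tokens)))
    ax.set_yticks(range(len(tokens)))
    ax.set_xticklabels(tokens, rotation=90)
    ax.set_yticklabels(tokens)
    ax.set_xlabel("Key position")
    ax.set_ylabel("Query position")
    ax.set_title(title)
    fig.colorbar(im, ax=ax)
    plt.tight_layout()
    plt.show()

# Toy example: random weights normalized so each row sums to 1.
tokens = ["The", "cat", "sat", "down"]
w = np.random.rand(4, 4)
w /= w.sum(axis=-1, keepdims=True)
plot_attention_head(w, tokens, title="Layer 0, head 0 (toy example)")
```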

3. Discussions:

The discussions focused on two themes: improving GPT-2's explainability through AI visualization techniques, and improving the code that underpins the model. Topics included the value of visualizing model layers and attention, stabilizing the softmax to avoid overflow, and refining the adaptive learning rate to improve training.
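The PR does not spell out the adaptive learning-rate methodology; one common scheme that matches the description is linear warmup followed by cosine decay. A minimal sketch, with all constants (base rate, warmup length, total steps) chosen purely for illustration:

```python
import math

def adaptive_lr(step, base_lr=2.5e-4, warmup_steps=2000, total_steps=100_000):
    """Linear warmup to base_lr, then cosine decay to zero.

    All hyperparameter defaults here are illustrative assumptions,
    not values from this PR.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Learning rate at a few points along the schedule.
for s in (0, 1000, 2000, 50_000, 100_000):
    print(s, f"{adaptive_lr(s):.6e}")
```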

4. QA Instructions: