Closed by pmhalvor 2 years ago
Job 45447 is a study on weights that tests the new attention-optimizer implementation. Once this study finishes, new attentions for every stack layer can be implemented and studied, which will conclude RACL development.