lucidrains / toolformer-pytorch

Implementation of Toolformer, Language Models That Can Use Tools, by MetaAI
MIT License

Add GPTJ, Calendar, Wikipedia Search, Machine Translation, Calculator #2

Closed: conceptofmind closed this 1 year ago

conceptofmind commented 1 year ago

Hi Phil,

In this PR I have added a GPTJ architecture, prompts, and tooling for the Calculator, Wikipedia Search, Machine Translation System, Calendar, and other optional tools such as Wolfram Alpha, Google, and Bing Search.

The Calculator can perform simple numeric calculations supporting the four basic arithmetic operations.
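
As a rough illustration of what such a four-operation calculator could look like, here is a minimal sketch. It is not the code from this PR: the function name `calculator`, the use of Python's `ast` module, and the rounding to two decimal places are all assumptions made for the example.

```python
import ast
import operator

# Map AST operator nodes to the four basic arithmetic operations.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculator(expression: str):
    """Evaluate an expression made of numbers, + - * /, and parentheses.

    Hypothetical helper: anything else (names, calls, **, etc.) is rejected.
    """
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    # Rounding to two decimals is an assumption of this sketch.
    return round(_eval(ast.parse(expression, mode="eval")), 2)
```

Walking the syntax tree instead of calling `eval` keeps the tool restricted to arithmetic, which matters when the input string is generated by a language model.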

The Wikipedia Search tool uses the ColBERTv2 retrieval model and a Wikipedia database hosted by Stanford. Given a search term, it returns short text snippets from Wikipedia.

The Machine Translation System uses NLLB-600M to translate a phrase from any language into English.

The Calendar returns the current date without taking any input.

Optional:

The Wolfram Alpha Calculator utilizes the Wolfram Alpha API to solve math-related questions. This would allow for more complex numeric calculations.

Google and Bing Search, when given a query, return relevant titles, snippets, and links from the search results.

I am working on adding Atlas and possibly Google Calendar.

I hope some of this is helpful. Let me know if you want anything changed or removed.

Thank you,

Enrico

conceptofmind commented 1 year ago

I added a GPTJ architecture following: https://github.com/kingoflolz/mesh-transformer-jax/blob/master/mesh_transformer/layers.py

I was unsure whether you would prefer that I use the ParallelTransformerBlock from the PaLM repository instead.

conceptofmind commented 1 year ago

I had initially put together a version of GPT-J similar to the one on Hugging Face, but decided to change it to use ParallelTransformerBlock so that it is closer to the original JAX version. I tried to keep it similar to the original, including RMSNorm, RoPE, and parallel attention and feedforward with a residual connection.
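
For readers unfamiliar with the parallel formulation, the difference from a standard sequential pre-norm block can be written out as follows. This is a generic sketch of the technique, not a transcription of the code in this PR; the symbols $g$ (learned gain), $d$ (model dimension), and $\epsilon$ are notation chosen here.

```latex
% Sequential pre-norm block: feedforward sees the attention output.
h = x + \mathrm{Attn}(\mathrm{Norm}(x)), \qquad
y = h + \mathrm{FF}(\mathrm{Norm}(h))

% Parallel block: attention and feedforward share one normalized input
% and are summed together with the residual.
y = x + \mathrm{Attn}(\mathrm{Norm}(x)) + \mathrm{FF}(\mathrm{Norm}(x))

% RMSNorm, one common choice for Norm:
\mathrm{RMSNorm}(x) = \frac{x}{\sqrt{\tfrac{1}{d}\sum_{i=1}^{d} x_i^2 + \epsilon}} \odot g
```

In the parallel form the normalization is computed once per block and the two branches do not depend on each other.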

conceptofmind commented 1 year ago

Added a Calendar tool that returns the current date without taking any input.

Output:

Today is Thursday, February 16, 2023.
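
A minimal sketch of a tool that produces this output format, assuming the default English locale. The function name `calendar` and the optional `today` argument (added only so the example can be checked against a fixed date) are hypothetical and not taken from the PR.

```python
from datetime import date
from typing import Optional

def calendar(today: Optional[date] = None) -> str:
    """Return the current date as a sentence; takes no input in normal use."""
    # `today` is an assumption of this sketch, used only for testing.
    today = today or date.today()
    # %A and %B give the full weekday and month names for the active locale.
    return f"Today is {today:%A, %B} {today.day}, {today.year}."
```

Calling `calendar(date(2023, 2, 16))` reproduces the sentence shown above.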

conceptofmind commented 1 year ago

Separated Calculator and Wolfram Alpha Calculator. Cleaned up documentation. Added prompts from the appendix.

lucidrains commented 1 year ago

@conceptofmind hey Enrico! thanks for getting the ball rolling! i'll be working on this this week 😄

lucidrains commented 1 year ago

@conceptofmind i'm a bit out of the loop, but do you know if the Meta folks are planning on open sourcing this? has anyone sent the authors an email asking? they seem to be all about that recently, with Llama and all

conceptofmind commented 1 year ago

> @conceptofmind i'm a bit out of the loop, but do you know if the Meta folks are planning on open sourcing this?

From my understanding there were no plans to release Toolformer. They had gated the recent Llama release to a few academic researchers (non-commercial license on weights). I can ask Louis in our meeting on Thursday as he is in contact with the FAIR team and see if I can find out more information. I will get back to you on that soon.

Side note: Dakota from EleutherAI has also been working with me to set up sampling and filtering for the data generation and API calls. The initial proof of concept with GPT-J has worked well so far, and we should have a labeled API dataset up on Hugging Face with some of the tools soon. I can open a PR or issue with more information on those scripts.
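
For context, the filtering step in the Toolformer paper keeps a sampled API call only if including the call and its result lowers the language-modeling loss on the following tokens by at least a threshold, compared with the better of "no call" and "call without result". The sketch below shows that decision rule only. It is not the script mentioned above; the function name, argument names, and the default threshold of 1.0 are assumptions, and the three losses are assumed to be computed elsewhere.

```python
def keep_api_call(
    loss_no_call: float,
    loss_call_without_result: float,
    loss_call_with_result: float,
    tau: float = 1.0,  # filtering threshold; placeholder value for this sketch
) -> bool:
    """Return True if the API call with its result helps by at least `tau`."""
    # Baseline: the lower loss of doing nothing vs. inserting the call alone.
    baseline = min(loss_no_call, loss_call_without_result)
    # Keep the call only if the result reduces the loss by at least tau.
    return baseline - loss_call_with_result >= tau
```

For example, with losses 3.0 (no call), 2.8 (call only), and 1.5 (call plus result), the reduction is 1.3 and the call is kept; with 2.5 in place of 1.5, the reduction is only 0.3 and the call is dropped.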

We will work around whatever you want to do.

Thank you again,

Enrico

lucidrains commented 1 year ago

@conceptofmind ah got it, thanks Enrico, for the catch up as well as for the pull request! 🙏

conceptofmind commented 1 year ago

> @conceptofmind ah got it, thanks Enrico, for the catch up as well as for the pull request! 🙏

I hope to be able to help further soon. Will open an issue tomorrow with the other details.

You probably would have a much better shot at getting access to Llama than I ever would 😂😂😂

conceptofmind commented 1 year ago

@lucidrains Meta had a change of heart and decided to release Llama to a bunch of academic researchers. Maybe it has something to do with the ChatGPT API going out today? Access is no longer gated to a few. It is still only for research purposes, but definitely a good sign. Hopefully, they will release a commercial license soon. Downloading the weights now if you ever want them.