nasa-petal / bio-strategy-extractor

The Unlicense

Compile a dataset for fine-tuning GPT #12

Open bruffridge opened 2 years ago

bruffridge commented 2 years ago

Expected data format: https://beta.openai.com/docs/guides/fine-tuning

Get summary and taxonomy data for biological strategies from the AskNature API. Titles and abstracts will have to be retrieved from a separate source.

Compile a JSONL file with each biological strategy as a line in the file in this format:

{"prompt":"<title>\n<abstract>\n\n###\n\n", "completion":"<summary>|||<functions>|||<systems> ###"}

Example completion:

Tiny leg hairs help spiders sense subtle air movement by conducting it directly to nerve cells.|||Process Information|Sense Signals/Environmental Cues|Sense Touch and Mechanical Forces in a Living System|Sense Sound and Other Vibrations From the Environment|Sense Motion|Process Information|||Animals|Arthropods (Insects, Spiders, Crustaceans)|Spiders|Arachnids ###
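
A minimal sketch of how one record could be assembled, assuming the title, abstract, summary, functions, and systems are already in hand. The function names (`build_record`, `to_jsonl`) and the placeholder title and abstract are hypothetical; joining the taxonomy labels with `|` is inferred from the example completion above, not from the AskNature API.

```python
import json

# Separators taken from the format described in this issue.
PROMPT_END = "\n\n###\n\n"   # marks the end of the prompt
FIELD_SEP = "|||"            # separates summary, functions, systems
LABEL_SEP = "|"              # separates labels within functions/systems (inferred)
COMPLETION_END = " ###"      # stop sequence at the end of the completion


def build_record(title, abstract, summary, functions, systems):
    """Return one prompt/completion pair as a dict (hypothetical helper)."""
    prompt = f"{title}\n{abstract}{PROMPT_END}"
    completion = FIELD_SEP.join(
        [summary, LABEL_SEP.join(functions), LABEL_SEP.join(systems)]
    ) + COMPLETION_END
    return {"prompt": prompt, "completion": completion}


def to_jsonl(records):
    """Serialize records as JSONL text: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Title and abstract are placeholders; the summary is from the example above.
example = build_record(
    title="Hypothetical paper title",
    abstract="Hypothetical abstract text.",
    summary=(
        "Tiny leg hairs help spiders sense subtle air movement "
        "by conducting it directly to nerve cells."
    ),
    functions=["Process Information", "Sense Motion"],
    systems=["Animals", "Spiders"],
)

jsonl_text = to_jsonl([example])
```

`json.dumps` takes care of escaping newlines and quotes, so each record stays on a single line of the JSONL file.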

abalai-ash commented 2 years ago

@bruffridge @hschilling Here are a few tutorials on how to use GPT-J on AWS:

  1. The setup is described here: https://towardsdatascience.com/how-to-build-your-own-gpt-j-playground-733f4f1246e5
  2. This experiments with a few NLP tasks: https://towardsdatascience.com/how-to-use-gpt-j-for-almost-any-nlp-task-cb3ca8ff5826
  3. Here GPT-J is explained w/ a few examples (Hugging Face): https://huggingface.co/docs/transformers/model_doc/gptj
  4. Here is a Google Colab demo: https://colab.research.google.com/github/kingoflolz/mesh-transformer-jax/blob/master/colab_demo.ipynb
  5. I looked at this only briefly, but it goes into a fair amount of detail on how to use the repo (https://github.com/mallorbc/gpt-j-6b): https://www.youtube.com/watch?v=ym6mWwt85iQ
  6. This is the fine-tuning video by the same person as the above video: https://www.youtube.com/watch?v=bLMbnHunL_E

abalai-ash commented 2 years ago

I will go ahead and get started on this assigned task.

abalai-ash commented 2 years ago

For GPT-3: https://www.youtube.com/watch?v=8psgEDhT1MM

bruffridge commented 2 years ago

This might be helpful: https://betterprogramming.pub/fine-tuning-gpt-j-6b-on-google-colab-or-equivalent-desktop-or-server-gpu-b6dc849cb205