xp1632 / DFKI_working_log


LLM+Visual Programming_Technical Courses #64

Open xp1632 opened 2 months ago

xp1632 commented 2 months ago
  1. ChainForge (https://chainforge.ai/ , https://github.com/ianarawjo/ChainForge): an open-source visual programming environment for prompt engineering. With ChainForge, you can evaluate the robustness of prompts and text generation models in a way that goes beyond anecdotal evidence.

  2. Low-code LLM: Graphical User Interface over Large Language Models. https://www.semanticscholar.org/paper/Low-code-LLM%3A-Graphical-User-Interface-over-Large-Cai-Mao/490776b4c01b5950275a3541183f6b9e3818c207

xp1632 commented 1 month ago

LLM-specific courses

xp1632 commented 1 month ago

Course 1: Fine-tuning LLMs

Course Link: https://learn.deeplearning.ai/courses/finetuning-large-language-models/lesson/5/data-preparation
Lamini library: https://lamini-ai.github.io/tuning/quick_start/#basic-tuning


Course Notes:

2.1 Why fine-tune LLMs?


2.2 What does fine-tuning bring?



2.3 Where does fine-tuning fit in?


2.4 Tasks for fine-tuning

- Extraction of text ---> get keywords
- Expansion of information ---> writing, such as emails or code

2.5 Steps for first-time fine-tuning


2.6 Instruction fine-tuning

2.6.1 Two types of instruction prompt templates


2.7 Data preparation for training:

import pandas as pd
import datasets

from pprint import pprint
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
text = "Hi, how are you?"
encoded_text = tokenizer(text)["input_ids"]
# encoded_text : [12764, 13, 849, 403, 368, 32]

decoded_text = tokenizer.decode(encoded_text)
print("Decoded tokens back into text: ", decoded_text)
# Decoded tokens back into text:  Hi, how are you?

list_texts = ["Hi, how are you?", "I'm good", "Yes"]
encoded_texts = tokenizer(list_texts)
print("Encoded several texts: ", encoded_texts["input_ids"])
# Encoded several texts:  [[12764, 13, 849, 403, 368, 32], [42, 1353, 1175], [4374]]
encoded_texts_both = tokenizer(list_texts, max_length=3, truncation=True, padding=True)
print("Using both padding and truncation: ", encoded_texts_both["input_ids"])

# Using both padding and truncation:  [[403, 368, 32], [42, 1353, 1175], [4374, 0, 0]]

2.8 Training dataset


def inference(text, model, tokenizer, max_input_tokens=1000, max_output_tokens=100):
  # Tokenize
  input_ids = tokenizer.encode(
          text,
          return_tensors="pt",
          truncation=True,
          max_length=max_input_tokens
  )

  # Generate
  device = model.device
  generated_tokens_with_prompt = model.generate(
    input_ids=input_ids.to(device),
    max_length=max_output_tokens
  )

  # Decode
  generated_text_with_prompt = tokenizer.batch_decode(generated_tokens_with_prompt, skip_special_tokens=True)

  # Strip the prompt
  generated_text_answer = generated_text_with_prompt[0][len(text):]

  return generated_text_answer
# `Trainer` here is the course's helper class (note the extra `model_flops` /
# `total_steps` arguments, which the stock Hugging Face `Trainer` does not accept)
trainer = Trainer(
    model=base_model,
    model_flops=model_flops,
    total_steps=max_steps,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)
- After 3 training steps, the model is not fully trained
- This slightly fine-tuned model does not give great results
- We therefore train further on the entire dataset for two epochs and get a better result

2.9 Evaluation and iteration




Parameter-Efficient Fine-tuning (PEFT)

https://github.com/huggingface/peft


LoRA: train fewer parameters

https://huggingface.co/docs/peft/main/en/conceptual_guides/lora
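
A minimal LoRA sketch with the PEFT library (the model name, rank, alpha, and target module are illustrative assumptions, not values from the course):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,        # causal language modeling
    r=8,                                 # rank of the low-rank update matrices
    lora_alpha=32,                       # scaling factor for the LoRA updates
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # attention projection in Pythia (GPT-NeoX); an assumption
)

peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only a small fraction of the weights are trainable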

xp1632 commented 3 weeks ago

Course 2: Generative AI with Large Language Models

Course Link: https://www.coursera.org/learn/generative-ai-with-llms/lecture/9uWab/course-introduction


Week1 - 1. Generative AI and LLMs


Important concepts of how the Transformer works:

https://www.coursera.org/learn/generative-ai-with-llms/lecture/3AqWI/transformers-architecture


-Step5: Feed all the attention weights we get into a fully connected feed-forward network, then a softmax layer, and get the probability of each word
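
A toy sketch of these last steps (single attention head, made-up dimensions and random weights, not the course's code):

import torch
import torch.nn.functional as F

d_model, vocab_size, seq_len = 64, 1000, 6
x = torch.randn(seq_len, d_model)            # embedded tokens with positional info

# scaled dot-product self-attention (one head, no masking)
Wq, Wk, Wv = (torch.randn(d_model, d_model) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
attn_weights = F.softmax(q @ k.T / d_model**0.5, dim=-1)   # (seq_len, seq_len)
attn_out = attn_weights @ v

# fully connected feed-forward network, then projection to vocabulary logits
ffn = torch.nn.Sequential(torch.nn.Linear(d_model, 4 * d_model),
                          torch.nn.ReLU(),
                          torch.nn.Linear(4 * d_model, d_model))
logits = ffn(attn_out) @ torch.randn(d_model, vocab_size)
probs = F.softmax(logits, dim=-1)            # probability of each word in the vocabulary
print(probs[-1].topk(3))                     # most likely next tokens for the last position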


xp1632 commented 3 weeks ago

Generating text with transformers

-Step2: Encoded token IDs are passed to the embedding layer, and positional information is added


-Step7: We continue this loop by passing the output token back into the input to generate the next token

-Step8: The final output tokens are detokenized and we get the output sequence

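A minimal sketch of this loop (greedy decoding for simplicity; model and prompt are illustrative):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

input_ids = tokenizer("Hi, how are you?", return_tensors="pt")["input_ids"]

for _ in range(20):                                   # generate up to 20 new tokens
    with torch.no_grad():
        logits = model(input_ids).logits              # (1, seq_len, vocab_size)
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # pick the most likely token
    input_ids = torch.cat([input_ids, next_token], dim=-1)      # feed it back as input
    if next_token.item() == tokenizer.eos_token_id:   # stop at end-of-sequence
        break

print(tokenizer.decode(input_ids[0]))                 # detokenize the full sequence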


The creativity of the result depends on how the predicted next token is chosen from the softmax output distribution




We can also split the Transformer and use the encoder and decoder parts separately




xp1632 commented 3 weeks ago

Foundational paper of the Transformer: Attention Is All You Need

pdf: https://arxiv.org/pdf/1706.03762


"Attention is All You Need" is a research paper published in 2017 by Google researchers, which introduced the Transformer model, a novel architecture that revolutionized the field of natural language processing (NLP) and became the basis for the LLMs we now know - such as GPT, PaLM and others. The paper proposes a neural network architecture that replaces traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs) with an entirely attention-based mechanism.

The Transformer model uses self-attention to compute representations of input sequences, which allows it to capture long-term dependencies and parallelize computation effectively. The authors demonstrate that their model achieves state-of-the-art performance on several machine translation tasks and outperforms previous models that rely on RNNs or CNNs.

The Transformer architecture consists of an encoder and a decoder, each of which is composed of several layers. Each layer consists of two sub-layers: a multi-head self-attention mechanism and a feed-forward neural network. The multi-head self-attention mechanism allows the model to attend to different parts of the input sequence, while the feed-forward network applies a point-wise fully connected layer to each position separately and identically.

The Transformer model also uses residual connections and layer normalization to facilitate training and prevent overfitting. In addition, the authors introduce a positional encoding scheme that encodes the position of each token in the input sequence, enabling the model to capture the order of the sequence without the need for recurrent or convolutional operations.
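
A small sketch of the sinusoidal positional encoding described in the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)) (the sizes below are arbitrary):

import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    position = torch.arange(max_len).unsqueeze(1)                 # (max_len, 1)
    div_term = 10000 ** (torch.arange(0, d_model, 2) / d_model)   # (d_model/2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position / div_term)                  # even dimensions
    pe[:, 1::2] = torch.cos(position / div_term)                  # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(max_len=128, d_model=64)
# The encoding is added to the token embeddings so the model can use token order.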


xp1632 commented 3 weeks ago

Prompt Engineering

In-Context Learning: Providing examples inside the context window
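
An illustrative zero-shot vs. one-shot prompt (the review texts are made up, not from the course):

zero_shot_prompt = """Classify this review:
I loved this movie!
Sentiment:"""

one_shot_prompt = """Classify this review:
I loved this movie!
Sentiment: Positive

Classify this review:
I don't like this chair.
Sentiment:"""
# In-context learning: the example inside the context window shows the model the
# task format, without any update to the model weights.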


xp1632 commented 3 weeks ago

Generative configurations




Generative configuration: different sampling schemes, top-k and top-p



Generative configuration: temperature

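A sketch of how these configuration values are passed to Hugging Face `generate` (reusing the tokenizer, model, and input_ids from the decoding sketch above; the values are illustrative, not recommendations):

outputs = model.generate(
    input_ids,
    max_new_tokens=50,
    do_sample=True,      # sample instead of greedy decoding
    top_k=50,            # keep only the 50 most probable tokens
    top_p=0.9,           # or keep the smallest set of tokens with cumulative prob >= 0.9
    temperature=0.7,     # <1 sharpens the distribution, >1 flattens it (more "creative")
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))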

xp1632 commented 3 weeks ago

Steps in Generative AI project lifecycle


-Good at many tasks or at one specific type of task



Limitations of LLMs, such as hallucination and the inability to handle complex math, and ways to address them

xp1632 commented 3 weeks ago

Generative AI Use case AWS labs

xp1632 commented 2 weeks ago

Pre-training LLMs:


How to choose a pre-trained model?


How are LLMs trained (pre-training)?

  1. Via self-supervised learning on terabytes or more of unstructured text from many sources (the Internet or a specific text corpus):
    • the LLM learns a deep statistical representation of language
    • the LLM internalizes the patterns and structure of the language


  2. During pre-training, the model weights are updated to minimize the loss of the training objective (a minimal sketch follows below)
    • For each token, the encoder generates a corresponding token ID and vector representation

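A minimal sketch of the self-supervised next-token objective for a decoder-only model (model and text are illustrative; passing labels makes transformers compute the cross-entropy loss internally):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

batch = tokenizer("Unstructured text from the training corpus.", return_tensors="pt")
outputs = model(**batch, labels=batch["input_ids"])  # labels are shifted internally
loss = outputs.loss                                   # cross-entropy over next tokens
loss.backward()                                       # gradients used to update the weights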

Pre-training requires a large amount of GPU memory and compute

xp1632 commented 2 weeks ago

Different training objectives for different types of models, which carry out different tasks:



Encoder Only Models


Tokens in the sequence are randomly masked. The training objective (goal) is to reconstruct the original sentence (also called denoising).

- BERT (Bidirectional Encoder Representations from Transformers)
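
A small sketch of this masked-token (denoising) objective using the Hugging Face fill-mask pipeline with BERT (the sentence is made up):

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The teacher [MASK] the student."):
    print(prediction["token_str"], round(prediction["score"], 3))  # candidate tokens for the mask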


Decoder Only Models:



Encoder-Decoder Models


xp1632 commented 2 weeks ago

Comparison of model architectures and pre-training objectives


xp1632 commented 2 weeks ago

Computational challenges of training LLMs






Conclusion: we can scale down the memory footprint of the model by quantization

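A back-of-the-envelope calculation of the memory needed just to store the weights (1B parameters as an example; training needs several times more for gradients, optimizer states, and activations):

params = 1_000_000_000
bytes_per_param = {"fp32": 4, "fp16/bf16": 2, "int8": 1}
for dtype, nbytes in bytes_per_param.items():
    print(f"{dtype}: {params * nbytes / 1e9:.0f} GB")
# fp32: 4 GB, fp16/bf16: 2 GB, int8: 1 GB -- quantization scales the footprint down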

xp1632 commented 2 weeks ago

Multi-GPU compute strategies



Distributed Data Parallel (DDP)
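
A minimal PyTorch DDP sketch (assuming a launch with `torchrun`; `build_model()` is a placeholder, not from the course):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")           # one process per GPU
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = build_model().to(local_rank)              # a full copy of the model on every GPU
model = DDP(model, device_ids=[local_rank])       # gradients are synchronized across GPUs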


Fully Sharded Data Parallel (FSDP)


Sharding factor in FSDP



Performance comparison across different model sizes

xp1632 commented 2 weeks ago

How big does the model need to be? Scaling laws and compute-optimal models


Number of petaflop/s-days to pre-train various LLMs



Optimal Strategy and power law




Conclusion from Chinchilla: compute-optimal dataset size (in tokens) ≈ 20 × model parameters

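Applying this rule of thumb (the model sizes below are just examples):

for params in (1e9, 8e9, 70e9):
    optimal_tokens = 20 * params
    print(f"{params/1e9:.0f}B parameters -> ~{optimal_tokens/1e12:.2f}T training tokens")
# 1B -> ~0.02T, 8B -> ~0.16T, 70B -> ~1.40T tokens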

xp1632 commented 2 weeks ago

!!! Pre-training for domain adaptation:


BloombergGPT: a finance-domain-adapted LLM


xp1632 commented 2 weeks ago

Instruction Fine-tuning



Difference between LLM pre-training and fine-tuning


How instruction fine-tuning works



!!!

How to prepare instruction-based training data for fine-tuning?


xp1632 commented 2 weeks ago

Process of instruction fine-tuning an LLM

  1. First we get our prepared instruction dataset.

  2. We divide the dataset into training, validation, and test splits.

  3. During fine-tuning, prompts from the training dataset are passed to the LLM, the LLM generates completions, and we compare the predicted result with the label specified in the training data.

  4. We compute the loss from the difference between the two token probability distributions (a minimal sketch follows this list).

  5. We use the calculated loss to update the weights of the LLM with standard backpropagation.

  6. We do this for many batches of prompt-completion pairs and over several epochs, updating the weights so the model's performance on the task improves.

  7. We measure LLM performance on the holdout validation dataset and calculate the validation accuracy.

  8. After completing fine-tuning, we perform a final performance evaluation on the test dataset and get the test accuracy.
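
A minimal sketch of steps 4-5 (assuming a Hugging Face causal LM; the prompt, completion, and model name are illustrative, not the course's):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

prompt = "Classify this review: I loved this DVD!\nSentiment: "
completion = "Positive"

prompt_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
completion_ids = tokenizer(completion, return_tensors="pt")["input_ids"]
input_ids = torch.cat([prompt_ids, completion_ids], dim=-1)

# Only the completion tokens contribute to the loss; -100 masks out the prompt.
labels = input_ids.clone()
labels[:, : prompt_ids.shape[-1]] = -100

loss = model(input_ids, labels=labels).loss   # cross-entropy over the completion tokens
loss.backward()                               # standard backpropagation updates the weights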


xp1632 commented 2 weeks ago

Fine-tuning on a single task:



Catastrophic forgetting issue


How to avoid catastrophic forgetting


xp1632 commented 2 weeks ago

Multi-task, instruction fine-tuning


FLAN Instruction fine-tuning model



FLAN-T5 is trained across 473 datasets and 146 task categories



SAMSum: a sample dialogue-summarization training dataset used to fine-tune FLAN-T5

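A sketch of turning a SAMSum example into an instruction prompt for FLAN-T5 (the dataset identifier and the exact template wording are assumptions, not the course's template):

from datasets import load_dataset

samsum = load_dataset("samsum")
example = samsum["train"][0]                      # fields: "dialogue" and "summary"

prompt = f"Summarize this dialogue:\n\n{example['dialogue']}\n\nSummary:"
label = example["summary"]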


xp1632 commented 2 weeks ago

How to improve the performance of FLAN-T5 specifically on the summarization task



Before fine-tuning FLAN-T5 with DialogSum

After fine-tuning FLAN-T5 with DialogSum
