LLM survey - GitHub Issues

KateSawada commented 10 months ago

Please post anything that looks usable here!!!

Lucas544875 commented 10 months ago

Roundup of Japanese models usable with Hugging Face transformers https://tech.yellowback.net/posts/transformers-japanese-models

How to use rinna (8-bit quantization of "rinna/bilingual-gpt-neox-4b-instruction-ppo") https://note.com/npaka/n/na4eb6bad2246 VRAM: 6.1 GB
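As a rough sanity check on the VRAM figure, the memory needed for the weights alone can be estimated from the parameter count and the bytes per parameter. This is only a back-of-the-envelope sketch: the parameter count of 4 billion is an assumption read off the "4b" in the model name, and the function name is made up for illustration. The reported 6.1 GB is higher than the int8 weight-only figure, presumably because of runtime overhead, activations, and any layers kept in higher precision.

```python
def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Memory needed for the model weights alone, in GiB."""
    return n_params * bytes_per_param / 1024**3


# Assumption: "4b" in the model name is read as 4 billion parameters.
N_PARAMS = 4e9

# fp16 stores 2 bytes per parameter, int8 stores 1 byte per parameter.
for label, nbytes in [("fp16", 2), ("int8", 1)]:
    print(f"{label}: {weight_memory_gib(N_PARAMS, nbytes):.2f} GiB")
```

By this estimate, 8-bit quantization roughly halves the weight memory compared with fp16, which is what lets the model fit on a GPU with 16 GB of VRAM or less.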

KateSawada commented 10 months ago

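Also, please look into Azure resources.

Cloud resources for GPU machines
Services that provide a language model as-is

I think there were these two kinds, so please investigate from both angles!!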

Lucas544875 commented 10 months ago

The basics of Azure Machine Learning https://qiita.com/gnbrganchan/items/43e6c44754cb83220db5 An Azure service where you set up data and a model and train it. Since it starts from training, it may not be a good fit for a hackathon.

Lucas544875 commented 10 months ago

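Azure AI Language (formerly part of Azure Cognitive Services) https://www.softbanktech.co.jp/special/blog/dx_station/2022/0003/ Features such as AI chat and named entity extraction can be called through an API. That said, a similar service seems to be offered by the goolab api (a sponsor)?

Feature list

Named entity recognition
Detection of personally identifiable information (PII) and protected health information (PHI)
Language detection
Sentiment analysis and opinion mining
Text summarization
Key phrase extraction
Automatically adding Wikipedia links to text
Extracting and labeling medical information
Custom text classification
Custom named entity recognition
Conversational language understanding
Chat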
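To give a feel for what "calling these features through an API" might look like, here is a minimal sketch that only builds (and does not send) a named entity recognition request with the standard library. The URL path, the `api-version` value, the body layout, and the `Ocp-Apim-Subscription-Key` header are written from memory of the Azure AI Language REST API and should be checked against the official reference. The endpoint, key, and function name are placeholders.

```python
import json
import urllib.request


def build_entity_request(endpoint: str, key: str, text: str,
                         language: str = "ja") -> urllib.request.Request:
    """Build (but do not send) a named entity recognition request.

    Assumption: path, api-version and body shape follow the Azure AI
    Language "analyze-text" REST API; verify against the official docs.
    """
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version=2023-04-01"
    body = {
        "kind": "EntityRecognition",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {
            "documents": [{"id": "1", "language": language, "text": text}]
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )


# Placeholder endpoint and key: replace with the values of a real resource.
req = build_entity_request(
    "https://example.cognitiveservices.azure.com/",
    "<your-key>",
    "Microsoft was founded by Bill Gates.",
    language="en",
)
print(req.get_method(), req.full_url)
# Sending would be urllib.request.urlopen(req); it needs a real resource and key.
```

Because no model has to be hosted or trained, this route needs no GPU at all, which is the main trade-off against running rinna on our own server.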

Lucas544875 commented 10 months ago

(Azure Virtual Machines) Cloud GPU resources

Detailed guide to using Azure Virtual Machines https://www.idaten.ne.jp/portal/page/out/secolumn/multicloud/column032.html
GPU pricing table https://note.com/sa1p/n/n0485cd0d8a04 Roughly ¥91 to ¥208/h gets a GPU with 16 GB of VRAM; going up to ¥682/h gets one with 80 GB of VRAM.
Steps up to installing PyTorch https://qiita.com/yushikmr/items/c3bddc1e21d19a848a19
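To turn the hourly prices above into a budget, a quick calculation like the following may help. The hourly rates are the ones quoted from the pricing table; the 48 hours of uptime is purely an assumption (a VM left running for a two-day hackathon), and the names are made up for illustration.

```python
# Hourly rates in JPY, taken from the pricing figures quoted above.
HOURLY_RATES_JPY = {
    "16 GB VRAM (low end)": 91,
    "16 GB VRAM (high end)": 208,
    "80 GB VRAM": 682,
}


def estimate_cost_jpy(rate_per_hour: int, hours: float) -> float:
    """Total cost of keeping one VM running for the given number of hours."""
    return rate_per_hour * hours


# Assumption: the VM stays up for a two-day hackathon.
HOURS = 48

for name, rate in HOURLY_RATES_JPY.items():
    print(f"{name}: ¥{estimate_cost_jpy(rate, HOURS):,.0f}")
```

Stopping the VM when nobody is using it would lower these numbers, so they are better read as an upper bound for that period.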

KateSawada commented 10 months ago

We will reach a conclusion by the time of the interim presentation.

Lucas544875 commented 10 months ago

https://huggingface.co/izumi-lab/stormy-7b-10ep

KateSawada commented 10 months ago

↑ stormy would not run on the server we prepared... I'll get rinna ready to use instead.

KateSawada commented 10 months ago

Conclusion: we'll use this.

How to use rinna (8-bit quantization of "rinna/bilingual-gpt-neox-4b-instruction-ppo") https://note.com/npaka/n/na4eb6bad2246 VRAM: 6.1 GB

↓ To get the script below running on the ld server, I needed

$ pip install sentencepiece==0.1.99 accelerate==0.24.0 bitsandbytes==0.41.1 scipy==1.11.3

These need to be added to requirements.txt.
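Before running the script on another machine, the pinned versions from the pip command can be checked with a few lines of standard-library Python. This is a minimal sketch: `PINNED`, `check_pins`, and `to_requirements` are names made up for illustration, and the pins are copied from the command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip install command above.
PINNED = {
    "sentencepiece": "0.1.99",
    "accelerate": "0.24.0",
    "bitsandbytes": "0.41.1",
    "scipy": "1.11.3",
}


def check_pins(pins, get_version=version):
    """Return {package: (wanted, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed at all.
    """
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            problems[name] = (wanted, found)
    return problems


def to_requirements(pins):
    """Render the pins as lines suitable for requirements.txt."""
    return "\n".join(f"{name}=={ver}" for name, ver in pins.items())


print(to_requirements(PINNED))
print("unsatisfied pins:", check_pins(PINNED))
```

The output of `to_requirements` can be appended to requirements.txt as-is, and an empty dictionary from `check_pins` means the environment matches what worked on the ld server.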

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Prepare the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(
    "rinna/bilingual-gpt-neox-4b-instruction-ppo",
    use_fast=False
)

print("tokenizer OK")
model = AutoModelForCausalLM.from_pretrained(
    "rinna/bilingual-gpt-neox-4b-instruction-ppo",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

print("model OK")
print(model.device)
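# Prepare the prompt. In English: "User: Who is the cutest character in
# Madoka Magica? Explain why as well." followed by the "System: " turn marker.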
prompt = """ユーザー: まどか☆マギカでは誰が一番かわいい？その理由も説明して。
システム: """

# Run inference
token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=512,
        do_sample=True,
        temperature=1.0,
        top_p=0.85,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )
output = tokenizer.decode(output_ids.tolist()[0][token_ids.size(1):])
print(output)

Output

$ python rinna.py
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
tokenizer OK
model OK
cuda:0
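それは素晴らしい質問です!私は鹿目まどかがとても好きで、彼女はとても可愛らしいです。彼女の魅力は、彼女の可愛らしい、そして少し変わった性格にあります。また、彼女の魅力的な笑顔や、彼女の芯の強さも好きです。</s>

(English translation of the model output: "That is a wonderful question! I really like Madoka Kaname, and she is very cute. Her charm lies in her cute and slightly unusual personality. I also like her charming smile and her inner strength.")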

jphacks / NG_2308

LLM survey #18