Investigate Vertex AI

sakamomo554101 commented 3 years ago

Investigate Vertex AI on GCP.

sakamomo554101 commented 3 years ago

https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform?hl=ja

Start by skimming the doc above.

sakamomo554101 commented 3 years ago

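Create a dataset and upload data.
Train an ML model on the data.
- Train the model
- Evaluate the model's accuracy
- Tune hyperparameters (custom training only)
Upload the model and store it in Vertex AI.
Deploy the trained model to an endpoint that serves predictions.
Send prediction requests to the endpoint.
Specify a prediction traffic split on the endpoint.
Manage models and endpoints.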
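
To picture the traffic-split step, here is a rough sketch in plain Python. It is not the Vertex AI SDK: `validate_traffic_split` and `pick_model` are hypothetical helpers, and the model IDs are made up. The one assumption taken from the service is that a split maps deployed model IDs to percentages that add up to 100.

```python
import random


def validate_traffic_split(split: dict[str, int]) -> None:
    """Raise ValueError unless shares are non-negative ints summing to 100."""
    if any(not isinstance(p, int) or p < 0 for p in split.values()):
        raise ValueError("percentages must be non-negative integers")
    if sum(split.values()) != 100:
        raise ValueError("percentages must sum to 100")


def pick_model(split: dict[str, int], rng: random.Random) -> str:
    """Pick a deployed model ID with probability proportional to its share."""
    validate_traffic_split(split)
    ids = list(split)
    return rng.choices(ids, weights=[split[i] for i in ids], k=1)[0]


split = {"model-a": 80, "model-b": 20}  # hypothetical deployed model IDs
rng = random.Random(0)  # fixed seed so the simulation is repeatable
counts = {"model-a": 0, "model-b": 0}
for _ in range(1000):
    counts[pick_model(split, rng)] += 1
print(counts)
```

Roughly 80% of the simulated requests should land on `model-a`, which is the behavior a canary-style rollout relies on.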

Is there nothing equivalent to SageMaker's Debugger? Note: there does seem to be an equivalent of Monitor.

sakamomo554101 commented 3 years ago

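I'm curious what kind of data lake Feature Store actually is. Labeling (creating training labels) is done with an annotation tool, but I wonder how flexible it is (for example, whether it supports image segmentation).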

sakamomo554101 commented 3 years ago

It is connected to useful tools such as Cloud Logging and Cloud Monitoring.

I see, so debugging looks doable by hooking into Logging.

sakamomo554101 commented 3 years ago

https://cloud.google.com/tensorflow-enterprise/docs?hl=ja

Wait, there is an Enterprise edition of TensorFlow now?

sakamomo554101 commented 3 years ago

So Vertex AI is basically GCP's AutoML and AI Platform merged into one? (A bit late to be realizing this.)

sakamomo554101 commented 3 years ago

https://cloud.google.com/vertex-ai/pricing Vertex AI pricing.

Come to think of it, I don't really understand GCP's instance types, so I need to look into them.

sakamomo554101 commented 3 years ago

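I see, the training data should be prepared in JSON format as shown below. (The example is the single-label classification task from the docs.) https://cloud.google.com/vertex-ai/docs/datasets/prepare-text?hl=ja

textContent holds the actual text (the document) inline.
textGcsUri specifies the URI of a document stored in GCS (Google Cloud Storage).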

{
  "classificationAnnotations": {
    "displayName": "label"
  },
  "textContent": "inline_text",
  "dataItemResourceLabels": {
    "aiplatform.googleapis.com/ml_use": "training|test|validation"
  }
},
{
  "classificationAnnotations": {
    "displayName": "label2"
  },
  "textGcsUri": "gcs_uri_to_file",
  "dataItemResourceLabels": {
    "aiplatform.googleapis.com/ml_use": "training|test|validation"
  }
}
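
A small sketch of generating such a file, assuming it is written as JSON Lines (one object per line); check that against the linked doc. The key names are copied from the snippet above, while `make_item`, the labels, and the `gs://` URI are hypothetical.

```python
import json

ALLOWED_ML_USE = {"training", "test", "validation"}


def make_item(label, ml_use, text=None, gcs_uri=None):
    """Build one data item; exactly one of text / gcs_uri must be given."""
    if (text is None) == (gcs_uri is None):
        raise ValueError("provide exactly one of text or gcs_uri")
    if ml_use not in ALLOWED_ML_USE:
        raise ValueError(f"ml_use must be one of {sorted(ALLOWED_ML_USE)}")
    # Key names below are copied from the snippet in this comment.
    item = {"classificationAnnotations": {"displayName": label}}
    if text is not None:
        item["textContent"] = text  # inline document text
    else:
        item["textGcsUri"] = gcs_uri  # document stored in Cloud Storage
    item["dataItemResourceLabels"] = {"aiplatform.googleapis.com/ml_use": ml_use}
    return item


items = [
    make_item("positive", "training", text="Great product."),
    make_item("negative", "test", gcs_uri="gs://example-bucket/doc1.txt"),  # hypothetical URI
]
# Assumption: one JSON object per line (JSON Lines).
jsonl = "\n".join(json.dumps(item, ensure_ascii=False) for item in items)
print(jsonl)
```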

sakamomo554101 commented 3 years ago

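https://cloud.google.com/vertex-ai/docs/start/automl-model-types?hl=ja Model types supported by Vertex AI.

Fairly broad coverage (image, video, text, tabular). However, the image, video, and text tasks are limited to simple classification, object detection, and sentiment analysis (picking out positive or negative text), so the tasks themselves are on the easy side.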

sakamomo554101 commented 3 years ago

Looks like tabular data can also be pulled in via BigQuery.

sakamomo554101 commented 3 years ago

https://cloud.google.com/vertex-ai/docs/explainable-ai/overview?hl=ja Huh, it's still in preview, but it can also do something like feature selection and evaluation. (Not sure how the evaluation is actually done, though.)

sakamomo554101 commented 3 years ago

https://cloud.google.com/vertex-ai/docs/model-monitoring/overview?hl=ja

Monitoring-related docs.

sakamomo554101 commented 3 years ago

https://cloud.google.com/vertex-ai/docs/model-monitoring/overview?hl=ja#skew-and-drift

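Training-serving skew
- Detects whether the data distribution differs between the training data and production (the docs' term literally reads as "variance", but "distribution" is probably the better reading)
- Requires training data
Prediction drift
- Detects when the production data distribution shifts over time
- Does not require training data

Both can be used for drift detection during monitoring.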

sakamomo554101 commented 3 years ago

https://cloud.google.com/vertex-ai/docs/model-monitoring/overview?hl=ja#calculating-skew-and-drift

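Basically, both skew and prediction drift seem to be computed by comparing against a baseline distribution. Note: the baseline is a different distribution for skew than for prediction drift.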
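
A minimal sketch of how the two baselines might differ. The assumption is that skew compares each serving window with the training data, and drift compares each window with the one before it. The previous-window choice and the name `comparison_pairs` are for illustration only, not the documented behavior.

```python
def comparison_pairs(training, serving_windows, mode):
    """Return (baseline, target) pairs of value lists to compare.

    mode="skew":  every serving window is compared with the training data.
    mode="drift": every serving window is compared with the previous window
                  (assumption: "recent production data" = previous window).
    """
    if mode == "skew":
        return [(training, window) for window in serving_windows]
    if mode == "drift":
        return list(zip(serving_windows, serving_windows[1:]))
    raise ValueError(f"unknown mode: {mode}")


training = [1, 2, 2, 3]
windows = [[1, 2, 3], [2, 3, 3], [3, 4, 4]]  # hypothetical serving data, oldest first
skew_pairs = comparison_pairs(training, windows, "skew")
drift_pairs = comparison_pairs(training, windows, "drift")
print(skew_pairs)
print(drift_pairs)
```

Note that the drift mode never touches `training`, which matches the point that prediction drift does not need training data.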

sakamomo554101 commented 3 years ago

https://cloud.google.com/vertex-ai/docs/model-monitoring/overview?hl=ja

The distance between distributions is measured with:

Jensen-Shannon divergence
Chebyshev distance
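
Both measures are easy to try on toy histograms. The sketch below uses the textbook definitions on already-normalized bin frequencies. How Vertex AI bins the data is not covered here, and the function names are my own.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in bits; bins with p_i = 0 add 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence of p and q from their mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def chebyshev_distance(p, q):
    """Chebyshev (L-infinity) distance: the largest per-bin absolute difference."""
    return max(abs(pi - qi) for pi, qi in zip(p, q))


baseline = [0.5, 0.3, 0.2]  # hypothetical bin frequencies
current = [0.2, 0.3, 0.5]
print(js_divergence(baseline, current))
print(chebyshev_distance(baseline, current))
```

With base-2 logarithms the Jensen-Shannon divergence is bounded between 0 (identical distributions) and 1 (no overlap), which makes thresholds easier to reason about.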

sakamomo554101 commented 3 years ago

In monitoring, skew detection and drift detection are explicitly treated as separate things.

sakamomo554101 commented 3 years ago

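That said, input data is generally (almost always, really) multivariate, with multiple features, so the distance computation shouldn't reduce to a simple comparison between two single-variable distributions.

Do they apply dimensionality reduction? Or compute the distance on the multidimensional distribution? This part is unclear to me.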

sakamomo554101 commented 3 years ago

https://cloud.google.com/blog/ja/topics/developers-practitioners/kickstart-your-organizations-ml-application-development-flywheel-vertex-feature-store

sakamomo554101 commented 3 years ago

https://github.com/sakamomo554101/study/issues/13#issuecomment-913146412 As written in the comment linked here, checking drift column by column may also be an option.
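
A sketch of what a per-column check could look like, assuming categorical columns and the L-infinity distance on category frequencies. The rows, column names, and threshold are made up, and this is not necessarily how Vertex AI does it.

```python
from collections import Counter


def category_frequencies(values, categories):
    """Relative frequency of each category in values, in the given order."""
    counts = Counter(values)
    return [counts[c] / len(values) for c in categories]


def column_drift(baseline_rows, current_rows, threshold):
    """Return {column: distance} for columns whose distance exceeds threshold."""
    flagged = {}
    for col in baseline_rows[0]:
        base = [row[col] for row in baseline_rows]
        cur = [row[col] for row in current_rows]
        cats = sorted(set(base) | set(cur))
        # L-infinity distance between the two frequency vectors of this column.
        dist = max(
            abs(b - c)
            for b, c in zip(category_frequencies(base, cats),
                            category_frequencies(cur, cats))
        )
        if dist > threshold:
            flagged[col] = dist
    return flagged


baseline_rows = [  # hypothetical rows
    {"device": "pc", "plan": "free"},
    {"device": "pc", "plan": "free"},
    {"device": "mobile", "plan": "paid"},
    {"device": "mobile", "plan": "paid"},
]
current_rows = [
    {"device": "mobile", "plan": "free"},
    {"device": "mobile", "plan": "paid"},
    {"device": "mobile", "plan": "free"},
    {"device": "mobile", "plan": "paid"},
]
flagged = column_drift(baseline_rows, current_rows, threshold=0.3)
print(flagged)
```

Each column is scored independently, so this sidesteps the multivariate question at the cost of missing shifts that only show up in the joint distribution.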

sakamomo554101 commented 3 years ago

https://cloud.google.com/blog/ja/topics/developers-practitioners/kickstart-your-organizations-ml-application-development-flywheel-vertex-feature-store

This article is very important. (It also covers data drift detection using Feature Store.)

sakamomo554101 commented 3 years ago

I think I more or less get point-in-time lookup...
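
My understanding of point-in-time lookup, as a toy sketch: for each labeled event, use the latest feature value recorded at or before the event's timestamp, so that no future value leaks into the training rows. `value_as_of` and the data are hypothetical, and this is not the Feature Store API.

```python
from bisect import bisect_right


def value_as_of(history, ts):
    """Latest value with timestamp <= ts, or None if nothing was recorded yet.

    history: list of (timestamp, value) pairs sorted by timestamp.
    """
    idx = bisect_right([t for t, _ in history], ts)
    return history[idx - 1][1] if idx else None


rating_history = [(1, 4.4), (5, 4.8), (9, 4.1)]  # hypothetical (day, rating) pairs
label_events = [(0, "a"), (3, "b"), (5, "c"), (12, "d")]  # hypothetical (day, label) pairs

# Join each label with the rating that was known on that day.
training_rows = [
    (ts, label, value_as_of(rating_history, ts)) for ts, label in label_events
]
print(training_rows)
```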

sakamomo554101 commented 3 years ago

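★ I want to dig into the details of how skew and drift detection work in monitoring. Specifically, I'd like to know how features need to be stored, and whether any kind of input data can be handled (my guess is that converting it into intermediate feature data would be enough).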

sakamomo554101 commented 3 years ago

http://ibisforest.org/index.php?Jensen-Shannon%E3%83%80%E3%82%A4%E3%83%90%E3%83%BC%E3%82%B8%E3%82%A7%E3%83%B3%E3%82%B9

So D_JS is built from D_KL. (D_KL: Kullback-Leibler divergence)
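
For reference, the standard textbook definition written out. The symbols P, Q, and M are the usual notation and are not copied from the linked page.

```latex
D_{JS}(P \,\|\, Q) = \frac{1}{2} D_{KL}(P \,\|\, M) + \frac{1}{2} D_{KL}(Q \,\|\, M),
\qquad M = \frac{1}{2}(P + Q)

D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}
```

Unlike D_KL, D_JS is symmetric in P and Q and stays finite even when the two distributions do not overlap.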

sakamomo554101 commented 2 years ago

https://cloud.google.com/blog/ja/topics/developers-practitioners/monitor-models-training-serving-skew-vertex-ai To read.

sakamomo554101 commented 2 years ago

https://twitter.com/googlecloudtech/status/1435347026905030659 To watch (useful as a reference when building pipelines).

sakamomo554101 commented 2 years ago

https://cloud.google.com/architecture/ml-on-gcp-best-practices The best practices are worth reading.

sakamomo554101 commented 2 years ago

https://cloud.google.com/vertex-ai/docs/training/code-requirements?hl=ja Come to think of it, I haven't read the custom training docs yet. Will read them.

sakamomo554101 commented 2 years ago

About pipelines: https://cloud.google.com/vertex-ai/docs/pipelines/introduction?hl=ja

sakamomo554101 commented 2 years ago

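What building a pipeline looks like: https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline?hl=ja

Create a Google Cloud project and build the pipeline inside it. Note: looking at the code makes it easier to picture.

Maybe using Kubeflow is the easy route...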

sakamomo554101 commented 2 years ago

https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/ai-platform-unified/notebooks/official/pipelines/google-cloud-pipeline-components_automl_tabular.ipynb

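Worth going through code examples like the one above (a classification example using linear regression, on tabular data). The data source can be either GCS or BigQuery. (Structured data goes in BigQuery and unstructured data in GCS, that kind of split.)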

sakamomo554101 commented 2 years ago

Sample code: https://github.com/GoogleCloudPlatform/ai-platform-samples https://github.com/GoogleCloudPlatform/vertex-ai-samples

sakamomo554101 commented 2 years ago

https://tech.repro.io/entry/2021/06/22/125113

sakamomo554101 commented 2 years ago

https://cloud.google.com/bigquery-ml/docs/ So BigQuery ML is a thing, huh.

sakamomo554101 commented 2 years ago

https://cloud.google.com/vertex-ai/docs/general/notebooks Using Notebooks with Vertex AI.

sakamomo554101 commented 2 years ago

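https://cloud.google.com/vertex-ai/docs/datasets/label-using-console About labeling in Vertex AI.

You can also request labeling from people called labelers. https://cloud.google.com/vertex-ai/docs/datasets/data-labeling-job

However, it's unclear how full-featured the labeling tooling is when you do it yourself (needs investigation).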

sakamomo554101 commented 2 years ago

https://cloud.google.com/vertex-ai/docs/featurestore/overview About featurestore / Feature Store.

sakamomo554101 commented 2 years ago

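About Feature Store

You create a container called a featurestore and share features across projects through it.
An entity can be thought of as something like a unique ID.
A feature can hold multiple values over time (for example, rating was 4.4 on day A but 4.8 on day B, so several feature values can be stored for the same feature).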
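
The structure described above can be pictured with an in-memory toy. It is only a mental model, not the Vertex AI API: `TinyFeatureStore`, the "entity type" level, and the IDs are assumptions for illustration.

```python
from collections import defaultdict


class TinyFeatureStore:
    """Toy layout: entity_type -> entity_id -> feature -> [(timestamp, value)]."""

    def __init__(self):
        self._data = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def write(self, entity_type, entity_id, feature, timestamp, value):
        values = self._data[entity_type][entity_id][feature]
        values.append((timestamp, value))
        values.sort(key=lambda tv: tv[0])  # keep the history ordered by timestamp

    def history(self, entity_type, entity_id, feature):
        """All recorded (timestamp, value) pairs, oldest first."""
        return list(self._data[entity_type][entity_id][feature])

    def latest(self, entity_type, entity_id, feature):
        """Most recent value, or None if nothing has been written."""
        values = self._data[entity_type][entity_id][feature]
        return values[-1][1] if values else None


store = TinyFeatureStore()
# Hypothetical entity "movie_01" with a rating that changes over time.
store.write("movie", "movie_01", "rating", "2021-09-08", 4.8)
store.write("movie", "movie_01", "rating", "2021-09-01", 4.4)
print(store.history("movie", "movie_01", "rating"))
print(store.latest("movie", "movie_01", "rating"))
```

ISO-formatted date strings sort chronologically, which is why plain string timestamps are enough for this toy.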

sakamomo554101 commented 2 years ago

https://qiita.com/noko_qii/items/1c55261f08ce7e95b255 This is a pretty good article.

sakamomo554101 commented 2 years ago

https://googleapis.dev/python/aiplatform/latest/aiplatform.html?highlight=model

What is the difference between AIP_STORAGE_URI and AIP_MODEL_DIR?
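
One reading of the docs, worth verifying: AIP_MODEL_DIR is set for custom training jobs as the place to write model artifacts, while AIP_STORAGE_URI is set for custom prediction containers as the place to read them from. The sketch below only illustrates that assumed split. The helper names, the local fallback path, and the bucket are hypothetical.

```python
import os


def artifact_output_dir(env=os.environ):
    """Where training code would save the model (assumed role of AIP_MODEL_DIR)."""
    return env.get("AIP_MODEL_DIR", "./model")  # local fallback is hypothetical


def artifact_input_uri(env=os.environ):
    """Where serving code would load the model from (assumed role of AIP_STORAGE_URI)."""
    return env.get("AIP_STORAGE_URI", "./model")  # local fallback is hypothetical


# Simulated environments instead of the real process environment.
train_env = {"AIP_MODEL_DIR": "gs://example-bucket/run-1/model"}
print(artifact_output_dir(train_env))
print(artifact_input_uri({}))
```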

sakamomo554101 / study

Investigate Vertex AI #10
