sakamomo554101 commented 2 years ago

Overview

Try introducing Vertex AI Model Monitoring. Document the steps and write a script for it.

sakamomo554101 commented 2 years ago

Can be started once 45 is done.

The reason: monitoring is done through the Vertex AI Prediction API.

sakamomo554101 commented 2 years ago

https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/model_monitoring/model_monitoring.ipynb The notebook above looks like a useful reference.

sakamomo554101 commented 2 years ago

Need timeout handling... If the dashboard side gets no response, it just keeps spinning forever.
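A minimal sketch of the kind of timeout wrapper this comment is asking for, assuming the dashboard calls the backend through an ordinary Python callable. `call_with_timeout` is a hypothetical helper, not something from this repository.

```python
import concurrent.futures


def call_with_timeout(fn, timeout_sec, *args, **kwargs):
    """Run fn in a worker thread and stop waiting after timeout_sec seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_sec)
    except concurrent.futures.TimeoutError:
        # Re-raise as the builtin TimeoutError so callers need only one except clause.
        raise TimeoutError(f"no response within {timeout_sec} seconds") from None
    finally:
        # Do not block on the worker thread; it may still be running.
        executor.shutdown(wait=False)
```

The caller can catch `TimeoutError` and show an error message instead of a spinner. Note that the worker thread is not cancelled, only abandoned, so this suits short-lived requests.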

sakamomo554101 commented 2 years ago

Should I build the monitor with ModelDeploymentMonitoringJob?

sakamomo554101 commented 2 years ago

I wonder if we could add an operator to the pipeline that sets up monitoring.

sakamomo554101 commented 2 years ago

https://github.com/GoogleCloudPlatform/mlops-with-vertex-ai Oh, this looks like a useful reference.

I'll take a look.

sakamomo554101 commented 2 years ago

https://github.com/GoogleCloudPlatform/mlops-with-vertex-ai/blob/main/08-model-monitoring.ipynb The notebook above creates the monitoring job.

sakamomo554101 commented 2 years ago

https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.ModelDeploymentMonitoringObjectiveConfig

The config for the model monitor seems to be the type above. It looks like parameters such as the skew settings go in there.

sakamomo554101 commented 2 years ago

https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.ModelDeploymentMonitoringJob

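The monitor can be created with the type above. If I work out the parameters it takes (the config from the previous comment, plus alert settings, the monitoring period, and so on), I should be able to create the monitoring job.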
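For orientation, here is a rough sketch of how those parameters fit together, written as a plain dict that mirrors the field names of `ModelDeploymentMonitoringJob`. `build_monitoring_job_spec` and its default values are hypothetical, and the `{"seconds": ...}` form of `monitor_interval` is an assumption about how the Duration would be expressed.

```python
def build_monitoring_job_spec(display_name, endpoint, objective_configs,
                              user_emails, monitor_interval_sec=3600,
                              sample_rate=0.8):
    """Assemble the top-level settings of a monitoring job as a plain dict."""
    return {
        "display_name": display_name,
        # Full resource name of the endpoint to monitor.
        "endpoint": endpoint,
        # One entry per deployed model (skew/drift settings live in here).
        "model_deployment_monitoring_objective_configs": list(objective_configs),
        # How often the monitoring job runs.
        "model_deployment_monitoring_schedule_config": {
            "monitor_interval": {"seconds": monitor_interval_sec},
        },
        # Who is notified when an anomaly is detected.
        "model_monitoring_alert_config": {
            "email_alert_config": {"user_emails": list(user_emails)},
        },
        # Fraction of prediction requests that are logged for analysis.
        "logging_sampling_strategy": {
            "random_sample_config": {"sample_rate": sample_rate},
        },
    }
```

Converting this dict into the actual message object, and the exact values to use, are left out on purpose.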

sakamomo554101 commented 2 years ago

EndpointServiceClient is needed to get information about the model attached to an endpoint. JobServiceClient is needed to start the monitoring service.

https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.services.endpoint_service.EndpointServiceClient
https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.services.job_service.JobServiceClient
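A small sketch of the first half (reading the deployed models from an endpoint). The helper name `get_deployed_model_ids` is made up for illustration; it takes an already constructed `EndpointServiceClient` so that it can also be exercised with a stand-in object.

```python
def get_deployed_model_ids(endpoint_client, endpoint_name):
    """Return the IDs of all models deployed to the given endpoint.

    endpoint_client: an EndpointServiceClient (or any object with get_endpoint).
    endpoint_name:   full resource name, e.g.
                     "projects/<project>/locations/<region>/endpoints/<id>".
    """
    endpoint = endpoint_client.get_endpoint(name=endpoint_name)
    return [deployed_model.id for deployed_model in endpoint.deployed_models]
```

An empty result means no model is attached to the endpoint yet, in which case there is nothing to monitor.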

sakamomo554101 commented 2 years ago

I see: if monitoring is already configured on an endpoint and you try to configure monitoring on that same endpoint again, it seems to result in an error.

sakamomo554101 commented 2 years ago

I should look into whether we can check for existing monitoring, or overwrite it.

sakamomo554101 commented 2 years ago

The sample code is a useful reference.

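from google.cloud.aiplatform_v1.services.job_service import JobServiceClient

# API_ENDPOINT and PROJECT_ID are constants defined elsewhere (not shown in this snippet).

def list_monitoring_jobs():
    """Print the model deployment monitoring jobs in the region."""
    client_options = dict(api_endpoint=API_ENDPOINT)
    parent = f"projects/{PROJECT_ID}/locations/us-central1"
    client = JobServiceClient(client_options=client_options)
    response = client.list_model_deployment_monitoring_jobs(parent=parent)
    print(response)

def delete_monitoring_job(job):
    """Delete a monitoring job; `job` is passed as the job's resource name."""
    client_options = dict(api_endpoint=API_ENDPOINT)
    client = JobServiceClient(client_options=client_options)
    response = client.delete_model_deployment_monitoring_job(name=job)
    print(response)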

sakamomo554101 commented 2 years ago

How should I determine whether monitoring already exists?

sakamomo554101 commented 2 years ago

https://googleapis.dev/python/aiplatform/latest/aiplatform_v1/job_service.html#google.cloud.aiplatform_v1.services.job_service.JobServiceClient.create_model_deployment_monitoring_job

There is also an update method. https://googleapis.dev/python/aiplatform/latest/aiplatform_v1/job_service.html#google.cloud.aiplatform_v1.services.job_service.JobServiceClient.update_model_deployment_monitoring_job

sakamomo554101 commented 2 years ago

Running list_model_deployment_monitoring_jobs seems to return a list of ModelDeploymentMonitoringJob objects.

sakamomo554101 commented 2 years ago

Probably it is enough to check whether a job with the same endpoint name exists.

sakamomo554101 commented 2 years ago

https://googleapis.dev/python/aiplatform/latest/aiplatform_v1/types.html?highlight=modeldeploymentmonitoringjob#google.cloud.aiplatform_v1.types.ModelDeploymentMonitoringJob.endpoint The endpoint can be read from the field above.
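The existence check could look roughly like this. `find_monitoring_job` is a hypothetical helper; it assumes the `endpoint` field on each job and the name passed in are written in the same form (an exact string comparison is used).

```python
def find_monitoring_job(job_client, parent, endpoint_name):
    """Return the monitoring job attached to endpoint_name, or None.

    job_client: a JobServiceClient (or any object exposing
                list_model_deployment_monitoring_jobs).
    parent:     "projects/<project>/locations/<region>".
    """
    for job in job_client.list_model_deployment_monitoring_jobs(parent=parent):
        if job.endpoint == endpoint_name:
            return job
    return None
```

If the two names differ only in form (for example project ID versus project number), the comparison would need to be normalized first.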

sakamomo554101 commented 2 years ago

Hmm, even when I specify a FieldMask as below, I get an error.

response = client.update_model_deployment_monitoring_job(
    model_deployment_monitoring_job=job, update_mask=FieldMask(paths=["*"])
)

sakamomo554101 commented 2 years ago

The error is a mystery, so I'll try delete followed by create instead.
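A sketch of the delete-then-create fallback, assuming the name of the existing job (if any) has already been looked up. `recreate_monitoring_job` is a hypothetical helper; the two client methods are the ones from the `JobServiceClient` reference linked earlier in the thread.

```python
def recreate_monitoring_job(job_client, parent, existing_job_name, new_job):
    """Delete the existing monitoring job (if any), then create new_job.

    existing_job_name: resource name of the job to replace, or "" / None.
    new_job:           the ModelDeploymentMonitoringJob to create.
    """
    if existing_job_name:
        operation = job_client.delete_model_deployment_monitoring_job(
            name=existing_job_name
        )
        # Deletion is a long-running operation; wait until it has finished.
        operation.result()
    return job_client.create_model_deployment_monitoring_job(
        parent=parent,
        model_deployment_monitoring_job=new_job,
    )
```

Waiting on the delete operation before creating avoids racing against the "monitoring already configured on this endpoint" error.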

sakamomo554101 commented 2 years ago

https://github.com/GoogleCloudPlatform/mlops-with-vertex-ai/blob/f049abdf3e9837c32ee3b6faea82a2202d14581b/08-model-monitoring.ipynb

sakamomo554101 commented 2 years ago

I see: ModelDeploymentMonitoringJob.name apparently has to be taken from a job that is already deployed. Note: before deployment, job.name was an empty string.

sakamomo554101 commented 2 years ago

https://googleapis.dev/python/aiplatform/latest/aiplatform_v1/types.html#google.cloud.aiplatform_v1.types.ModelDeploymentMonitoringJob.name That said, the reference above says nothing about how this differs before and after deployment...

sakamomo554101 commented 2 years ago

Oh, by the way, I wonder whether the data source used for drift (or rather skew?) detection can be in CSV format.

sakamomo554101 commented 2 years ago

https://googleapis.dev/java/google-cloud-aiplatform/1.0.0/com/google/cloud/aiplatform/v1beta1/ModelMonitoringObjectiveConfig.TrainingDataset.html

Maybe I should use GcsSource instead of BigQuerySource.

sakamomo554101 commented 2 years ago

If I get the artifact's uri, maybe the string is stored there. (This is about how to retrieve the endpoint when it is returned from a custom component.)

sakamomo554101 commented 2 years ago

No, that won't work. If it is declared as an outputPath, the path gets set automatically on the Vertex side.

sakamomo554101 commented 2 years ago

https://github.com/boostcampaitech2/final-project-level3-cv-15/blob/6ffb7ff96e678f358dd6595412d3f4b3f10cc049/serving/google-cloud-sdk/lib/googlecloudsdk/api_lib/ai/model_monitoring_jobs/client.py#L260-L262

Looking at the above, should I just pass a list of GCS URIs to uris?

sakamomo554101 commented 2 years ago

https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.ModelMonitoringObjectiveConfig.TrainingDataset

The reference above should explain TrainingDataset.

sakamomo554101 commented 2 years ago

https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.GcsSource

Ah, right, passing a list of GCS file URIs to uris is all that is needed.

sakamomo554101 commented 2 years ago

I need to get the list of GCS URIs (training, validation, and so on).

sakamomo554101 commented 2 years ago

Wait, how did we decide the prediction target on the model side? Or did we not decide it explicitly?

sakamomo554101 commented 2 years ago

It should be decided at training time.

sakamomo554101 commented 2 years ago

https://github.com/sakamomo554101/YouyakuAI/blob/master/model_pipeline/components/trainer/src/dataset.py#L79-L91

The CSV has no column names; the target is specified by column position. title is the target, so it is the first column (specified as 0).

sakamomo554101 commented 2 years ago

For now I'll try putting 0 in prediction_target_field; if that doesn't work, I'll add column names to the CSV.

sakamomo554101 commented 2 years ago

The pipeline build fails. What was this again? Is the problem that pipelineParams are being passed in a List?

Traceback (most recent call last):
  File "model_pipeline/pipeline.py", line 844, in <module>
    pipeline_instance.compile_pipeline(
  File "model_pipeline/pipeline.py", line 280, in compile_pipeline
    v2_compiler.Compiler().compile(
  File "/Users/shotasakamoto/.pyenv/versions/3.8.3/lib/python3.8/site-packages/kfp/v2/compiler/compiler.py", line 1274, in compile
    pipeline_job = self._create_pipeline_v2(
  File "/Users/shotasakamoto/.pyenv/versions/3.8.3/lib/python3.8/site-packages/kfp/v2/compiler/compiler.py", line 1196, in _create_pipeline_v2
    pipeline_func(*args_list)
  File "model_pipeline/pipeline.py", line 382, in kfp_youyakuai_pipeline
    monitoring_op = self.create_monitoring_op(
  File "model_pipeline/pipeline.py", line 717, in create_monitoring_op
    return monitoring_func(
  File "/Users/shotasakamoto/.pyenv/versions/3.8.3/lib/python3.8/site-packages/kfp/components/_dynamic.py", line 53, in Monitoring func
    return dict_func(locals())  # noqa: F821 TODO
  File "/Users/shotasakamoto/.pyenv/versions/3.8.3/lib/python3.8/site-packages/kfp/components/_components.py", line 386, in create_task_object_from_component_and_pythonic_arguments
    return _create_task_object_from_component_and_arguments(
  File "/Users/shotasakamoto/.pyenv/versions/3.8.3/lib/python3.8/site-packages/kfp/components/_components.py", line 323, in _create_task_object_from_component_and_arguments
    task = _container_task_constructor(
  File "/Users/shotasakamoto/.pyenv/versions/3.8.3/lib/python3.8/site-packages/kfp/dsl/_component_bridge.py", line 319, in _create_container_op_from_component_and_arguments
    _attach_v2_specs(task, component_spec, original_arguments)
  File "/Users/shotasakamoto/.pyenv/versions/3.8.3/lib/python3.8/site-packages/kfp/dsl/_component_bridge.py", line 593, in _attach_v2_specs
    json.dumps(argument_value))
  File "/Users/shotasakamoto/.pyenv/versions/3.8.3/lib/python3.8/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/Users/shotasakamoto/.pyenv/versions/3.8.3/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/shotasakamoto/.pyenv/versions/3.8.3/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Users/shotasakamoto/.pyenv/versions/3.8.3/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type PipelineParam is not JSON serializable

sakamomo554101 commented 2 years ago

Commenting out dataset_gcs_source_uris=dataset_gcs_source_uris makes it compile, so the List argument looks like the problem.

sakamomo554101 commented 2 years ago

https://stackoverflow.com/questions/70356856/vertex-ai-model-batch-prediction-issue-with-referencing-existing-model-and-inpu

The question above seems to have hit the same error.

sakamomo554101 commented 2 years ago

Hmm, so the idea is to write a converter (i.e. with the component decorator or something similar).

sakamomo554101 commented 2 years ago

Hmm, in the end a PipelineParam still can't go inside a list, so the same problem occurs.
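One possible way around this, not verified against this pipeline: pass each URI to the component as its own string input and assemble the list inside the component body, so that no Python list containing a PipelineParam has to be serialized at compile time. The function and parameter names below are hypothetical.

```python
def collect_gcs_source_uris(train_data_uri: str, validation_data_uri: str = "") -> list:
    """Build the list of GCS URIs at run time from individual string inputs.

    Each argument is a plain string by the time the component executes,
    so the list is created inside the component, not in the pipeline definition.
    """
    candidates = [train_data_uri, validation_data_uri]
    # Skip inputs that were left empty.
    return [uri for uri in candidates if uri]
```

In the pipeline definition each input would then be wired up separately (for example to the output of the data generation step), one argument per URI.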

sakamomo554101 commented 2 years ago

I assumed it was because no model is attached to the endpoint, but what is the error below?

Field: model_deployment_monitoring_job; Message: model_deployment_monitoring_objective_configs is empty in ModelDeploymentMonitoringJob

sakamomo554101 commented 2 years ago

Ah, it's because model_id could not be obtained. So the cause is, as suspected, that no model is attached to the endpoint.
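A sketch of failing fast in this situation, so the problem surfaces before the API rejects an empty `model_deployment_monitoring_objective_configs`. `build_objective_configs` is a hypothetical helper that returns plain dicts mirroring the `ModelDeploymentMonitoringObjectiveConfig` field names.

```python
def build_objective_configs(deployed_model_ids, objective_config):
    """Return one objective config entry per deployed model.

    Raises ValueError when the endpoint has no deployed model, since the
    monitoring job cannot be created with an empty list.
    """
    ids = list(deployed_model_ids)
    if not ids:
        raise ValueError(
            "endpoint has no deployed models; deploy a model before "
            "creating the monitoring job"
        )
    return [
        {"deployed_model_id": model_id, "objective_config": objective_config}
        for model_id in ids
    ]
```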

sakamomo554101 commented 2 years ago

Hmm, the task that deploys model monitoring succeeds, but monitoring has not actually started. An email saying the deployment failed arrived at the notification address.

sakamomo554101 commented 2 years ago

The problem was that nothing had been set on trainingDataset. After setting GcsSource, monitoring started working. But is it OK that the data format is not being recognized?

2022-02-12T09:37:52.320125961Z model_deployment_monitoring_objective_configs {
2022-02-12T09:37:52.320131405Z   deployed_model_id: "5768856035963961344"
2022-02-12T09:37:52.320136939Z   objective_config {
2022-02-12T09:37:52.320142387Z     training_dataset {
2022-02-12T09:37:52.320147521Z       data_format: "data-format-unspecified"
2022-02-12T09:37:52.320153215Z       gcs_source {
2022-02-12T09:37:52.320158741Z         uris: "gs://youyaku_ai_pipeline/pipeline_output/749925056555/youyaku-ai-pipeline-20220212172750/data-generator_1178458761673572352/train_data"
2022-02-12T09:37:52.320164814Z       }
2022-02-12T09:37:52.320170270Z       target_field: "0"
2022-02-12T09:37:52.320175932Z     }

sakamomo554101 commented 2 years ago

https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.ModelMonitoringObjectiveConfig.TrainingDataset

data_format
Data format of the dataset, only applicable if the input is from Google Cloud Storage. The possible formats are: "tf-record" The source file is a TFRecord file. "csv" The source file is a CSV file.

I see, so I'll specify csv (since the data is CSV).
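Putting the findings so far together, the training dataset settings could be assembled roughly as below. This is a plain dict mirroring the `TrainingDataset` field names (`gcs_source.uris`, `data_format`, `target_field`); `build_training_dataset` and the `gs://` check are hypothetical additions for illustration.

```python
def build_training_dataset(gcs_uris, target_field, data_format="csv"):
    """Return TrainingDataset settings for data stored in Cloud Storage."""
    uris = list(gcs_uris)
    if not uris:
        raise ValueError("at least one GCS URI is required")
    invalid = [uri for uri in uris if not uri.startswith("gs://")]
    if invalid:
        raise ValueError(f"not a GCS URI: {invalid}")
    return {
        # List of file URIs, as expected by GcsSource.uris.
        "gcs_source": {"uris": uris},
        # Set explicitly so it does not stay at "data-format-unspecified".
        "data_format": data_format,
        # Column that holds the prediction target.
        "target_field": target_field,
    }
```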

sakamomo554101 commented 2 years ago

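Remaining tasks

Check whether specifying the column by number is acceptable for the CSV file used in monitoring (if not, column names have to be added to the CSV)
Clean up the commits
Merge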

sakamomo554101 commented 2 years ago

An error like the one below is showing up, so the labels need to be set properly (matching the names of the input parameters).

Error message:
Unexpected Feature Name input_text

sakamomo554101 commented 2 years ago

Ah, monitoring got disabled partway through. So it really isn't monitoring properly.

sakamomo554101 commented 2 years ago

Maybe I'll just load the data into BigQuery. Or else set column names on the CSV (title = target, body = input_text; genre gets dropped).
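The CSV option could be sketched with the standard library as below. The thread only states that title is the first column; the full column order (title, body, genre) used in the usage example and the output header names are assumptions, so the order is passed in as a parameter. `add_header` is a hypothetical helper.

```python
import csv
import io


def add_header(headerless_csv_text, source_columns, rename, drop=()):
    """Add a header row to headerless CSV text, renaming and dropping columns.

    source_columns: names of the existing columns, in file order.
    rename:         mapping from existing name to the name written in the header.
    drop:           existing column names to leave out of the output.
    """
    keep = [i for i, name in enumerate(source_columns) if name not in drop]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header row uses the renamed column names.
    writer.writerow([rename.get(source_columns[i], source_columns[i]) for i in keep])
    for row in csv.reader(io.StringIO(headerless_csv_text)):
        writer.writerow([row[i] for i in keep])
    return out.getvalue()
```

For example, `add_header(text, ("title", "body", "genre"), {"title": "target", "body": "input_text"}, drop=("genre",))` would produce a file whose header is `target,input_text`, after which the target field could be given by name instead of by position.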

sakamomo554101 commented 2 years ago

https://qiita.com/komiya_____/items/8fd900006bbb2ebeb8b8

Oh, there's something called to_gbq.

sakamomo554101 / YouyakuAI

Try introducing Vertex AI Model Monitoring #46
