Skimming the kedro documentation - Githubissues

sakamomo554101 / study

Repository for study notes (a place to collect links to articles and the like)

Skimming the kedro documentation #4

Open sakamomo554101 opened 3 years ago

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/02_get_started/04_new_project.html

Starting to read from the page above.

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/02_get_started/05_example_project.html#conf-local

I see, so conf/local seems to be for overriding the parameters defined in conf/base.

↓

Not sure yet whether it really overrides, but since the folder is gitignored it looks like a good place to keep credentials and the like.

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/07_extend_kedro/05_create_kedro_starters.html

Huh, so you can create your own kedro starter as well.

sakamomo554101 commented 3 years ago

https://github.com/quantumblacklabs/kedro-starters/tree/master/spaceflights

The template code for spaceflights is at the link above.

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/03_tutorial/01_spaceflights_tutorial.html

This looks like a careful walkthrough of the workflow with kedro.

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/03_tutorial/01_spaceflights_tutorial.html#package-the-project

Build the project documentation
Package the project for distribution

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/03_tutorial/02_tutorial_template.html#install-project-dependencies-with-kedro-install

kedro install installs the dependency packages the project needs.

sakamomo554101 commented 3 years ago

Slightly off topic, but Kedro is featured on the AWS blog too.

https://aws.amazon.com/blogs/opensource/using-kedro-pipelines-to-train-amazon-sagemaker-models/

sakamomo554101 commented 3 years ago

https://www.quantumblack.com/

This is the owner of kedro; it looks like a data analytics group under McKinsey.

sakamomo554101 commented 3 years ago

https://speakerdeck.com/921kiyo/odsc2020-building-a-production-level-data-pipeline-using-kedro

This one is worth reading too.

sakamomo554101 commented 3 years ago

Is there a story for how to bring notebook code over? (It could end up acting as a bridge between data scientists and ML engineers.)

sakamomo554101 commented 3 years ago

Also, for a serving API (model deployment), I'm curious whether you have to build it yourself or whether there is some degree of cloud support.

sakamomo554101 commented 3 years ago

https://github.com/quantumblacklabs/kedro-community

Digging through the community information could also be a good way to get to know kedro.

sakamomo554101 commented 3 years ago

https://www.salesanalytics.co.jp/datascience/datascience008/

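About Dask (a distributed processing framework). NumPy arrays and pandas.DataFrame objects can be converted into their Dask counterparts. (It seems to split the computation on the data into pieces and run them in a distributed way so that each piece fits in memory.)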

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/03_tutorial/03_set_up_data.html#register-the-datasets

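Kedro's DataCatalog seems to use fsspec (a filesystem abstraction library) for file access. (That would be why it can handle both local and remote data locations; the various data formats seem to be covered by the dataset types.)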

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/07_extend_kedro/03_custom_datasets.html

About customizing DataSets.

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/03_tutorial/04_create_pipelines.html#persist-pre-processed-data

I see, so data that is not registered in the Catalog is held as a MemoryDataSet.
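
A minimal sketch of that fallback idea, not Kedro's actual implementation. `ToyCatalog`, `MemoryStore`, and `ListFileStore` are hypothetical names made up for illustration; the dataset names are only examples.

```python
class MemoryStore:
    """Stand-in for an in-memory dataset: keeps the object in a Python attribute."""

    def __init__(self):
        self._data = None

    def save(self, data):
        self._data = data

    def load(self):
        return self._data


class ListFileStore:
    """Hypothetical persistent store, faked here with a list that acts as the file."""

    def __init__(self):
        self.written = []

    def save(self, data):
        self.written.append(data)

    def load(self):
        return self.written[-1]


class ToyCatalog:
    """Hypothetical catalog: registered names use their own store, others fall back to memory."""

    def __init__(self, registered=None):
        self._stores = dict(registered or {})

    def save(self, name, data):
        # Unregistered names get an in-memory store created on first use.
        store = self._stores.setdefault(name, MemoryStore())
        store.save(data)

    def load(self, name):
        return self._stores[name].load()

    def is_in_memory(self, name):
        return isinstance(self._stores.get(name), MemoryStore)


# Only "companies" is registered; the intermediate output is not.
catalog = ToyCatalog({"companies": ListFileStore()})
catalog.save("companies", ["raw row"])
catalog.save("preprocessed_companies", ["clean row"])
```

In this sketch the unregistered output lives only as long as the process does, which is why the tutorial section on persisting pre-processed data adds a catalog entry for it.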

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/05_data/02_kedro_io.html#versioning

For versioning, the page above is the one to read.

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/03_tutorial/04_create_pipelines.html#kedro-runners

About runners. There are three kinds: SequentialRunner, ParallelRunner, and ThreadRunner. You can also write a custom Runner.
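
A rough sketch of the difference between running nodes one after another and running independent nodes on threads, using only the standard library. This is not how Kedro's runners are written: it ignores dependencies between nodes, and the node functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def clean_companies():
    # Hypothetical node with no inputs, for illustration only.
    return "companies cleaned"


def clean_shuttles():
    # Hypothetical node that does not depend on clean_companies.
    return "shuttles cleaned"


nodes = [clean_companies, clean_shuttles]


def run_sequential(nodes):
    """Run each node in order, one at a time."""
    return [node() for node in nodes]


def run_threaded(nodes, max_workers=2):
    """Submit every node to a thread pool, then collect results in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(node) for node in nodes]
        return [future.result() for future in futures]


sequential_results = run_sequential(nodes)
threaded_results = run_threaded(nodes)
```

Both calls return the same results here; the threaded version only pays off when nodes spend their time waiting on I/O.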

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/03_tutorial/05_package_a_project.html#add-documentation-to-your-project

Huh, so build-docs generates the documentation for you as well.

sakamomo554101 commented 3 years ago

Tried building the docs for the iris sample project and got the error below.

Configuration error:
There is a programmable error in your configuration file:

Traceback (most recent call last):
  File "/Users/shotasakamoto/.pyenv/versions/3.8.3/lib/python3.8/site-packages/sphinx/config.py", line 326, in eval_config_file
    execfile_(filename, namespace)
  File "/Users/shotasakamoto/.pyenv/versions/3.8.3/lib/python3.8/site-packages/sphinx/util/pycompat.py", line 88, in execfile_
    exec(code, _globals)
  File "/Users/shotasakamoto/development/python/test_kedro/test/docs/source/conf.py", line 53, in <module>
    from test import __version__ as release
ImportError: cannot import name '__version__' from 'test' (/Users/shotasakamoto/.pyenv/versions/3.8.3/lib/python3.8/test/__init__.py)

sakamomo554101 commented 3 years ago

Ahh... because I named the project test, it is trying to import __version__ from the __init__.py of the test folder that ships under python3.8. (Renaming the project will probably fix it.)
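
A small check for this kind of name clash, as a sketch: before creating a project, ask the interpreter whether the intended package name already resolves to something. `already_importable` and the second candidate name are hypothetical.

```python
import importlib.util


def already_importable(name):
    """Return True if `name` already resolves to a module on this interpreter."""
    return importlib.util.find_spec(name) is not None


# "json" ships with Python; the second name is a made-up project name.
clashes = {name: already_importable(name) for name in ("json", "zz_my_kedro_project")}
```

Calling `already_importable("test")` on the interpreter from the traceback above would be expected to return True, since that install has a `test` package in its standard library. Run the check from outside the project directory, otherwise the project itself may be what gets found.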

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/03_tutorial/05_package_a_project.html#package-your-project

Hmm, being able to turn the project into a Python package is nice, but what is the benefit? (Apparently you do not need kedro new and so on when using the package.)

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/03_tutorial/06_visualise_pipeline.html#share-a-pipeline

Looks like the visualized pipeline information can be saved in JSON format.

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/01_dependencies.html#project-specific-dependencies

The flow is to edit requirements.in and generate requirements.txt from it (compiled with pip-tools).

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/02_configuration.html#configuration

The configuration section is worth reading properly.

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/02_configuration.html#local-and-base-configuration-environments

As for which of base and local gets overridden: it depends on load order (the order in which the paths are passed to ConfigLoader). For entries with the same key, the one loaded later wins.
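
The load-order rule can be sketched with plain dicts. This assumes a simple top-level-key override and that base is loaded before local; `merge_environments` is a hypothetical helper, not ConfigLoader itself, and the parameter names are only examples.

```python
def merge_environments(*environments):
    """Merge config dicts in load order; later ones replace earlier values for the same key."""
    merged = {}
    for env in environments:
        merged.update(env)
    return merged


# Stand-ins for the parsed contents of conf/base and conf/local.
base = {"test_size": 0.2, "random_state": 3}
local = {"test_size": 0.3}

params = merge_environments(base, local)
```

Swapping the argument order would make base win instead, which is the point of the note above.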

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/02_configuration.html#additional-configuration-environments

Overriding from another configuration environment (for example, a server or test folder) is also possible.

sakamomo554101 commented 3 years ago

What does the hook_impl decorator actually do?

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/latest/07_extend_kedro/02_hooks.html#registering-your-hook-implementations-with-kedro

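If you register a class that implements methods marked with hook_impl as shown above, kedro calls them automatically as part of its own processing. (The example above adds behavior that runs after the catalog is loaded.)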

The supported functions should be listed somewhere.
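
A toy version of the mechanism, to make the decorator's role concrete. Kedro's hooks are built on pluggy; the marker and manager below are simplified stand-ins written for illustration, and only the hook name `after_catalog_created` comes from the Kedro docs.

```python
def hook_impl(func):
    """Toy marker: flags a function as a hook implementation and returns it unchanged."""
    func._is_hook_impl = True
    return func


class ToyHookManager:
    """Hypothetical manager that calls marked methods on registered objects."""

    def __init__(self):
        self._plugins = []

    def register(self, plugin):
        self._plugins.append(plugin)

    def call(self, hook_name, **kwargs):
        results = []
        for plugin in self._plugins:
            method = getattr(plugin, hook_name, None)
            # Only methods carrying the marker are treated as implementations.
            if method is not None and getattr(method, "_is_hook_impl", False):
                results.append(method(**kwargs))
        return results


class CatalogLogger:
    def __init__(self):
        self.seen = []

    @hook_impl
    def after_catalog_created(self, catalog):
        self.seen.append(sorted(catalog))
        return len(catalog)

    def after_node_run(self, node):
        # Not decorated, so the toy manager never calls it.
        raise AssertionError("should not be called")


manager = ToyHookManager()
logger = CatalogLogger()
manager.register(logger)
created = manager.call("after_catalog_created", catalog={"shuttles": None, "companies": None})
ignored = manager.call("after_node_run", node="split_data")
```

So the decorator itself does nothing at call time; it only labels the method so that the framework knows to pick it up once the object is registered.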

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/latest/07_extend_kedro/02_hooks.html#execution-timeline-hooks

They are listed at the link above (there seem to be others as well).

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/02_configuration.html#template-configuration

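Huh, so the contents of globals_dict override the matching keys read from the config files. (In other words, if you do not want the value written in yaml to be picked up, defining the key in globals keeps the parameter fixed.)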
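
A sketch of the templating idea with the standard library only. It assumes `${name}` placeholders and that values passed in code take precedence over values read from a globals file; `render` and all the values are hypothetical, and this is not the loader's real code.

```python
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def render(config, global_values):
    """Recursively replace ${name} placeholders; unknown names are left as they are."""
    if isinstance(config, dict):
        return {key: render(value, global_values) for key, value in config.items()}
    if isinstance(config, list):
        return [render(value, global_values) for value in config]
    if isinstance(config, str):
        return PLACEHOLDER.sub(
            lambda match: str(global_values.get(match.group(1), match.group(0))),
            config,
        )
    return config


# Values as they might come from a globals yaml file (hypothetical).
globals_from_file = {"bucket_name": "dev-bucket", "folder": "raw"}
# Values passed in code, which win for the same key in this sketch.
globals_dict = {"bucket_name": "prod-bucket"}
effective = {**globals_from_file, **globals_dict}

catalog_template = {
    "companies": {
        "type": "pandas.CSVDataSet",
        "filepath": "s3://${bucket_name}/${folder}/companies.csv",
    }
}
rendered = render(catalog_template, effective)
```

Here `bucket_name` from the file is ignored because the same key is set in code, while `folder` still comes from the file.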

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/02_configuration.html#use-parameters

When passing parameters to a node, you can pass a single parameter with the params:xxxx syntax. (Passing the whole dict at once via parameters is also OK.)
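
How the two spellings differ can be sketched as a small input resolver. `resolve_inputs` and `split_data` are hypothetical, and the sketch only handles flat parameter names; it is not how Kedro resolves node inputs internally.

```python
def resolve_inputs(input_names, parameters, data):
    """Turn node input names into values: 'parameters', 'params:<key>', or a dataset name."""
    resolved = []
    for name in input_names:
        if name == "parameters":
            resolved.append(parameters)  # the whole dict
        elif name.startswith("params:"):
            resolved.append(parameters[name[len("params:"):]])  # a single value
        else:
            resolved.append(data[name])
    return resolved


def split_data(rows, test_size):
    """Hypothetical node function that needs only one parameter."""
    cut = int(len(rows) * (1 - test_size))
    return rows[:cut], rows[cut:]


parameters = {"test_size": 0.25, "random_state": 3}
data = {"rows": list(range(8))}

train, test = split_data(*resolve_inputs(["rows", "params:test_size"], parameters, data))
whole = resolve_inputs(["parameters"], parameters, data)[0]
```

With `params:test_size` the function signature stays free of the config dict, which keeps the node easier to reuse and test.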

sakamomo554101 commented 3 years ago

https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/02_configuration.html#credentials

Credentials also go in a yml file; the message is to make sure they are never committed.