Write a blog post to showcase ways of improving Kedro run performance

kedro-org / kedro-devrel

The Kedro developer relations team uses this for content creation: ideation and execution

Apache License 2.0


Write a blog post to showcase ways of improving Kedro run performance #49

Open merelcht opened 1 year ago

merelcht commented 1 year ago

Description

Kedro offers several features that improve run performance, but many users seem to be unaware of them or unsure how to use them. Create a blog post that showcases how to use, for example, --async and CachedDataSet.
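
To make the caching idea concrete, here is a minimal plain-Python sketch of what a cached dataset buys you: the slow load happens once, and later loads in the same run are served from memory. `CachingLoader` and `slow_load` are hypothetical names made up for illustration. They are not Kedro classes, and the real CachedDataSet is configured through the catalog and may behave differently in detail.

```python
from typing import Any, Callable, List


class CachingLoader:
    """Hypothetical stand-in for the idea behind CachedDataSet (not Kedro's class)."""

    def __init__(self, load_fn: Callable[[], Any]) -> None:
        self._load_fn = load_fn
        self._cache: Any = None
        self._loaded = False
        self.underlying_loads = 0  # how many times the slow storage was hit

    def load(self) -> Any:
        # Only call the slow loader on the first request; reuse the result after that.
        if not self._loaded:
            self._cache = self._load_fn()
            self._loaded = True
            self.underlying_loads += 1
        return self._cache


def slow_load() -> List[int]:
    # Pretend this reads a large file from remote storage.
    return [1, 2, 3]


cached = CachingLoader(slow_load)
first = cached.load()   # hits the underlying loader
second = cached.load()  # served from memory
```

The trade-off is memory for time: the cached object stays alive for as long as the wrapper does, so this only pays off for data that several nodes read in the same run.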

Context

https://github.com/kedro-org/kedro/issues/2036#issuecomment-1460425776

Possible Implementation

Need to identify which features to showcase.

noklam commented 1 year ago

More inspiration for the topic:

--async / --parallel
CachedDataSet
PartitionedDataSet: lazy loading and saving
Generator nodes (using yield) to process data in chunks: https://github.com/kedro-org/kedro/issues/2170
pandas[performance]
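
Two of the items above, lazy partition loading and generator nodes, combine naturally. The sketch below is plain Python with no Kedro import: it assumes the node receives a dictionary mapping partition IDs to load functions (which is how PartitionedDataSet hands partitions to a node) and that the runner saves each yielded chunk as discussed in the linked issue. `summarise_partitions` and `fake_partitions` are hypothetical names for illustration only.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, float]


def summarise_partitions(
    partitions: Dict[str, Callable[[], List[Row]]],
) -> Iterator[Dict[str, Row]]:
    """Generator node: load one partition at a time and yield its summary."""
    for partition_id, load_partition in sorted(partitions.items()):
        rows = load_partition()  # data is read only here, not up front
        total = sum(row["value"] for row in rows)
        # Yield a one-entry dict so a partitioned output can be saved chunk by chunk.
        yield {partition_id: {"total": total, "count": float(len(rows))}}


# Hypothetical in-memory stand-ins for what a partitioned dataset would pass in.
fake_partitions = {
    "2024-01": lambda: [{"value": 1.0}, {"value": 2.0}],
    "2024-02": lambda: [{"value": 10.0}],
}

chunks = list(summarise_partitions(fake_partitions))
```

Because each partition is loaded inside the loop and released before the next one, peak memory is bounded by the largest partition rather than by the whole dataset.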

noklam commented 11 months ago

https://github.com/joerick/pyinstrument

Not really a trick, but we should always profile first to find out which bottlenecks are worth fixing.
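
As a rough illustration of the profile-first workflow, the sketch below uses the standard library's cProfile instead of pyinstrument so that it runs without extra installs. pyinstrument is a sampling profiler with its own API and output format, so treat this only as the general shape: wrap the suspect code, then read the report. `slow_node` is a hypothetical node function invented for the example.

```python
import cProfile
import io
import pstats


def slow_node(n: int) -> int:
    """Hypothetical node function with a deliberately wasteful inner loop."""
    total = 0
    for i in range(n):
        total += sum(range(i % 50))
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_node(20_000)
profiler.disable()

# Collect the five most expensive entries by cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time surfaces the functions whose whole call tree is expensive, which is usually the right place to start looking before reaching for any of the features listed above.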

astrojuanlu commented 11 months ago

I'm very fond of https://github.com/benfred/py-spy as well, and I've heard good things about https://github.com/P403n1x87/austin

Memory profiling, on the other hand, has a clear winner these days: https://github.com/bloomberg/memray. The original memory-profiler is sadly no longer maintained.