Status: Open. Opened by merelcht 1 year ago.
More inspiration for the topic:
- `--async` / `--parallel`
- `CachedDataSet`
- `PartitionedDataSet` - Lazy loading/saving
- `yield` node to process data in chunks - https://github.com/kedro-org/kedro/issues/2170
- `pandas[performance]`
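To make the `yield` idea above concrete, here is a minimal sketch of a node function written as a generator, so that only one chunk is held in memory at a time. The function name `process_in_chunks`, the `value` field, and the doubling step are all hypothetical and only for illustration. The sketch is plain Python with no Kedro wiring; how Kedro persists each yielded chunk depends on the Kedro version and the dataset used, which is what the linked issue discusses.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, float]


def process_in_chunks(rows: Iterable[Row], chunk_size: int = 2) -> Iterator[List[Row]]:
    """Yield processed rows chunk by chunk instead of returning one big list.

    Hypothetical example: the "processing" is just doubling the `value` field.
    """
    chunk: List[Row] = []
    for row in rows:
        chunk.append({**row, "value": row["value"] * 2})
        if len(chunk) == chunk_size:
            yield chunk  # hand the full chunk to the caller
            chunk = []   # start a fresh chunk so memory use stays bounded
    if chunk:
        yield chunk  # emit the final, possibly smaller, chunk


rows = [{"value": 1.0}, {"value": 2.0}, {"value": 3.0}]
chunks = list(process_in_chunks(rows, chunk_size=2))
```

Because the function is a generator, nothing is computed until the caller iterates over it, which is what makes chunk-wise saving possible in the first place.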
https://github.com/joerick/pyinstrument
Not really a trick, but we should always do profiling to find out which bottlenecks to improve.
I'm very fond of https://github.com/benfred/py-spy as well, and I've heard good things about https://github.com/P403n1x87/austin
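As a rough illustration of the "profile first" advice, the sketch below uses the standard library's `cProfile` rather than pyinstrument, py-spy, or austin, purely so that it runs without extra dependencies; those tools have their own interfaces. `slow_step` and `profile_call` are hypothetical names standing in for a real node function and a small helper.

```python
import cProfile
import io
import pstats


def slow_step(n: int) -> int:
    """Hypothetical stand-in for an expensive node function."""
    return sum(i * i for i in range(n))


def profile_call(func, *args, top: int = 5) -> str:
    """Run func under cProfile and return a text report of the top entries."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_call(slow_step, 100_000)
print(report)
```

Sorting by cumulative time is one reasonable default for spotting which step dominates a run; sampling profilers such as the ones linked above usually add less overhead on long pipelines.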
Memory profiling, on the other hand, has a clear winner these days: https://github.com/bloomberg/memray. The original memory-profiler is sadly no longer maintained.
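For a dependency-free first look at memory use, a sketch with the standard library's `tracemalloc` is shown below. This is not memray and gives far less detail; it only reports the peak size of Python-level allocations during a call. `build_table` and `peak_memory_bytes` are hypothetical names used for illustration.

```python
import tracemalloc


def build_table(n: int) -> list:
    """Hypothetical stand-in for a node that materialises data in memory."""
    return [[float(i)] * 10 for i in range(n)]


def peak_memory_bytes(func, *args) -> int:
    """Return the peak traced allocation size (in bytes) while func runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # always stop tracing, even if func raises
    return peak


small = peak_memory_bytes(build_table, 10)
large = peak_memory_bytes(build_table, 10_000)
print(f"peak for 10 rows: {small} bytes, for 10,000 rows: {large} bytes")
```

Comparing the peak for a small and a large input is a quick way to see whether a step's memory use grows with the data, which is a hint that chunked or lazy processing may help.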
Description
Kedro offers multiple features that help improve run performance, but people don't seem to know about them or how to use them. Create a blog post that showcases how to use, e.g., `--async` and `CachedDataSet`.
Context
https://github.com/kedro-org/kedro/issues/2036#issuecomment-1460425776
Possible Implementation
Need to identify which features to showcase.