ray-project / ray-educational-materials

This is a suite of hands-on training materials that show how to scale CV, NLP, and time-series forecasting workloads with Ray.
Apache License 2.0
344 stars, 65 forks

What do these sentences in "Part 5: Distributed batch inference with Ray Core API" mean? #77

luxunxiansheng closed this issue 1 year ago

luxunxiansheng commented 1 year ago

When using Ray, you can pass objects as arguments to remote functions. Ray will automatically store these objects in the local object store (on the worker node where the function is running) using the ray.put() function. This makes the objects available to all local tasks. However, if the objects are large, this can be inefficient as the objects will need to be copied every time they are passed to a remote function.

To improve performance, you can explicitly store both the model and feature extractor in the object store by using ray.put(). This avoids the need to create multiple copies of the objects.


I am confused by these two sentences about ray.put():
1) "However, if the objects are large, this can be inefficient as the objects will need to be copied every time they are passed to a remote function."
2) "To improve performance, you can explicitly store both the model and feature extractor in the object store by using ray.put(). This avoids the need to create multiple copies of the objects."

Which sentence should I follow?

kamil-kaczmarek commented 1 year ago

@luxunxiansheng

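If you call a remote function that uses a large object multiple times, it's best practice to store that object in the object store once with ray.put() and then pass the resulting object reference in each function call. The two sentences describe the same behavior: passing a large object by value makes Ray call ray.put() implicitly on every call, creating a new copy each time, while calling ray.put() yourself once stores a single copy that every call can reference. Have a look at the Ray Core best practices, in particular the section at the bottom: "Anti-pattern: Passing the same large argument by value repeatedly harms performance".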