ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
[Feature] large data storage in Ray object store #21251

Closed chuckhope closed 2 years ago

chuckhope commented 2 years ago

Description

Hi, I am using Ray Tune to do knowledge distillation with Hugging Face transformers. Before training, I have to use the Hugging Face SquadV1Processor to extract features. My dataset is about 400MB, and after feature extraction it grows to about 80GB (the virtual memory shown in the 'htop' UI).

The solutions I have tried:

  1. I tried wrapping the data inside the trainable function >>> ValueError: The actor ImplicitFunc is too large > FUNCTION_SIZE_ERROR_THRESHOLD=95 MiB
  2. I passed my processed features through tune.with_parameters instead >>> the program gets stuck without printing any information. I can see from the 'htop' UI that the program is still running, but with 80GB VIRT.
  3. I passed my processed features using ray.put and ray.get >>> I guess ray.put does basically the same thing as tune.with_parameters.
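
For readers following along, the second approach can be sketched roughly as below. This is a minimal illustration, not the poster's actual code: make_features, train_fn, and run_with_tune are hypothetical names, the tiny random feature list stands in for the real extracted SQuAD features, and returning a metrics dict from the function trainable assumes a Ray version that supports it. tune.with_parameters places the passed objects in the Ray object store and hands them to each trial, so the data is not serialized together with the trainable function.

```python
import random


def make_features(num_rows=1000, dim=8):
    """Hypothetical stand-in for the extracted features (tiny on purpose)."""
    rng = random.Random(0)
    return [[rng.random() for _ in range(dim)] for _ in range(num_rows)]


def train_fn(config, features=None):
    """Function trainable; `features` is supplied by tune.with_parameters."""
    total = sum(sum(row) for row in features)
    count = sum(len(row) for row in features)
    # Placeholder metric so the sketch has something to report.
    return {"score": config["lr"] * total / count}


def run_with_tune():
    """Launch the trials. Requires Ray Tune to be installed."""
    # Imported here so the helpers above can be used without Ray.
    from ray import tune

    features = make_features()
    # The features go into the object store once and are fetched per trial,
    # instead of being captured in the trainable's closure (attempt 1).
    trainable = tune.with_parameters(train_fn, features=features)
    return tune.run(trainable, config={"lr": tune.grid_search([0.1, 0.01])})
```

Calling run_with_tune() on a machine with Ray installed starts two trials that share the same stored features. The object still has to fit in the object store, so this pattern alone does not help if the features are larger than the store's capacity.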

Do you have any ideas that might help? Thank you!

Use case

Using Ray Tune with a reference to data larger than 80GB stored in the object store.

Related issues

No response

scv119 commented 2 years ago

That seems like a lot of data. Can you try running ray memory --stats-only to see the object store memory usage?

chuckhope commented 2 years ago

@scv119 Thanks for your response. I got the message "Plasma memory usage 20305 MiB, 10 objects, 55.53% full, 55.53% needed. Objects consumed by Ray tasks: 20305 MiB." and the program just got stuck without any further information.
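
As a rough reading of those numbers, the object store capacity can be estimated from the reported usage and fill percentage. The sketch below is only arithmetic on the two values quoted in the message; estimate_capacity_mib is a hypothetical helper name, and the result is an estimate rather than something Ray reported directly.

```python
def estimate_capacity_mib(used_mib, percent_full):
    """Estimate total capacity from used space and the reported fill percentage."""
    return used_mib / (percent_full / 100.0)


used_mib = 20305       # "Plasma memory usage 20305 MiB"
percent_full = 55.53   # "55.53% full"

capacity_mib = estimate_capacity_mib(used_mib, percent_full)
free_mib = capacity_mib - used_mib

print(f"estimated capacity: {capacity_mib:.0f} MiB ({capacity_mib / 1024:.1f} GiB)")
print(f"estimated free:     {free_mib:.0f} MiB ({free_mib / 1024:.1f} GiB)")
```

Under this estimate the store holds roughly 36,566 MiB (about 35.7 GiB) in total. That is well below 80GB, although the VIRT column in htop can overstate how much memory the features actually occupy.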

stale[bot] commented 2 years ago

Hi, I'm a bot from the Ray team :)

To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public slack channel.

stale[bot] commented 2 years ago

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!