This is not a high-severity issue, since a new job is still able to run fine, but it's a bit confusing to see the object store memory usage remain on the dashboard after the job finishes, as no actors hold references to anything in the object store anymore.
Repro script:
import time

import ray
import ray.data
import ray.train
from ray.train.torch import TorchTrainer

file_uri = "s3://air-example-data-2/100G-xgboost-data.parquet/d9ef953e9a7347db8793f9e772357e68_000888.parquet"
num_copies = 24_000
ds = ray.data.read_parquet([file_uri for i in range(num_copies)])

def train_fn(config):
    print("training started...")
    ds = ray.train.get_dataset_shard("train")
    for batch in ds.iter_batches(batch_size=32):
        print(batch)
        time.sleep(2)

trainer = TorchTrainer(train_fn, scaling_config=ray.train.ScalingConfig(num_workers=4), datasets={"train": ds})
trainer.fit()
See the object store memory usage after the job finishes at 17:15.
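For what it's worth, here is a minimal sketch (assuming Ray 2.x with the state API available, i.e. ray[default] installed) of double-checking that nothing still holds object references once the job is done:

import ray
from ray.util.state import list_objects

ray.init(address="auto")  # attach to the already-running cluster
# After trainer.fit() returns, this should come back (nearly) empty,
# since no actors or tasks hold references to the dataset blocks anymore.
print(list_objects(limit=10))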
Ah, this is because the Ray object store allocates objects within the shared memory pool. As long as Ray is still up, you will see this shared memory usage.
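If you want to verify that yourself, here is a minimal sketch (Linux-specific, assuming a local cluster started with ray.init(); /dev/shm is where the object store's shared-memory pool lives by default):

import shutil
import ray

def shm_used_gb():
    # The Ray object store is backed by tmpfs at /dev/shm on Linux.
    return shutil.disk_usage("/dev/shm").used / 1e9

ray.init()
print(f"/dev/shm used while Ray is up: {shm_used_gb():.2f} GB")

# Stopping Ray tears down the shared-memory pool, so the usage shown
# on the dashboard disappears along with it.
ray.shutdown()
print(f"/dev/shm used after shutdown: {shm_used_gb():.2f} GB")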