[Datasets] -> Adding meta during to_dask() call

Description

Ray Dataset's to_dask() currently offers no way to pass a custom meta object when the Dask DataFrame is created, and nothing internally infers the meta from, say, a PyArrow-style schema or a Pandas DataFrame. This causes problems when a Dask partition holds only a single row containing NaNs: the dtypes inferred for that partition can differ from those of the other partitions, which leads to metadata mismatch errors.

Could we add this? I have a custom implementation here: https://github.com/ludwig-ai/ludwig/pull/2318/files. It works, but I believe there's a better way to do it. Happy to discuss!
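For illustration, here is a minimal pandas-only sketch of the mismatch and of what a user-supplied meta could look like. It is not the code from the linked PR and not Ray's implementation; the column names and the helpers make_meta and conform_to_meta are hypothetical. On the Dask side, the resulting empty frame is the kind of object that dask.dataframe.from_delayed accepts through its meta argument.

```python
import numpy as np
import pandas as pd

# Hypothetical single-row partition whose "label" value is missing.
# With only a NaN to look at, pandas infers float64 for that column,
# even though the column holds strings in other partitions.
part_b = pd.DataFrame({"id": [3], "score": [0.9], "label": [np.nan]})


def make_meta(dtypes):
    """Build an empty DataFrame that carries only column names and dtypes."""
    return pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in dtypes.items()}
    )


def conform_to_meta(df, meta):
    """Reorder columns and cast dtypes so a partition matches the meta."""
    return df.reindex(columns=meta.columns).astype(meta.dtypes.to_dict())


# The schema the caller actually expects (an assumption for this example).
meta = make_meta({"id": "int64", "score": "float64", "label": "object"})

# The inferred dtype of the single-row partition disagrees with the meta.
mismatch = part_b.dtypes["label"] != meta.dtypes["label"]

# After conforming, the partition agrees with the meta column by column.
fixed = conform_to_meta(part_b, meta)
```

If to_dask() accepted such a meta, or derived one from the dataset's schema, each partition could be checked or cast against it instead of relying on per-partition inference.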

Use case

Making sure that Dask DataFrames created from Ray carry the right meta information, so that downstream tasks don't throw metadata mismatch errors.

ray-project / ray

[Datasets] -> Adding meta during to_dask() call #27502
