aloysius-lim closed 7 months ago
Hey @aloysius-lim, I am not able to reproduce this.
On latest master, I also cannot reproduce this. Please feel free to reopen with more details if you run into the issue after upgrading ray (I see you are on ray 2.5.1).
What happened + What you expected to happen
Given an SQL database (tested on SQLite and PostgreSQL via psycopg)
When a Dataset is retrieved with read_sql(..., parallelism=1)
And transformations are applied to the Dataset (e.g. map_batches(), add_column())
Then any operation that materializes the data (e.g. show(), count(), write_csv()) fails with ValueError: The size in bytes of the block must be known
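The scenario above can be sketched roughly as follows. This is an illustration only, not the reporter's script: the table name `movie`, its rows, and the helper names `create_db` and `reproduce` are all hypothetical, and the identity `map_batches()` stands in for whatever transformation was actually applied. It assumes a Ray version where `ray.data.read_sql` accepts `parallelism` (the report was filed against ray 2.5.1).

```python
import os
import sqlite3
import tempfile


def create_db(path=None):
    """Create a small SQLite database (hypothetical schema) and return its path."""
    if path is None:
        path = os.path.join(tempfile.mkdtemp(), "example.db")
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS movie (title TEXT, year INTEGER)")
        conn.executemany(
            "INSERT INTO movie VALUES (?, ?)",
            [("A", 1975), ("B", 1979), ("C", 1983)],
        )
        conn.commit()
    finally:
        conn.close()
    return path


def reproduce(path):
    """Read with parallelism=1, apply a transformation, then materialize."""
    import ray  # imported lazily so the setup helper works without Ray installed

    ds = ray.data.read_sql(
        "SELECT * FROM movie",
        lambda: sqlite3.connect(path),  # connection factory expected by read_sql
        parallelism=1,
    )
    # Identity transformation; stands in for any map_batches()/add_column() step.
    ds = ds.map_batches(lambda batch: batch)
    # Materializing call; per the report, this is where the ValueError surfaced.
    return ds.count()
```

Calling `reproduce(create_db())` exercises the read, transform, and materialize steps in the order the report describes. Whether it fails depends on the Ray version in use.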
Stacktrace:
Versions / Dependencies
My environment:
Databases tested:
Reproduction script
Issue Severity
Medium: It is a significant difficulty but I can work around it.