Is your feature request related to a problem? Please describe.
We needed to adjust one of the interop tests yesterday because the new C++-based schema creation and writing can miss an 'automagic' cast we get otherwise.
This is because schema creation and writes can be separate. The schema clearly defines the layout, but the write can be more ad hoc, as it was in this test. A data.frame was created, and integer values were passed, as is commonly done, via an expression such as c(10, 20, 30, 42). But to R these are numeric, aka double, types. They are commonly cast internally, but in this case the column was (per the schema) an int one, yet the conversion via arrow::as_table(dataframeobject) inferred a new schema ad hoc (just for this data.frame-to-arrow conversion) based on the payload. So that column became double.
We could request that users do what we did in the test: as.integer(c(10, 20, 30, 42)). But that may not be realistic, as R users just don't expect to have to do this. Our signature just says 'arrow table', so the argument may well be the result of an ad-hoc conversion.
Describe the solution you'd like
The C++ layer may need to inject a casting step.
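To make the idea concrete, here is a minimal sketch of what such a casting step could check. It uses only the standard library (C++17) rather than the Arrow API, and the helper name cast_double_to_int32 is hypothetical, not an existing function in our code base. The assumption is that a double column should be narrowed to the schema's int32 type only when every value converts without loss, and rejected otherwise.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical helper (an assumption for illustration, not an existing API):
// convert a double column to int32 only if every value converts losslessly.
std::optional<std::vector<int32_t>> cast_double_to_int32(const std::vector<double>& values) {
    constexpr double lo = std::numeric_limits<int32_t>::min();
    constexpr double hi = std::numeric_limits<int32_t>::max();
    std::vector<int32_t> out;
    out.reserve(values.size());
    for (double v : values) {
        // Reject NaN/Inf, fractional values, and anything outside the int32 range.
        if (!std::isfinite(v) || v < lo || v > hi || std::trunc(v) != v) {
            return std::nullopt;
        }
        out.push_back(static_cast<int32_t>(v));
    }
    return out;
}
```

With this shape, a payload such as 10, 20, 30, 42 stored as doubles would be accepted for an int column, while 10.5 or an out-of-range value would be refused instead of being silently truncated.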
Describe alternatives you've considered
Forcing users to be more explicit. Doable ... but maybe not realistic / user-friendly?
Additional context
See this commit.