paleolimbot / narrow

An R interface to the 'Apache Arrow' C API
https://paleolimbot.github.io/narrow/

Exporting record batch readers from arrow package segfaults #2

Closed paleolimbot closed 2 years ago

paleolimbot commented 2 years ago

...probably a misunderstanding on my part about the object types that are expected:

library(carrow)

# some test data
df <- data.frame(a = 1L, b = 2, c = "three")
batch <- arrow::record_batch(df)
tf <- tempfile()

# write a valid file
file_obj <- arrow::FileOutputStream$create(tf)
writer <- arrow::RecordBatchFileWriter$create(file_obj, batch$schema)
writer$write(batch)
writer$close()
file_obj$close()

# create the reader
read_file_obj <- arrow::ReadableFile$create(tf)
reader <- arrow::RecordBatchFileReader$create(read_file_obj)

# export it to carrow
stream <- as_carrow_array_stream(reader)

schema <- carrow_array_stream_get_schema(stream)
identical(
  carrow_schema_info(schema, recursive = TRUE),
  carrow_schema_info(as_carrow_schema(reader$schema), recursive = TRUE)
)
#> [1] TRUE

# skip("Attempt to read batch from exported RecordBatchReader segfaults")
# batch <- carrow_array_stream_get_next(stream)

read_file_obj$close()
unlink(tf)

Created on 2021-11-23 by the reprex package (v2.0.1)

paleolimbot commented 2 years ago

This was totally a misunderstanding on my part: a RecordBatchFileReader is not exportable as an array stream (a RecordBatchStreamReader is).
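
For anyone landing here with the same segfault, a minimal sketch of the stream-based variant of the reprex above might look like the following. It assumes the data can be written in the IPC stream format (via `arrow::RecordBatchStreamWriter`) instead of the file format, and that the carrow functions behave as they do in the reprex; `batch_out` is just an illustrative name, and this has not been checked against every arrow version.

```r
library(carrow)

# some test data
df <- data.frame(a = 1L, b = 2, c = "three")
batch <- arrow::record_batch(df)
tf <- tempfile()

# write the batch in the IPC *stream* format rather than the file format
file_obj <- arrow::FileOutputStream$create(tf)
writer <- arrow::RecordBatchStreamWriter$create(file_obj, batch$schema)
writer$write(batch)
writer$close()
file_obj$close()

# create a stream reader (a RecordBatchReader, which is the exportable type)
read_file_obj <- arrow::ReadableFile$create(tf)
reader <- arrow::RecordBatchStreamReader$create(read_file_obj)

# export it to carrow and pull the first batch
stream <- as_carrow_array_stream(reader)
batch_out <- carrow_array_stream_get_next(stream)

read_file_obj$close()
unlink(tf)
```

The only change from the original reprex is swapping the file writer/reader pair for the stream writer/reader pair; everything on the carrow side stays the same.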