xanderdunn / SwiftArrow

Swift wrapper of the Apache Arrow library

Implement Arrow CSV Reader #5

Open xanderdunn opened 3 years ago

xanderdunn commented 3 years ago

The Python version of it is [here](), and the GLib C interface is here. This is likely more performant than the pandas CSV reader and likely easier to parallelize in Swift.

xanderdunn commented 3 years ago

In the GLib C interface, I'm looking for garrow_memory_mapped_input_stream_new, GArrowBatchFileReader, and GArrowBatchFileWriter.

xanderdunn commented 3 years ago

Doing this in Swift via Python, with the reads running in parallel, causes segfaults and fatal errors:

    let tables: [PythonObject] = allPaths.parallelMap { path -> PythonObject in
        let table = pyarrowcsv.read_csv(path)
        progress.next()
        return table
    }

Hopefully a native Swift implementation of pyarrow's read_csv would not have this problem.
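
For context, the parallelMap helper used in the snippet above is not shown in this issue. A minimal sketch of what such a helper might look like, assuming it is built on DispatchQueue.concurrentPerform (the name, signature, and implementation here are hypothetical, not the code actually used):

```swift
import Dispatch

extension Array {
    /// Hypothetical concurrent map. The real `parallelMap` from the issue
    /// is not shown, so this is only an illustration of the idea.
    func parallelMap<T>(_ transform: (Element) -> T) -> [T] {
        // Pre-allocate one optional slot per input element.
        var results = [T?](repeating: nil, count: count)
        results.withUnsafeMutableBufferPointer { buffer in
            // Copy the buffer pointer so the concurrent closure captures a constant.
            let slots = buffer
            // Each iteration writes only to its own slot, so no lock is needed
            // for the output. The transform itself must be safe to call
            // from several threads at once.
            DispatchQueue.concurrentPerform(iterations: slots.count) { index in
                slots[index] = transform(self[index])
            }
        }
        // Every slot was filled above, so force-unwrapping is safe here.
        return results.map { $0! }
    }
}

// Usage with a pure Swift transform (the file names are made up).
let nameLengths = ["a.csv", "bb.csv", "ccc.csv"].parallelMap { $0.count }
print(nameLengths)
```

With a helper like this, the transform runs on several threads at the same time. That is fine for pure Swift work, but it would mean several threads calling into the embedded Python interpreter concurrently, which may be what triggers the crashes described above.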