Open maliksMOJ opened 1 year ago
This issue won't be easily solved, as pandas and awswrangler only support chunking for line-delimited JSON files.
We could possibly use smart_open.open and readline? It might need some tricky parsing if a JSON record spans multiple lines.
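To make the "tricky parsing" concrete, here is a rough sketch of one way it could work, using only the standard library. It is not what arrow-pd-parser does today. The names `iter_json_records` and `chunked` are made up for illustration, and it assumes the file is a single top-level JSON array whose records are all JSON objects. It reads fixed-size blocks rather than lines, so a record that spans several lines is handled the same way as one that does not.

```python
import io
import json
from typing import Any, Iterable, Iterator, List, TextIO


def iter_json_records(stream: TextIO, read_size: int = 65536) -> Iterator[Any]:
    """Yield the objects of a top-level JSON array one at a time.

    Hypothetical helper. Assumes every record is a JSON object, so an
    incomplete record always fails to decode until its closing brace arrives.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    eof = False
    while True:
        # Skip whitespace, the commas between records and the array brackets.
        buffer = buffer.lstrip(" \t\r\n,[]")
        if buffer:
            try:
                record, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                if eof:
                    raise  # truncated or malformed input
            else:
                yield record
                buffer = buffer[end:]
                continue
        if eof:
            return
        # Record is incomplete (or buffer is empty): read another block.
        data = stream.read(read_size)
        if data == "":
            eof = True
        buffer += data


def chunked(records: Iterable[Any], chunksize: int) -> Iterator[List[Any]]:
    """Group records into lists of at most `chunksize` rows."""
    chunk: List[Any] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunksize:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


# Records deliberately span several lines; a tiny read_size forces refills.
sample = '[\n {"a": 1,\n  "b": "x"},\n {"a": 2, "b": "y"},\n{"a": 3, "b": "z"}\n]'
chunks = list(chunked(iter_json_records(io.StringIO(sample), read_size=7), 2))
```

The `io.StringIO` object stands in for whatever file-like object `smart_open.open` returns in text mode. Each chunk is a plain list of dicts, so converting it to a DataFrame or Arrow table would be a separate step. Retrying the decode after every read is simple but wasteful for very large records.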
arrow-pd-parser should support two value types for the chunksize variable: a string denoting a memory size (e.g. 1GB) or an integer specifying the number of rows per chunk. However, when given an integer, the reader only splits data successfully from a JSONL (line-delimited) file. I was unable to chunk a comma-delimited JSON file.