Consider CleverCSV style parsing to determine the CSV dialect

tidyverse / vroom

Fast reading of delimited files

https://vroom.r-lib.org

Consider CleverCSV style parsing to determine the CSV dialect #105

Open jimhester opened 5 years ago

jimhester commented 5 years ago

This would likely give us better delimiter guessing than the current method, and would also let us guess things like the quote and escape characters used.

https://github.com/alan-turing-institute/CleverCSV/
paper with details - https://arxiv.org/pdf/1811.11242.pdf

The main costs would be implementation time and how much data needs to be read for it to work well.
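
To make the idea concrete, here is a rough Python sketch of consistency-based dialect guessing: each candidate (delimiter, quote character) pair is used to parse a small sample, and the pair whose result looks most regular wins. This is a heavy simplification and not CleverCSV's implementation or anything in vroom. The candidate lists, the `KNOWN_CELL` regex, the `max_rows` parameter, and all function names are assumptions made up for illustration; the row "pattern" is reduced to the field count, and escape characters are not guessed at all.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical candidate sets, chosen for illustration only.
DELIMITERS = [",", ";", "\t", "|"]
QUOTECHARS = ['"', "'", ""]  # "" means "no quoting"

# Crude stand-in for type detection: a cell counts as "known" if it is empty,
# a plain number, or text starting with a letter that contains no quote characters.
KNOWN_CELL = re.compile(r"^\s*(|[-+]?\d+(\.\d+)?|[A-Za-z][\w ,;.-]*)\s*$")


def parse(sample, delimiter, quotechar):
    """Parse the sample with one candidate dialect; return non-empty rows."""
    kwargs = {"delimiter": delimiter}
    if quotechar:
        kwargs["quotechar"] = quotechar
    else:
        kwargs["quoting"] = csv.QUOTE_NONE
    try:
        reader = csv.reader(io.StringIO(sample, newline=""), **kwargs)
        return [row for row in reader if row]
    except csv.Error:
        return []


def pattern_score(rows):
    """Favor few distinct field counts, many rows per count, and several fields per row."""
    if not rows:
        return 0.0
    counts = Counter(len(row) for row in rows)
    return sum(n * (k - 1) / k for k, n in counts.items()) / len(counts)


def type_score(rows):
    """Fraction of cells that look like a recognizable value."""
    cells = [cell for row in rows for cell in row]
    if not cells:
        return 0.0
    return sum(1 for cell in cells if KNOWN_CELL.match(cell)) / len(cells)


def guess_dialect(text, max_rows=10):
    """Return (delimiter, quotechar, score) for the best-scoring candidate."""
    # Only the first max_rows lines are inspected; the default is arbitrary.
    sample = "".join(text.splitlines(keepends=True)[:max_rows])
    best = (None, None, 0.0)
    for delimiter in DELIMITERS:
        for quotechar in QUOTECHARS:
            rows = parse(sample, delimiter, quotechar)
            score = pattern_score(rows) * type_score(rows)
            if score > best[2]:  # ties keep the earlier candidate
                best = (delimiter, quotechar, score)
    return best
```

Calling `guess_dialect(text)` on the first few lines of a file returns the winning pair and its score. The `max_rows` argument is where the cost mentioned above shows up: a smaller sample is cheaper to read but gives the scores less evidence to work with.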

ws-garcia commented 5 months ago

To reduce the amount of data needed to determine CSV dialects, a method like this can be implemented. The methodology shows great accuracy while reading only ten records from a CSV file.