tidyverse / vroom

Fast reading of delimited files
https://vroom.r-lib.org

vroom_lines unable to ignore quoted newlines in ISO-8859-1 csv #542

Open carlosresu opened 1 month ago

carlosresu commented 1 month ago

vroom::vroom_lines does not ignore quoted newlines when it is used to count the lines of an ISO-8859-1 encoded CSV file whose fields contain quoted newlines.

library(data.table)

total_rows <- fread(full_claims_file(part), select = 1L, header = TRUE)[, .N]

print(paste("Total Rows via fread:", total_rows))

library(vroom)

# Function to count rows using vroom
count_rows_vroom <- function(file_path) {
  total_lines <- length(vroom_lines(file_path, altrep = TRUE, progress = FALSE))
  return(total_lines - 1L)  # subtract 1 for the header
}

# Use the function to count total rows
total_rows <- count_rows_vroom(full_claims_file(part))

print(paste("Total Rows via vroom_lines:", total_rows))

It counts the following:

[1] "Total Rows via fread: 11777674"
[1] "Total Rows via vroom_lines: 11801846"
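A minimal sketch of what may be going on, using only base R so it runs without the original data. It assumes the mismatch comes from counting physical lines rather than CSV records, which is not confirmed here. The tiny file and the helper `count_records()` are hypothetical and are not part of vroom or data.table. The helper counts only newlines that fall outside double quotes, assuming RFC 4180 style quoting (embedded quotes doubled as `""`) and a trailing newline at the end of the file.

```r
# Hypothetical 2-row file, written as raw bytes so it stays ISO-8859-1:
# row 1 has a field containing 0xE9 (e-acute in ISO-8859-1) and a quoted newline.
path <- tempfile(fileext = ".csv")
writeBin(
  c(charToRaw("id,note\n1,\"caf"), as.raw(0xE9),
    charToRaw("\nsecond line\"\n2,plain\n")),
  path
)

# Physical lines: every newline ends a line, quoted or not.
physical_lines <- length(readLines(path, warn = FALSE))

# Hypothetical helper: count newlines that are outside double quotes.
# A byte is inside quotes when an odd number of quote bytes precede it;
# a doubled quote ("") toggles twice, so the parity is preserved.
count_records <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  is_quote <- bytes == as.raw(0x22)    # "
  is_newline <- bytes == as.raw(0x0A)  # \n
  inside_quotes <- cumsum(is_quote) %% 2L == 1L
  sum(is_newline & !inside_quotes)
}

records <- count_records(path)

print(paste("Rows via physical lines:", physical_lines - 1L))
print(paste("Rows via quote-aware count:", records - 1L))
```

Here the physical count is one higher than the quote-aware count because the quoted newline is treated as a line break. The helper loads the whole file into memory, so for a file with millions of rows a chunked variant would be more practical.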