makamaka / Text-CSV

comma-separated values manipulator

FR: Reading multiple CSVs from a single file #62

Open sciurius opened 1 year ago

sciurius commented 1 year ago

Occasionally I have to process files that contain multiple CSVs. Each of these CSVs is stored in the file as a heading, followed by the data lines, followed by an empty line (or eof).

It would be nice if Text::CSV had an option that basically says: stop reading after an empty line. This would make it possible to write something similar to:

csv (in => $fh,  out => \@aoh1, stop_at_empty => 1);
csv (in => $fh,  out => \@aoh2, stop_at_empty => 1);
csv (in => $fh,  out => \@aoh3, stop_at_empty => 1);
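
Until such an option exists, a rough workaround is to split the input on empty lines first and hand each block to csv () as in-memory data. The sketch below is only that: read_csv_blocks is a made-up helper name, the sample data is invented, and it assumes "\n" line endings and no quoted fields that contain empty lines. It also slurps the whole file, which the proposed option would avoid.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Hypothetical helper, not part of Text::CSV: returns one AoH ref per
# block, where blocks are separated by one or more empty lines.
sub read_csv_blocks {
    my ($fh) = @_;

    # Paragraph mode is limited to this do-block, so $/ is back to its
    # normal value before csv () does any parsing.
    my @blocks = do { local $/ = ""; <$fh> };

    my @tables;
    for my $block (@blocks) {
        $block =~ s/\n+\z/\n/;          # drop the separating empty line(s)
        next unless $block =~ /\S/;     # ignore blocks with no content
        push @tables, csv (in => \$block, headers => "auto");
    }
    return @tables;
}

# Invented sample input: two CSVs, each with its own heading.
my $data = "id,name\n1,foo\n2,bar\n\ncode,descr\nA,first\n";
open my $fh, "<", \$data or die "open: $!";
my ($aoh1, $aoh2) = read_csv_blocks ($fh);
close $fh;

printf "%d rows in the first CSV, %d in the second\n",
    scalar @$aoh1, scalar @$aoh2;
```

Each returned element is an array of hashes keyed by that block's own heading, which is what the three csv () calls above would produce with the proposed option.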

Tux commented 1 year ago

eval { csv (in => $fh, out => \@aoh, bom => 1, strict => 1) } might be a workable alternative. As for the name of this new feature, empty_row_is_eof sounds more logical. Alternatively I can imagine new values for the existing skip_empty_rows where 1 is skip and 2 is stop (eof). One could even go further and implement a callback for this attribute:

empty_row => undef,  # identical to skip_empty_rows = 0: default
empty_row => "skip", # identical to skip_empty_rows = 1
empty_row => "eof",  # stop parsing, no error
empty_row => "fail", # stop parsing, FAIL
empty_row => \&foo,  # call foo on empty_rows (see on_in)

sciurius commented 1 year ago

Allow a callback for in that returns the next line from the input file, or undef on eof?

Tux commented 1 year ago

A callback for in already exists, but it is defined (like \@foo) to return an arrayref. I just noticed this is not completely documented, but it is used in this example code: https://github.com/Tux/Text-CSV_XS/blob/master/doc/CSV_XS.md#dumping-database-tables-to-csv

# using the csv function, streaming with callbacks
my $sth = $dbh->prepare ($sql); $sth->execute;
csv (out => "foo.csv", in => sub { $sth->fetch            });
csv (out => "foo.csv", in => sub { $sth->fetchrow_hashref });
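
For readers without a database at hand, the same callback form can be tried with plain data. This is a minimal sketch with invented rows; the in-memory output handle and the explicit eol are choices made for the example, not something the thread prescribes. The callback returns one arrayref per call, and a false value (here the undef from shift on an empty list) ends the stream.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Invented sample rows; each call to the callback hands over one of them.
my @rows = ([ "id", "name" ], [ 1, "foo" ], [ 2, "bar" ]);

# Write to an in-memory file handle so the example needs no files on disk.
open my $out, ">", \my $buf or die "open: $!";
csv (in => sub { shift @rows }, out => $out, eol => "\n");
close $out;

print $buf;
```

Note that this callback produces already-split rows for output; it does not feed raw input lines to the parser, which is what the question above was asking for.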
sciurius commented 1 year ago

skip_empty_rows => 0 or undef,  # identical to skip_empty_rows = 0: default
skip_empty_rows => 1 or "skip", # identical to skip_empty_rows = 1
skip_empty_rows => "eof",       # stop parsing, no error
skip_empty_rows => "fail",      # stop parsing, FAIL
skip_empty_rows => \&foo,       # call foo on empty_rows (see on_in)