Open bobhairgrove opened 2 years ago
I suppose that running libcsv in a single-threaded process would imply that if an exception were thrown in the callback function, execution would exit the csv_parse() function and continue in the catch block, so one wouldn't have to worry about having the callback function called afterwards.
However, since not all applications use the C++ exception mechanism, it would be useful to have such an abort() function.
Another way of implementing this as a quick-and-dirty hack would be to (mis)use the malloc_func member of the csv_parser struct, which is not used anywhere else in the library. However, the code in the csv_parse() function would still have to be changed to check that.
An additional error code, perhaps CSV_EABORTED, would need to be added which would be set in the status member of the parser struct when the csv_parse() function discovers the new option.
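The flag-and-status idea described above can be sketched without libcsv at all. Everything in the following block (toy_parser, toy_parse, TOY_ABORT, TOY_EABORTED, digits_only) is a hypothetical stand-in invented for illustration. It is not libcsv's API, only a minimal comma splitter that checks an option bit at the top of its loop and records an "aborted" status.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT libcsv's definitions. */
#define TOY_ABORT    32  /* option bit, mirroring the proposed CSV_ABORT */
#define TOY_SUCCESS   0
#define TOY_EABORTED  4  /* status value, mirroring the proposed CSV_EABORTED */

struct toy_parser {
    unsigned options;
    int status;
};

typedef void (*toy_field_cb)(struct toy_parser *p, const char *field,
                             size_t len, void *data);

/* Splits s on commas and reports each comma-terminated field to cb.
 * Returns the number of bytes consumed, which is less than len if a
 * callback requested an abort before the end of the input. */
size_t toy_parse(struct toy_parser *p, const char *s, size_t len,
                 toy_field_cb cb, void *data)
{
    size_t pos = 0, start = 0;

    assert(p != NULL && s != NULL && cb != NULL);
    while (pos < len) {
        /* Check the abort flag on every iteration of the main loop. */
        if (p->options & TOY_ABORT) {
            p->status = TOY_EABORTED;
            return pos;
        }
        if (s[pos] == ',') {
            cb(p, s + start, pos - start, data);
            start = pos + 1;
        }
        pos++;
    }
    /* Design choice of this sketch: report an abort requested on the last byte too. */
    if (p->options & TOY_ABORT)
        p->status = TOY_EABORTED;
    return pos;
}

/* Example validation callback: counts all-digit fields, aborts on anything else. */
void digits_only(struct toy_parser *p, const char *field, size_t len, void *data)
{
    size_t *accepted = data;
    size_t i;

    for (i = 0; i < len; i++) {
        if (!isdigit((unsigned char)field[i])) {
            p->options |= TOY_ABORT;  /* ask the parse loop to stop */
            return;
        }
    }
    (*accepted)++;
}
```

Given the input 12,34,x5,67, the callback accepts two fields and rejects x5, and toy_parse returns before scanning the remaining bytes. A return value smaller than len, together with the TOY_EABORTED status, tells the caller that parsing stopped early.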
I went ahead and implemented this. Here's a patch for what I did:
diff -u ./csv.h ../PATCH_for_abort_flag/csv.h
--- ./csv.h 2021-08-20 16:36:46.000000000 +0200
+++ ../PATCH_for_abort_flag/csv.h 2022-08-16 11:40:15.827490000 +0200
@@ -31,22 +31,25 @@
#define CSV_RELEASE 3
/* Error Codes */
-#define CSV_SUCCESS 0
-#define CSV_EPARSE 1 /* Parse error in strict mode */
-#define CSV_ENOMEM 2 /* Out of memory while increasing buffer size */
-#define CSV_ETOOBIG 3 /* Buffer larger than SIZE_MAX needed */
-#define CSV_EINVALID 4 /* Invalid code,should never be received from csv_error*/
+#define CSV_SUCCESS 0
+#define CSV_EPARSE 1 /* Parse error in strict mode */
+#define CSV_ENOMEM 2 /* Out of memory while increasing buffer size */
+#define CSV_ETOOBIG 3 /* Buffer larger than SIZE_MAX needed */
+#define CSV_EABORTED 4 /* Parsing was aborted */
+#define CSV_EINVALID 5 /* Invalid code,should never be received from csv_error*/
/* parser options */
-#define CSV_STRICT 1 /* enable strict mode */
-#define CSV_REPALL_NL 2 /* report all unquoted carriage returns and linefeeds */
-#define CSV_STRICT_FINI 4 /* causes csv_fini to return CSV_EPARSE if last
- field is quoted and doesn't containg ending
- quote */
-#define CSV_APPEND_NULL 8 /* Ensure that all fields are null-terminated */
+#define CSV_STRICT 1 /* enable strict mode */
+#define CSV_REPALL_NL 2 /* report all unquoted carriage returns and linefeeds */
+#define CSV_STRICT_FINI 4 /* causes csv_fini to return CSV_EPARSE if last
+ field is quoted and doesn't containg ending
+ quote */
+#define CSV_APPEND_NULL 8 /* Ensure that all fields are null-terminated */
#define CSV_EMPTY_IS_NULL 16 /* Pass null pointer to cb1 function when
empty, unquoted fields are encountered */
+#define CSV_ABORT 32 /* Flag which is checked in the csv_parse() function
+ with each iteration of the main loop. */
/* Character values */
diff -u ./libcsv.c ../PATCH_for_abort_flag/libcsv.c
--- ./libcsv.c 2021-08-20 16:36:46.000000000 +0200
+++ ../PATCH_for_abort_flag/libcsv.c 2022-08-16 11:54:44.455013000 +0200
@@ -74,7 +74,8 @@
"error parsing data while strict checking enabled",
"memory exhausted while increasing buffer size",
"data size too large",
- "invalid status code"};
+ "parsing aborted",
+ "invalid status code" };
int
csv_error(const struct csv_parser *p)
@@ -164,6 +165,8 @@
{
if (p == NULL)
return -1;
+ if (p->status == CSV_EABORTED)
+ return -1;
/* Finalize parsing. Needed, for example, when file does not end in a newline */
int quoted = p->quoted;
@@ -283,7 +286,8 @@
{
if (p == NULL) return 0;
if (p->realloc_func == NULL) return 0;
-
+ if (p->status == CSV_EABORTED) return 0;
+
/* Increase the size of the entry buffer. Attempt to increase size by
* p->blk_size, if this is larger than SIZE_MAX try to increase current
* buffer size to SIZE_MAX. If allocation fails, try to allocate halve
@@ -346,6 +350,13 @@
}
while (pos < len) {
+ /* Check the abort flag: */
+ if (p->options & CSV_ABORT) {
+ p->status = CSV_EABORTED;
+ p->quoted = quoted, p->pstate = pstate, p->spaces = spaces, p->entry_pos = entry_pos;
+ return pos;
+ }
+
/* Check memory usage, increase buffer if necessary */
if (entry_pos == ((p->options & CSV_APPEND_NULL) ? p->entry_size - 1 : p->entry_size) ) {
if (csv_increase_buffer(p) != 0) {
Oops ... didn't mean to close this right now!
It looks like there isn't any way to stop csv_parse() from running all the way to the end of the data. I am doing input validation in my "notify field" callback function.
If certain errors occur (not errors which would cause csv_parse() to stop anyway, but data validation errors -- such as regular expression matching fails), it would be nice to set some kind of "abort" flag in the csv_parser->options struct member which would be checked within the main parsing loop and return from the function if set (after cleaning up memory allocations, etc.). Since I still have access to the parser struct during processing, I could simply set the additional (to be determined) "CSV_ABORT" flag in the options. Anything else done to the parser struct would probably not work, or end up being very messy, I think.
This would be very useful if the calling code throws a C++ exception, for example -- throwing an exception would not prevent csv_parse() from doing its thing until it runs out of data.
Another use case would be for parsing 1st line field headers. Since the headers are only meaningful to the application using libcsv, and can be missing, any field could theoretically contain embedded newline characters (although they probably shouldn't). libcsv would be able to parse these, and when the first real end-of-line is reached, one might want to stop parsing. Otherwise, using fgets, etc. to look for a new line is bound to fail if any of the headers have such embedded new lines.
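To illustrate why a plain newline search fails on such headers, here is a small sketch that compares a naive search with a quote-aware one. Both function names are hypothetical, and the quote handling is deliberately simplified (double quotes only, "\n" as the only record terminator). It is not how libcsv determines record boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Naive approach, comparable to reading one line with fgets:
 * returns the offset just past the first '\n', or len if there is none. */
size_t naive_line_end(const char *s, size_t len)
{
    const char *nl;

    assert(s != NULL);
    nl = memchr(s, '\n', len);
    return nl ? (size_t)(nl - s) + 1 : len;
}

/* Quote-aware approach: ignores '\n' inside double-quoted fields.
 * An escaped quote ("") toggles the state twice, so it has no net effect. */
size_t first_record_end(const char *s, size_t len)
{
    int in_quotes = 0;
    size_t i;

    assert(s != NULL);
    for (i = 0; i < len; i++) {
        if (s[i] == '"')
            in_quotes = !in_quotes;
        else if (s[i] == '\n' && !in_quotes)
            return i + 1;
    }
    return len;
}
```

For a header line such as id,"first\nname",age followed by a data row, the naive search stops inside the quoted field, while the quote-aware scan stops after age. Having the parser itself stop at the first end-of-record would avoid duplicating this logic in the application.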