osiegmar / FastCSV

CSV library for Java that is fast, RFC-compliant and dependency-free.
https://fastcsv.org/
MIT License

CSV data buffered after exception?! #115

Closed · javafanboy closed this issue 2 weeks ago

javafanboy commented 1 month ago

I am using a custom Writer class with a fixed-size buffer that throws an exception when the buffer is full. This writer also supports mark/reset (as in java.io streams) as well as rewind (resetting the write position of the buffer so it can be reused). I am writing to this Writer using FastCSV.

Before each writeRow call to FastCSV I call "mark" on my custom Writer. If an exception is thrown, I perform a "reset", read the buffer (and send it over a communication link with a fixed maximum message size that I want to fill as closely as possible), "rewind" the custom writer, re-drive the failed "writeRow" call, and then continue writing more CSV records.
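In rough pseudo-Java, the pattern looks like the sketch below. BoundedWriter is a simplified stand-in for my actual Writer class, and the println is a stand-in for the send over the communication link (names are illustrative, not FastCSV API):

import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;

import de.siegmar.fastcsv.writer.CsvWriter;

// Simplified stand-in for my custom Writer: fixed-size buffer, IOException
// when full, plus mark()/reset()/rewind() to manage the write position.
final class BoundedWriter extends Writer {

    private final char[] buf;
    private int pos;
    private int mark;

    BoundedWriter(int capacity) {
        buf = new char[capacity];
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        if (pos + len > buf.length) {
            throw new IOException("buffer full");
        }
        System.arraycopy(cbuf, off, buf, pos, len);
        pos += len;
    }

    void mark() { mark = pos; }            // remember the current position
    void reset() { pos = mark; }           // drop everything since mark()
    void rewind() { pos = 0; mark = 0; }   // reuse the buffer from the start
    String drain() { return new String(buf, 0, pos); }

    @Override public void flush() { }
    @Override public void close() { }
}

public class RetrySketch {
    public static void main(String[] args) {
        BoundedWriter out = new BoundedWriter(1024);   // max message size
        CsvWriter csv = CsvWriter.builder().build(out);

        List<String[]> records = List.of(
            new String[] { "foo", "bar" },
            new String[] { "baz", "qux" });

        for (String[] record : records) {
            out.mark();                          // before each row
            try {
                csv.writeRecord(record);
                // note: FastCSV buffers internally by default, so the
                // exception only surfaces when FastCSV flushes into
                // BoundedWriter
            } catch (UncheckedIOException e) {   // my buffer is full
                out.reset();                     // discard the partial row
                System.out.println(out.drain()); // stand-in for the send
                out.rewind();
                csv.writeRecord(record);         // re-drive the failed row
                // observed: this row then appears TWICE in the output
            }
        }
    }
}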

What I have observed is that the exception propagates through FastCSV as documented (UncheckedIOException). I then, as mentioned, perform a reset, read, and rewind on the underlying Writer, but when I re-drive the failed writeRecord the CSV line is produced TWICE (once from what seems to be buffered in the FastCSV writer, and once for the record I re-drive).

Not sure if this is a feature (if FastCSV is guaranteed to always buffer the WHOLE generated CSV record, I could simply stop re-driving), but to me it would seem more intuitive for FastCSV to clear its internal buffer when an exception is thrown, entering a "clean state" that lets the client decide whether the data that caused the exception should be re-driven or, for instance, just logged and discarded...

Right now, as a work-around, I create a new FastCSV writer each time the exception is thrown, but even though this only happens whenever the buffer is full, it results in unnecessary overhead. Sketched against the example above, see below.
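The work-around replaces the catch block with something like this (again illustrative, not a FastCSV API):

catch (UncheckedIOException e) {
    out.reset();                           // discard the partial row
    System.out.println(out.drain());       // stand-in for the send
    out.rewind();
    csv = CsvWriter.builder().build(out);  // fresh writer, empty internal buffer
    csv.writeRecord(record);               // the row is now produced exactly once
}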

osiegmar commented 2 weeks ago

I read this issue as well as your post at https://github.com/osiegmar/FastCSV/discussions/117 multiple times.

A running test case that demonstrates your use case, including the problem you are experiencing, would be very helpful.

As you pass a custom Writer instance to FastCSV and also operate directly on this writer instance, the bare minimum is to disable FastCSV's internal buffering:

CsvWriter.builder()
    .bufferSize(0)
    .build(yourWriterInstance);

Maybe this is already sufficient to solve your problem.
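An untested sketch of how the unbuffered writer could interact with your mark/reset protocol (using the hypothetical method names from your description):

CsvWriter csv = CsvWriter.builder()
    .bufferSize(0)     // every writeRecord() goes straight to your Writer
    .build(out);       // out = your mark/reset/rewind Writer

out.mark();
try {
    csv.writeRecord("foo", "bar");
} catch (UncheckedIOException e) {
    out.reset();       // FastCSV retains nothing, so this drops the partial row
    out.rewind();
    csv.writeRecord("foo", "bar");   // should now be written exactly once
}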

I have to admit that when I developed FastCSV, I did not consider the use case you are describing (continuing to write after an I/O exception occurred).

For demonstration purposes, I created a rolling file example at ExampleCsvWriterWithRollingFile.java that shows how to create CSV files with a maximum number of records or a maximum file size. It takes a very different approach, though, which may not be suitable for your use case.

Everything else would be speculation without a running test case.

javafanboy commented 2 weeks ago

For various reasons I decided to use custom binary serialization/deserialization instead of CSV in my current project and no longer use FastCSV in my codebase, so I am not sure I will have the time to re-create the problem right away.

I am not sure I can describe it better than in the report. If it is not clear what is wrong, you can probably just close this issue, as my use case was indeed quite special and may not affect anybody else.

If I use FastCSV again, or have some time between other things, I may try to put together an example, and then I can re-open the case.


javafanboy commented 2 weeks ago

Thanks for the example - will have a look and perhaps I can use that approach next time I use FastCSV!
