osiegmar / FastCSV

CSV library for Java that is fast, RFC-compliant and dependency-free.
https://fastcsv.org/
MIT License

Empty fields at the end are omitted #111

Closed (kasim-ba closed this 6 months ago)

kasim-ba commented 6 months ago

When parsing a CSV, all empty fields at the end of a line are omitted, e.g. for 123;asd;;;; (the field separator in my case is '\t'). The csvRow will only have 2 elements. The number of fields should be constant, especially since in those cases it's hard to know whether the CSV is corrupt or the empty data has been cut off.

osiegmar commented 6 months ago

Please provide the obligatory test case to prove that.

kasim-ba commented 6 months ago

Hello, recreating the behavior is quite simple. I created the following JUnit test, which shows that the first row misses the last three elements.

    @Test
    void testCsvReaderOmitsEmptyElements() {
        var data = """
one two three           
one two three four five six
one                     
one two                             
                """;
        try(CsvReader reader = CsvReader.builder().fieldSeparator('\t').build(data)) {
            reader.forEach(row -> assertThat(row.getFieldCount()).isEqualTo(6));
        } catch (IOException e) {
        }
    }

Edit: I also created a second test, which shows that the problem is connected to the tab character, since the following test works perfectly fine:

    @Test
    void testCsvReaderFieldCountCorrect() {
        var data = """
one;two;three;;;
one;two;three;four;five;six
one;;;;;
one;two;;;;
                """;
        try(CsvReader reader = CsvReader.builder().fieldSeparator(';').build(data)) {
            reader.forEach(row -> assertThat(row.getFieldCount()).isEqualTo(6));
        } catch (IOException e) {
        }
    }

osiegmar commented 6 months ago

Your IDE is probably simply trimming whitespace at the end of lines. Try with an explicit \t.

kasim-ba commented 6 months ago

Yes, that seems to be the case, at least to some extent. I'm currently working with STS 4.21, and after explicitly adding \t the test passed. Putting the data into a CSV file doesn't help; the same problem occurs. What did work is a single-line string. Now I'm wondering whether it's an IDE problem or whether the compiler is doing this. Do you use a different IDE (e.g. IntelliJ)?

osiegmar commented 6 months ago

IDEs also remove trailing whitespace from files, if configured. It's definitely nothing the compiler does.

Maybe this helps: https://stackoverflow.com/a/2618521
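
For Java text blocks specifically, there is a factor beyond editor settings: trailing whitespace on each line of a text block is treated as incidental and stripped, as if by `String.stripIndent()`, while escape sequences such as `\t` are interpreted afterwards and therefore survive. The following is a minimal sketch of that difference using only the JDK (Java 15+); the class name `TrailingTabDemo` and the helper `fieldCount` are made up for illustration and are not part of FastCSV.

```java
public class TrailingTabDemo {

    // Counts tab-separated fields in a single line.
    // The limit of -1 makes String.split keep trailing empty fields.
    static int fieldCount(String line) {
        return line.split("\t", -1).length;
    }

    public static void main(String[] args) {
        // Tabs written as \t escapes are visible characters in the source file,
        // so neither an editor nor text-block whitespace stripping removes them.
        String escaped = """
            one\ttwo\tthree\t\t\t
            one\t\t\t\t\t
            """;
        escaped.lines().forEach(line -> System.out.println(fieldCount(line))); // prints 6, then 6

        // stripIndent() performs the same whitespace stripping that text blocks
        // receive; real trailing tab characters are removed by it.
        String stripped = "one\ttwo\tthree\t\t\t".stripIndent();
        System.out.println(fieldCount(stripped)); // prints 3
    }
}
```

This only concerns string content inside text blocks; regular string literals and files read at runtime are not affected by this stripping.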

kasim-ba commented 6 months ago

Well, adding those settings did not really help with my problem, but there is definitely some problem with Eclipse. The other thing I don't understand is why the problem persists with a file: using a CSV file gets me the same error.
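
One way to check whether a file on disk still contains its trailing tabs, independent of any CSV library, is to count the tab characters per line. This is a minimal sketch using only the JDK; the class name `TabCountCheck` and the method `tabsPerLine` are hypothetical, and UTF-8 encoding is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class TabCountCheck {

    // Returns, for each line of the file, the number of tab characters it contains.
    static List<Integer> tabsPerLine(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8).stream()
            .map(line -> (int) line.chars().filter(c -> c == '\t').count())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: TabCountCheck <file>");
            return;
        }
        // For a file with six tab-separated columns, every line should report 5 tabs.
        System.out.println(tabsPerLine(Path.of(args[0])));
    }
}
```

If some lines report fewer tabs than expected, the trailing tabs were already gone before the file reached the parser.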

osiegmar commented 6 months ago

If the problem persists with a file, the file has its lines trimmed. I cannot reproduce it; this works fine:

import static org.assertj.core.api.Assertions.assertThat;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.io.TempDir;

import de.siegmar.fastcsv.reader.CsvReader;
import testutil.CsvRecordAssert;

public class TabTest {

    @TempDir
    private Path tempDir;

    @Test
    void test() throws IOException {
        var data = """
        one\ttwo\tthree\t\t\t
        one\ttwo\tthree\tfour\tfive\tsix
        one\t\t\t\t\t
        one\ttwo\t\t\t\t
        """;

        var tempFile = tempDir.resolve("foo.csv");

        Files.writeString(tempFile, data);

        var stream = CsvReader.builder().fieldSeparator('\t').ofCsvRecord(tempFile).stream();

        assertThat(stream).satisfiesExactly(
            rec -> CsvRecordAssert.assertThat(rec).fields()
                .hasSize(6).containsExactly("one", "two", "three", "", "", ""),
            rec -> CsvRecordAssert.assertThat(rec).fields()
                .hasSize(6).containsExactly("one", "two", "three", "four", "five", "six"),
            rec -> CsvRecordAssert.assertThat(rec).fields()
                .hasSize(6).containsExactly("one", "", "", "", "", ""),
            rec -> CsvRecordAssert.assertThat(rec).fields()
                .hasSize(6).containsExactly("one", "two", "", "", "", "")
        );
    }

}

kasim-ba commented 6 months ago

Strange Eclipse. Anyway, thanks for your support.