NamedCsvReader should trim header fields when it reads first line

osiegmar / FastCSV

CSV library for Java that is fast, RFC-compliant and dependency-free.

https://fastcsv.org/

MIT License


NamedCsvReader should trim header fields when it reads first line #90

Closed: azharrnaeem closed this issue 9 months ago.

azharrnaeem commented 11 months ago

Is your feature request related to a problem? Please describe. If the header fields contain a space after the delimiter (i.e. ", "), the fields are read with a leading space. For example, if the header line is column1, column2, column3, then the map returned by de.siegmar.fastcsv.reader.NamedCsvRow#getFields contains keys prefixed with spaces.

Describe the solution you'd like: While reading the header line, the values could be trimmed. A new feature flag in the builder to enable or disable this might be the better option.
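The problem and the proposed flag can be illustrated with a small sketch. It uses only the JDK and splits the header naively on commas (no quote handling); the class name HeaderTrimSketch, the method parseHeader and its trimHeader parameter are hypothetical and are not FastCSV API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not FastCSV's API: splits a header line naively on
// commas (no quote handling) and optionally trims each header name.
public class HeaderTrimSketch {

    static List<String> parseHeader(String line, boolean trimHeader) {
        List<String> names = new ArrayList<>();
        // limit -1 keeps trailing empty fields, e.g. "a,b," -> [a, b, ""]
        for (String field : line.split(",", -1)) {
            names.add(trimHeader ? field.trim() : field);
        }
        return names;
    }

    public static void main(String[] args) {
        String header = "column1, column2, column3";
        // Without trimming, the 2nd and 3rd names keep their leading space
        System.out.println(parseHeader(header, false)); // [column1,  column2,  column3]
        System.out.println(parseHeader(header, true));  // [column1, column2, column3]
    }
}
```

With the flag off, the spaces stay part of the names, which is the behavior the reporter observed; with it on, the keys come out as expected.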

Describe alternatives you've considered: None.

RFC 4180 compliance: To the best of my knowledge, this should not contradict compliance.

osiegmar commented 10 months ago

Per section 2.4 of RFC 4180:

Spaces are considered part of a field and should not be ignored.

Anyway, I'm still considering trim support, but I'm not sure whether header records need special treatment.

illdd1 commented 10 months ago

How can we deal with white space at the end of a field when using the named CSV reader? It picks up the trailing white space, so a lookup by the name without that white space cannot find the value.

Example: looking for ("name") cannot find ("name ").

@osiegmar

osiegmar commented 10 months ago

@illdd1 I'm not sure whether I understood you correctly, or how your post differs from what @azharrnaeem requests.

Currently FastCSV behaves exactly as RFC 4180 specifies. If you have a column header "name ", you have to access it via getField("name "). In addition, all recognized header fields (including their spaces) are included in getFields().
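Until trimming is built in, one workaround is to normalize the keys after reading. This is a minimal sketch, assuming each row's fields are available as a Map<String, String> (as getFields() provides); TrimKeysSketch and trimKeys are hypothetical names, and the demo builds the map by hand so it runs with the JDK alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of FastCSV: copies a name->value map
// (such as the one NamedCsvRow#getFields returns) with trimmed keys.
public class TrimKeysSketch {

    static Map<String, String> trimKeys(Map<String, String> fields) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps column order
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String key = e.getKey().trim();
            // Headers that differ only in surrounding spaces would collide
            if (result.containsKey(key)) {
                throw new IllegalArgumentException("Duplicate header after trim: '" + key + "'");
            }
            result.put(key, e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("name ", "Alice");
        System.out.println(raw.get("name"));           // null: the key is "name "
        System.out.println(trimKeys(raw).get("name")); // Alice
    }
}
```

This copies the map for every row, so it adds per-record allocation; it also rejects headers that collide once trimmed, such as "name" and "name ".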

Strictly speaking, your CSV file is broken. But I understand that people have to deal with such files. I just need to make sure that changes (like trimming fields, case-insensitive lookups, duplicate header fields, ...) do not sacrifice the high performance of FastCSV, which is the number one design goal of this library!
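On the cost side, a trim step is cheap for fields that are already clean: per the JDK documentation, String.trim() returns a reference to the same String when there is no leading or trailing whitespace, so no new String is created. This is an observation about the JDK only, not about how FastCSV implements trimming; the class name is arbitrary.

```java
// Demonstrates a documented property of String.trim(): if neither the first
// nor the last character is whitespace, the same instance is returned.
public class TrimCostSketch {
    public static void main(String[] args) {
        String clean = "name";
        String padded = "name ";
        System.out.println(clean.trim() == clean);       // true: no new String
        System.out.println(padded.trim() == padded);     // false: a trimmed copy is created
        System.out.println(padded.trim().equals(clean)); // true
    }
}
```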

osiegmar commented 9 months ago

This will be possible with FastCSV 3 (soon to be released).

To trim all fields, you can simply call:

CsvReader.builder().fieldModifier(FieldModifier.TRIM).build(data);

For special treatment you could implement a custom FieldModifier:

// Call .trim() and .toUpperCase() for the first line only
FieldModifier headerTrimUpperModifier = (originalLineNumber, fieldIdx, comment, quoted, field) ->
    originalLineNumber == 1 ? field.trim().toUpperCase() : field;

var csvBuilder = NamedCsvReader.builder()
    .fieldModifier(headerTrimUpperModifier);

for (NamedCsvRecord csvRecord : csvBuilder.build(" h1 , h2 \nfoo,bar")) {
    System.out.println(csvRecord.getFieldsAsMap());
}

prints: {H1=foo, H2=bar}

(Syntax may change until the official release!)
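Because that syntax was still subject to change, here is a JDK-only sketch of the same idea: a callback that sees the line number and index of each field, so a header-only rule is just a line-number check. LineFieldModifier, read and FieldModifierSketch are hypothetical names; the parser splits naively on "\n" and commas and does not handle quoting or CRLF, so it is not a substitute for FastCSV.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the field-modifier idea using only the JDK.
// NOT FastCSV: naive splitting, no quoting, no CRLF handling.
public class FieldModifierSketch {

    // Hypothetical callback: 1-based line number, 0-based field index, raw field.
    @FunctionalInterface
    interface LineFieldModifier {
        String modify(long lineNumber, int fieldIdx, String field);
    }

    static List<Map<String, String>> read(String data, LineFieldModifier modifier) {
        String[] lines = data.split("\n");
        List<String> header = new ArrayList<>();
        List<Map<String, String>> records = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            String[] fields = lines[i].split(",", -1);
            // Let the modifier rewrite every field before it is used
            for (int j = 0; j < fields.length; j++) {
                fields[j] = modifier.modify(i + 1, j, fields[j]);
            }
            if (i == 0) {
                header.addAll(List.of(fields)); // first line holds the names
            } else {
                Map<String, String> record = new LinkedHashMap<>();
                for (int j = 0; j < header.size() && j < fields.length; j++) {
                    record.put(header.get(j), fields[j]);
                }
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Trim and upper-case the header line only; data lines pass through
        LineFieldModifier headerTrimUpper = (lineNumber, fieldIdx, field) ->
            lineNumber == 1 ? field.trim().toUpperCase(Locale.ROOT) : field;

        for (Map<String, String> record : read(" h1 , h2 \nfoo,bar", headerTrimUpper)) {
            System.out.println(record); // {H1=foo, H2=bar}
        }
    }
}
```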