uniVocity / univocity-parsers

uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.

Parsing a CSV file with spaces and colon #452

Closed. Phalaen closed this issue 3 years ago

Phalaen commented 3 years ago

Good morning,

I'm trying to use CsvParser on a comma-delimited file, but I think there is a problem with how spaces and colons are handled.

In particular, these are the first lines of the file:

    Device serial,Date ,Temperature 51 (Medium) °C
    11869,2021-02-09 00:14:59,7.2
    11869,2021-02-09 00:30:01,7.1
    11869,2021-02-09 00:44:59,7.2
    11869,2021-02-09 00:59:59,7.4

...and this is the script (I am using Kotlin):

    val settings = CsvParserSettings()
    settings.getFormat().setLineSeparator("\n")
    settings.setNumberOfRowsToSkip(1)
    settings.setIgnoreLeadingWhitespaces(true)
    settings.setIgnoreTrailingWhitespaces(true)

Unfortunately, only the first column of each row is identified correctly in the output. For example, for the first row:

    first column: 11869
    second column: 2021-02-09
    third column: 0

No other columns are detected.

Is this a bug or my mistake?

Thank you

dmsleptsov commented 3 years ago

Hi @Phalaen

In my case, everything works fine:

        CsvParser parser = null;
        try {
            final var csv = (
                    "Device serial,Date ,Temperature 51 (Medium) °C\n"
                            + "11869,2021-02-09 00:14:59,7.2\n"
                            + "11869,2021-02-09 00:30:01,7.1\n"
                            + "11869,2021-02-09 00:44:59,7.2\n"
                            + "11869,2021-02-09 00:59:59,7.4"
            ).getBytes(StandardCharsets.UTF_8); // encode explicitly so it matches the charset used for decoding

            final var settings = new CsvParserSettings();
            settings.getFormat().setLineSeparator("\n");
            settings.setNumberOfRowsToSkip(1);
            settings.setIgnoreLeadingWhitespaces(true);
            settings.setIgnoreTrailingWhitespaces(true);

            parser = new CsvParser(settings);
            parser.beginParsing(new ByteArrayInputStream(csv), StandardCharsets.UTF_8);

            Record record;
            while ((record = parser.parseNextRecord()) != null) {
                log.info("parse record: {}", record);
                final var deviceSerial = record.getString(0);
                final var date = record.getString(1);
                final var temperature = record.getString(2);
                log.info("deviceSerial: {}; date: {}; temperature:{}", deviceSerial, date, temperature);
            }
        } finally {
            if (Objects.nonNull(parser)) {
                parser.stopParsing();
            }
        }

Phalaen commented 3 years ago

Thank you @dmsleptsov, here is my complete code:

     val settings = CsvParserSettings()
     settings.getFormat().setLineSeparator("\n")
     settings.setNumberOfRowsToSkip(1)
     settings.setIgnoreLeadingWhitespaces(true)
     settings.setIgnoreTrailingWhitespaces(true)

     val parser = CsvParser(settings)
     val allRows: List<Array<String>> = parser.parseAll(InputStreamReader(data, "UTF-8"))

     for (row in allRows) {
         // Column 1 should hold the full timestamp, e.g. "2021-02-09 00:14:59"
         val date: String = row[1]
         list.add(date)
     }

The problem is that "date" contains only the first part of the string: 2021-02-09 instead of 2021-02-09 00:14:59.

dmsleptsov commented 3 years ago

@Phalaen I was unable to reproduce your problem. It looks as if the parser is detecting the delimiter automatically (or several delimiters: space and comma), although delimiter detection is off by default and the default delimiter is ','.

Could you please share your data source and tell me which version you are using?

Phalaen commented 3 years ago

Thank you @dmsleptsov, I am using version 2.9.1 (group: 'com.univocity', name: 'univocity-parsers', version: '2.9.1'). Here is an example of the data source: [hidden]

dmsleptsov commented 3 years ago

It still works fine:

        final var settings = new CsvParserSettings();
        settings.getFormat().setLineSeparator("\n");
        settings.setNumberOfRowsToSkip(1);
        settings.setIgnoreLeadingWhitespaces(true);
        settings.setIgnoreTrailingWhitespaces(true);

        final var parser = new CsvParser(settings);

        final var data = new FileInputStream("measurements_15_03_2021__12_30.csv");
        final var result = parser.parseAll(new InputStreamReader(data, StandardCharsets.UTF_8));
        result.forEach(strings -> {
            final var deviceSerial = strings[0];
            final var date = strings[1];
            final var temperature = strings[2];
            log.info("deviceSerial: {}; date: {}; temperature:{}", deviceSerial, date, temperature);
        });

Could you share your source code?

Phalaen commented 3 years ago

Solved by converting the text into a String before parsing. Thank you.
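
For anyone landing here later: the thread does not show how the text was converted, so this is only a minimal sketch of one way to do it in Java. It assumes the input is UTF-8 and small enough to hold in memory. The class name `StreamToStringSketch` and the helpers `readAll` and `dateColumn` are made up for illustration. `dateColumn` is a plain comma split used only to check that the full timestamp survives the conversion; it is not a replacement for CsvParser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class StreamToStringSketch {

    // Reads the whole stream into memory and decodes it as UTF-8
    // (assumption: the file is UTF-8, as in the snippets above).
    public static String readAll(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical sanity check: returns the second column of every row
    // after the header, splitting on commas only (no quoting support).
    public static List<String> dateColumn(String csv) {
        List<String> dates = new ArrayList<>();
        String[] lines = csv.split("\n");
        for (int i = 1; i < lines.length; i++) { // i = 1 skips the header row
            String[] cells = lines[i].split(",");
            if (cells.length > 1) {
                dates.add(cells[1].trim());
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        String sample = "Device serial,Date ,Temperature 51 (Medium) \u00B0C\n"
                + "11869,2021-02-09 00:14:59,7.2\n"
                + "11869,2021-02-09 00:30:01,7.1\n";
        InputStream data = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));

        String text = readAll(data);          // convert to a String first
        System.out.println(dateColumn(text)); // each entry keeps date and time
    }
}
```

The resulting String can then be handed to the parser through a `java.io.StringReader`, since `parseAll` accepts a `Reader` as in the snippets above.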