thephpleague / csv

CSV data manipulation made easy in PHP
https://csv.thephpleague.com
MIT License

Encoding formatters (all formatters?) are not applied to the header #528

Closed tiagof closed 4 months ago

tiagof commented 4 months ago

Bug Report

When importing a file encoded in ISO-8859-15 where both the header and the rows contain accented characters, the encoding formatter is not applied to the header.

Information    Description
Version        9.15
PHP version    8.2
OS Platform    macOS (Sonoma)

Summary

Importing a file with one header row and one data row (example), encoded in ISO-8859-15:

Café,Late
Exposé,Now

Standalone code, or other way to reproduce the problem

test.csv

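use League\Csv\CharsetConverter;
use League\Csv\Reader;
use League\Csv\Statement;

// Register the converter as a record formatter (this reproduces the issue).
$encoder = (new CharsetConverter())->inputEncoding('ISO-8859-15');
$csv = Reader::createFromPath('test.csv')
    ->addFormatter($encoder)
    ->skipEmptyRecords()
    ->setHeaderOffset(0);

$stmt = Statement::create();
$records = $stmt->process($csv);

// dump() is provided by symfony/var-dumper.
foreach ($records as $row) {
    dump(array_keys($row), array_values($row));
}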

Expected result

array:2 [
  0 => "Café"
  1 => "Late"
] 
array:2 [
  0 => "Exposé"
  1 => "Now"
]

Actual result

array:2 [
  0 => b"Café"
  1 => "Late"
] 
array:2 [
  0 => "Exposé"
  1 => "Now"
]

Notice the "b" before "Café", which shows that the header is not properly encoded.
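One way to confirm what the "b" marker hints at is to inspect the raw bytes directly. The sketch below is only an illustration: it assumes the mbstring extension is loaded, hard-codes the header as it would be stored in an ISO-8859-15 file (the single byte 0xE9 for "é"), and uses made-up variable names.

```php
<?php

// "Café" as stored in an ISO-8859-15 file: "é" is the single byte 0xE9.
$rawHeader = "Caf\xE9";

// A lone 0xE9 byte is not a valid UTF-8 sequence, so this check fails.
var_dump(mb_check_encoding($rawHeader, 'UTF-8')); // bool(false)

// Converting from ISO-8859-15 yields the two-byte UTF-8 sequence 0xC3 0xA9.
$utf8Header = mb_convert_encoding($rawHeader, 'UTF-8', 'ISO-8859-15');
var_dump(mb_check_encoding($utf8Header, 'UTF-8')); // bool(true)
var_dump(bin2hex($utf8Header));                    // string(10) "436166c3a9"
```

The same check applied to a row value that went through the formatter would pass, which matches the missing "b" before "Exposé" in the output above.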

nyamsprod commented 4 months ago

@tiagof thanks for using the library. As explained in the documentation:

Formatting happens AFTER combining the header and the fields value if a header is available and CSV value BUT BEFORE you can access the actual value.

This means that by the time formatters run, the header record has already been calculated, so no formatting can be applied to it.

To work around this expected behaviour, you can change how the converter is attached to the CSV document.

<?php

use League\Csv\CharsetConverter;
use League\Csv\Reader;

$csv = Reader::createFromPath('test.csv');
// Attach the CharsetConverter class as a stream filter (ISO-8859-15 in, UTF-8 out).
CharsetConverter::addTo($csv, 'ISO-8859-15', 'UTF-8');

$csv
    ->skipEmptyRecords()
    ->setHeaderOffset(0);

foreach ($csv as $row) {
    dump(array_keys($row), array_values($row));
}

In this example, the CharsetConverter is used as a stream filter. Stream filtering is applied before the CSV header is calculated, so ALL your CSV fields, header included, are converted.
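The difference between converting per record and converting at the stream level can be reproduced without the library. What follows is a rough sketch in plain PHP, not the library's internals: it assumes the iconv extension is available, swaps in PHP's built-in `convert.iconv.*` stream filter and `fgetcsv()` for `CharsetConverter` and `Reader`, and the helpers `latin9Stream()` and `readWithHeader()` are hypothetical names invented for this illustration.

```php
<?php

// Hypothetical helper: an in-memory stream holding the sample file as ISO-8859-15 bytes.
function latin9Stream()
{
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, "Caf\xE9,Late\nExpos\xE9,Now\n");
    rewind($stream);

    return $stream;
}

// Hypothetical helper: read the first row as the header, then combine it with each record.
// When given, $formatter only sees the field values, mimicking a formatter that runs
// after the header has already been read.
function readWithHeader($stream, ?callable $formatter = null): array
{
    $header = fgetcsv($stream, null, ',', '"', '');
    $rows = [];
    while (($record = fgetcsv($stream, null, ',', '"', '')) !== false) {
        if ($formatter !== null) {
            $record = array_map($formatter, $record);
        }
        $rows[] = array_combine($header, $record);
    }

    return $rows;
}

// Approach 1: convert each field value after the header has been read.
// The values become UTF-8, but the header keys keep their original bytes.
$perRecord = readWithHeader(
    latin9Stream(),
    fn (string $value): string => iconv('ISO-8859-15', 'UTF-8', $value)
);

// Approach 2: convert the whole stream before any parsing happens,
// so the header row is converted along with everything else.
$stream = latin9Stream();
stream_filter_append($stream, 'convert.iconv.ISO-8859-15/UTF-8', STREAM_FILTER_READ);
$perStream = readWithHeader($stream);

var_dump(array_keys($perRecord[0])[0] === "Caf\xE9");     // header still ISO-8859-15
var_dump(array_keys($perStream[0])[0] === "Caf\xC3\xA9"); // header now UTF-8
```

In both approaches the row value "Exposé" ends up as UTF-8; only the stream-level conversion also reaches the header, which is the behaviour described above.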

Please refer to the documentation for further information on the limitations of my proposed solution.

tiagof commented 4 months ago

@nyamsprod, many thanks! It works flawlessly! Honestly, I did go through the documentation, but apparently not thoroughly enough.

Cheers!