thephpleague / csv

CSV data manipulation made easy in PHP
https://csv.thephpleague.com
MIT License
3.34k stars 336 forks source link

Docs: Fix sample code in documentation to use `getRecords()` instead of `getHeader()` #525

Closed meihei3 closed 5 months ago

meihei3 commented 5 months ago

The validation for duplicated headers requires calling Reader::computeHeader(), but Reader::getHeader() does not invoke this method. As a result, the current sample code does not produce the expected SyntaxError.

I have updated the sample code to use getRecords(), which correctly triggers the SyntaxError for duplicate headers.

Thanks.

nyamsprod commented 5 months ago

@meihei3 The code you are trying to update is related to the header and is correct AFAIK. getHeader will throw an exception if the header offset value set using setHeaderOffset is invalid because the package does a lazy evaluation of the header offset, the offset is really calculated once getHeader is called or when getRecords is used.

meihei3 commented 5 months ago

@nyamsprod Thank you for your feedback. I understand that getHeader will throw an exception if the header offset value set using setHeaderOffset is invalid due to lazy evaluation. However, the issue I encountered is specific to duplicated header entries.

While getHeader does call setHeaderOffset, setHeaderOffset does not call SyntaxError::dueToDuplicateHeaderColumnNames($header). This can be seen in the relevant code here: Reader.php#L76-L90 and Reader.php#L562.

As a result, the current sample code does not produce the expected SyntaxError for duplicated headers. In contrast, getRecords() invokes the necessary method to detect duplicate columns and correctly trigger the SyntaxError.

Thanks.

nyamsprod commented 5 months ago

@meihei3 the documentation is right but the implementation has a bug!! So I will convert your ticket into a bug issue that I will fix ASAP!! So the code needs to be changed not the documentation

nyamsprod commented 5 months ago

Now that I am thinking a bit about it you may be right. When you choose a row to be the header, if that row exists it should be returned as is. You did select the row afterall. What is triggering the error is when you try to use that row as a header. So should the exception be thrown when you still have not decided to use the header or when you are in fact using the error. If it's the former then it is a bug in the code ... if its the latter then it is a bug in the documentation 🤷🏾

meihei3 commented 5 months ago

I think that getHeader() does not need to call SyntaxError::dueToDuplicateHeaderColumnNames(). According to RFC 4180, duplicate header fields are not explicitly prohibited.

However, getRecords() converts CSV rows into key-value PHP arrays. In this context, it is logical for SyntaxError::dueToDuplicateHeaderColumnNames() to be triggered during the conversion from CSV to PHP objects.

nyamsprod commented 5 months ago

Then let's merge your PR and be done with it 👍🏾

meihei3 commented 5 months ago

@nyamsprod Thank you! However, I don't have the permissions to merge this PR, so could you please merge it on my behalf?

nyamsprod commented 5 months ago

merged and thanks for the fix in documentation ... and for using the library.