thephpleague / csv

CSV data manipulation made easy in PHP
https://csv.thephpleague.com
MIT License

The method chunkBy is not yielding chunks correctly #524

Closed by verzola 6 months ago

verzola commented 6 months ago

Bug Report

Version: 9.15.0
PHP version: 8.1
OS Platform: Ubuntu (WSL)

Summary

The recently added chunkBy method does not chunk the CSV as described in the documentation:

If you are dealing with a large CSV and you want it to be split in smaller sizes for better handling you can use the chunkBy method which breaks the TabularDataReader into multiple, smaller instances with a given size. The last instance may contain fewer records because of the chunk size you have chosen.

Instead of yielding chunks of the size passed to chunkBy, the method yields only 2 chunks: the first has the requested size, and the second (and last) contains every record in the CSV.

It seems that this issue might be caused by the changes of this commit: https://github.com/thephpleague/csv/commit/51968b6352abb1e833981d9c9f53393b2a520d6c#diff-63e150e70c3f2253f8ab94c5a8fe06190bbfee264c842f2c7fb51f9920dd0f2eR170

I tested the first version of the chunkBy method from this commit and it is working as expected: https://github.com/thephpleague/csv/commit/60b00624d5dff2838b55ba50a4a577401b8d75fc#diff-63e150e70c3f2253f8ab94c5a8fe06190bbfee264c842f2c7fb51f9920dd0f2eR170

If I add these 2 lines right after the first yield in the chunkBy method of the released 9.15.0 version, it also works as expected:

$nbRecords = 0;
$records = [];
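
To illustrate why those two resets matter, here is a minimal standalone sketch. It is not the library's code: chunkSketch, chunkSizes, and the $resetBuffer flag are hypothetical names used only for illustration, and plain arrays stand in for the reader's records. It assumes the chunking loop buffers records and yields once a counter equals the chunk size. Under that assumption, skipping the resets reproduces the symptom reported here, because the counter never equals the chunk size again and the buffer keeps growing until the final yield.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a chunking generator (not the library's implementation).
 * When $resetBuffer is false, the counter and buffer are never cleared after a yield.
 */
function chunkSketch(iterable $records, int $size, bool $resetBuffer): Generator
{
    $buffer = [];
    $count = 0;
    foreach ($records as $record) {
        $buffer[] = $record;
        ++$count;
        if ($count === $size) {
            yield $buffer;
            if ($resetBuffer) {
                // The two resets proposed in this report.
                $count = 0;
                $buffer = [];
            }
        }
    }

    // Yield whatever is left over after the loop.
    if ([] !== $buffer) {
        yield $buffer;
    }
}

/** Collects the size of each chunk produced for 6000 dummy records in chunks of 1000. */
function chunkSizes(bool $resetBuffer): array
{
    $sizes = [];
    foreach (chunkSketch(range(1, 6000), 1000, $resetBuffer) as $chunk) {
        $sizes[] = count($chunk);
    }

    return $sizes;
}

echo implode(' ', chunkSizes(false)), PHP_EOL; // 1000 6000
echo implode(' ', chunkSizes(true)), PHP_EOL;  // 1000 1000 1000 1000 1000 1000
```

With the resets disabled, the sketch prints the same sizes as the actual result described in this report; with them enabled, it prints the expected six chunks of 1000.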

Standalone code, or other way to reproduce the problem

I created a repository for this example: https://github.com/verzola/league-csv-chunkby-bug

<?php

use League\Csv\Reader;

require_once __DIR__ . '/vendor/autoload.php';

// data.csv is an example csv with 6000 lines
$reader = Reader::createFromPath(__DIR__ . '/data.csv');

$chunks = $reader->chunkBy(1000);

foreach ($chunks as $chunk) {
  echo count($chunk) . PHP_EOL;
}

Expected result

The expected output is: 1000 1000 1000 1000 1000 1000

Actual result

The actual output is: 1000 6000


nyamsprod commented 6 months ago

Thanks for reporting the issue. It has been fixed and will be part of the next release.

nyamsprod commented 5 months ago

The fix is now released in version 9.16.0.