leomarquine / php-etl

Extract, Transform and Load data using PHP.
MIT License
178 stars, 81 forks

Duplicates on insert with multiple keys lookup? #8

Closed: lolaslade closed this issue 5 years ago

lolaslade commented 6 years ago

Thanks for the library. It is making my code very readable! I am using v1.1; I can't use the latest master yet because of PHP version constraints. With the code below, I get duplicate rows when the job runs more than once.

// Make items unique
$items = array_intersect_key($items, array_unique(array_map("serialize", $items)));

$loadOptions = [
  'keys' => ['account_id', 'project_id'],
  'insert' => true,
  'update' => true,
  'delete' => false, // TODO Soft delete
];
$loaderJob = new Job();
$loaderJob->extract('ArrayData', $items);
$loaderJob->load('Table', 'account_project', $loadOptions);

lolaslade commented 6 years ago

The issue was in the Indexable trait. array_intersect_key returns the key columns in whatever order they appear in each row, not in the order of the configured keys, so $old was indexed in one order and $items in another. Since you are no longer using traits, I guess this should just be closed, but I will post the solution in case anyone else is still using it.

https://github.com/leomarquine/php-etl/blob/4c3cbd2aed8fd170ee8cdf489335a66e26799d9d/src/Traits/Indexable.php#L21
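
To illustrate the mismatch described above, here is a minimal sketch. The two rows and the way $keys is built with array_flip are assumptions made for the demonstration; they are not copied from the Indexable trait.

```php
<?php
// Hypothetical rows: the same logical record, with columns in a different order.
$fromSource = ['project_id' => 7, 'account_id' => 42, 'name' => 'Demo'];
$fromTable  = ['account_id' => 42, 'project_id' => 7, 'name' => 'Demo'];

// Lookup columns, flipped so that the column names become array keys (assumed shape).
$keys = array_flip(['account_id', 'project_id']);

// array_intersect_key() keeps the element order of its first argument,
// so each row contributes its key values in its own column order.
$sourceKey = implode('', array_intersect_key($fromSource, $keys));
$tableKey  = implode('', array_intersect_key($fromTable, $keys));

var_dump($sourceKey, $tableKey);    // string(3) "742", string(3) "427"
var_dump($sourceKey === $tableKey); // bool(false)
```

Because the two index strings differ, a lookup of the source row against the existing rows misses, and the row is treated as new.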

This is a fix:

        $keyParts = array_intersect_key($item, $keys);
        asort($keyParts);
        $key = implode('', $keyParts);
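
One caveat for anyone adapting the snippet above: asort orders the parts by value, so two rows whose key values are swapped between columns would yield the same string. A possible variant, sketched here under assumptions, sorts by column name with ksort and joins with a separator. The function name compositeKey and the '|' separator are hypothetical and not part of php-etl; the separator is only safe if it cannot occur inside the key values.

```php
<?php
/**
 * Hypothetical helper (not part of php-etl): builds an index string from the
 * lookup columns of a row, independent of the row's column order.
 */
function compositeKey(array $item, array $keyColumns): string
{
    // Keep only the lookup columns.
    $keyParts = array_intersect_key($item, array_flip($keyColumns));
    // Sort by column name so every row yields its parts in the same order.
    ksort($keyParts);
    // Join with a separator so that ('1', '23') and ('12', '3') stay distinct.
    return implode('|', $keyParts);
}

$columns = ['account_id', 'project_id'];
$a = compositeKey(['project_id' => 7, 'account_id' => 42], $columns);
$b = compositeKey(['account_id' => 42, 'project_id' => 7], $columns);
$c = compositeKey(['account_id' => 7, 'project_id' => 42], $columns);

var_dump($a === $b); // bool(true): same record, different column order
var_dump($a === $c); // bool(false): different record with swapped values
```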

leomarquine commented 6 years ago

I'm not using this trait anymore on the master branch, but master is currently a work in progress toward v2. I'll take a look at this bug and at your feature requests.