keboola / php-component

General library for PHP applications running in Keboola Connection environment
MIT License
0 stars 1 forks source link

Create manifest with specified values only. #17

Closed ujovlado closed 6 years ago

ujovlado commented 6 years ago

Use case:

  1. Import table from some component (extractor) to storage with incremental: true
  2. Set PK to that table manually (e.g. in storage console)
  3. Run extractor from step 1. -> it fails, because manifest contains primary key set to []

Current behavior

$manifestManager->writeTableManifestFromArray(
        $outputPath . '/' .$webalizedExportName . '.csv', [
             'incremental' => $exportOptions['incremental'],
        ]);

will create:

{"destination":"","primary_key":[],"delimiter":",","enclosure":"\"","columns":[],"incremental":true,"metadata":[],"column_metadata":[]}\n

Expected behavior

It should create this imho.

{"incremental":true}\n

Also, with current approach, imports to storage are dependent on php-component library. What if we decide to change one of the default values?

ujovlado commented 6 years ago

But maybe I'm wrong and current behavior is fine.

tomasfejfar commented 6 years ago

Expected workflow is (as seen here):

$manifest = $manager->getTableManifest('my-table');
$manifest['incremental'] = true;
$manager->writeTableManifestFromArray($fullPath, $manifest);

But I understand the problem. If the original manifest does not have all the required fields set, then it prefills them with default values. I agree that this behavior is not ideal, unneccesarily couples php-component implementation to APIs default values and manifest manager should only write fields that are present. Unfortunately, that causes BC break again.

Note to self: Use 0.x.y version for the library before it's been used a few times to denote it is unstable. Instead of major release every week 😩

ujovlado commented 6 years ago

I'm not 100% sure here. Maybe my extractor is wrong if it supports incremental without defining primary_key.

@odinuv, what do you think?

tomasfejfar commented 6 years ago

It could be the case, but still, it should not write anything more than you actually ask for.

pivnicek commented 6 years ago

@ujovlado primaryKey is not required for incremental. If PK exists, rows will be updated, if not, they will be appended

odinuv commented 6 years ago

yep, as @pivnicek says, in that case setting primaryKey: [] is plain wrong, it should not be there at all (meaning - don't touch my primary key), incremental could still work then

tomasfejfar commented 6 years ago

So, can we agree on a solution that:


On different note, as I'll inevitably have to raise the major version again, I thought about making another BC break and add a generic "configuration" class (name is up to discussion), that will be approximately what Config is now, minus the getters. It's primary concern would be getting the values from a configuration file with optional defaults. I already had to handle the case of missing config keys by hand and I they keys were flat. It would be much more annoying if the keys were a part of hierarchy. Also, in other cases it would be useful to have the option to supply default value. What do you think?

ujovlado commented 6 years ago

default values in the per-property method writeTableManifest will be null, meaning "does not appear in manifest"

What about to deprecate this method? If there will be nulls we'll end up with method like constructor in Symfony Process. This is what you need to do if you want to override 60s timeout:

new Process('ls -la', null, null, null, null); // last argument is timeout

In our case, setting incremental flag only will look like:

$mm->writeTableManifest('some file', null, null, null, true);

I agree with rest, only not sure about this.

tomasfejfar commented 6 years ago

The only problem that array-driven API sucks IMHO. You can't typehint, you need to have docs open, you can make a typo, etc.

What might be the solution, though... is the Options object as is used in the API. I had a great experience writing KBC tests. I never had to open the docs, creating the configuration for API call using the Option objects and code completion. That could be the case here as well. With the optional fallback to the array.

$mm->writeTableManifest(
    (new Options\WriteTableManifest())
        ->setIncremental(true);
);
$mm->writeTableManifest([
    'incremental' => true
]);

Update: I would even think that the array fallback is not needed, unless we want to minimize the BC break.

ujovlado commented 6 years ago

At first I wanted to create issue that I want setters in Manifest Manager :). Then I saw that it can be used to create more manifest files with one instance. So I'll be more than happy with proposed solution with Options object.