leomarquine / php-etl

Extract, Transform and Load data using PHP.
MIT License
178 stars 81 forks source link

jscon_decode transformer is only decoding integers #16

Closed gilmojoa closed 5 years ago

gilmojoa commented 5 years ago

is the json_decode transformer working as expected to your knowledge? I have been trying to debug issue with using the json_decode transformer for days now and problem I experience is it can only decode integers in the json string. The rest of the values are coming out as NULL. I have verified i can call the standard json_decode on the file contents so its not that the encoding of the file is wrong.
just to re-iterate - i have tested the etl flow and when i am just using integer values for the mapped columns in my database it runs through without a problem, but when i try to pass string vales they are decoded as NULL.

leomarquine commented 5 years ago

The tests for the json decode transformer are passing and everything seems ok. Can you provide an example code that reproduces this error?

gilmojoa commented 5 years ago

I think the issue is that when it comes to the actual json_decode call in the etl process, it is calling json_decode on an individual string (value), i.e. this line in the JsonDecode.php - return json_decode($column, $this->assoc, $this->depth, $this->options); For me, when i pass the below file into the etl process, when I get to the JsonDecode, it is being called on an individual column value. is this expected behaviour? e.x. calling json_decode("Cassandra"); returns null. and for my file, (which I assume is in the correct format), the json_decode is called on each of the values from the key:value pairs in there.

candidate1app-1 .txt

if you see something wrong with my file format (this is just an example file i tested with) can you please point it out? or even better, can you send me a sample file you know works and i can compare to see where the issue is?

gilmojoa commented 5 years ago

a sample of my code as requested also:

$json = '[{"a": "fidplasy","b":"The Apache Software Foundation","processIdList":3,"displayName":"testingstring","e":5}]'; file_put_contents('/tmp/app-1.json', $json); $data = $etl->extract('json', '/tmp/app-1.json'); $data->transform('json_decode', $transformer_options) ->load('insert', config('iqcloud.application_components.tablename'), $options) ->run();

leomarquine commented 5 years ago

For me, when i pass the below file into the etl process, when I get to the JsonDecode, it is being called on an individual column value. is this expected behaviour?

Yes, this is the expected behaviour. You dont need to use the json decode transformer, since the json extractor already decodes your file.

gilmojoa commented 5 years ago

ok - so when would I need to use the json decode transformer then? your saying I should be able to skip this step in the etl process?

leomarquine commented 5 years ago

Yes, you can do something like:

$etl->extract('json', '/tmp/app-1.json')
    ->load('insert', config('iqcloud.application_components.tablename'), $options)
    ->run();

The load step will have access to the a, b, processIdList, displayName and e columns.

The json decode is useful when only specific columns of your source are json, not the entire source.