leomarquine / php-etl

Extract, Transform and Load data using PHP.
MIT License
178 stars 81 forks source link

Extracting from multi-level JSON objects #4

Closed stevevance closed 7 years ago

stevevance commented 7 years ago

I am extracting data where there is more than one level of objects with properties. In PHP this would be a multi-dimension array.

Sample data:

{
  "id": "ocd-bill/ab2cc592-ebbb-4350-aecc-215f9423c39f",
  "created_at": "2017-03-31T03:38:59.726143+00:00",
  "debug": null,
  "updated_at": "2017-04-27T00:34:25.132760+00:00",
  "abstracts": [
  ],
  "from_organization": {
    "name": "Chicago City Council",
    "id": "ocd-organization/ef168607-9135-4177-ad8e-c1f7a4806c3a"
  },
  "sponsorships": [
    {
      "entity_name": "Emanuel, Rahm",
      "entity_type": "person",
      "classification": "Primary",
      "primary": true,
      "entity_id": "ocd-person/f649753d-081d-4f22-8dcf-3af71de0e6ca"
    }
  ],
  "classification": [
    "ordinance"
  ],
  "subject": [
    "Sale",
    "Loan & Security"
  ],
  "identifier": "O 2017-3049",
  "other_titles": [

  ],
  "title": "Sale of City-owned property and multi-family loan agreement with New West Englewood Homes LLC at 2101-2111 W 63rd St",
  "related_bills": [

  ]
}
screenshot 2017-07-31 22 29 01

When I extract the sponsorships array, I want to preserve it as an array and load it into a JSON type field in a PostgreSQL database.

To do that, I added this code to /src/Loaders/Table.php before line 168 in function insert(). I don't think it will work on a 3-dimension array (array[array[array[]]]), but it works on my 2-dimension array.

// handle arrays that should be turned into JSON strings (only for Postgres)
                foreach($item as $key2=>$val):
                    if(is_array($val)):
                        $item[$key2] = json_encode($val);
                    endif;
                endforeach;
leomarquine commented 7 years ago

I think the correct approach would be casting the array to json in a transformation step.

Maybe I have time to make that transformer at night, but you can make your own custom transformation by implementing the Marquine\Etl\Transformers\ TransformerInterface interface and do:

$job->extract('json', 'path/to/file.json')
    ->transform(Your\Custom\Transformation::class, $options)
    ->load('table', 'table_name');
stevevance commented 7 years ago

I agree that's the correct approach, but I don't understand, currently, where to write that class or how Your\Custom\Transformation namespaces work.

leomarquine commented 7 years ago

@stevevance, sorry for the late reply.

I just tagged a new version with the JsonEncode Transformer. Now you can do:

$job->transform('JsonEncode', ['columns' => ['columnName']]);
stevevance commented 7 years ago

@leomarquine Thank you, I'll try it out soon.