masayuki14 / worklog

Record working log by issues.
MIT License

use embulk-parser-jsonl plugin #9

Closed masayuki14 closed 6 years ago

masayuki14 commented 6 years ago

Trial and error in getting embulk-parser-jsonl to work.

masayuki14 commented 6 years ago

sample json

{"foo": "bar", "bool": true, "bool_str": "true", "int": 42, "int_str": "42", "time": "2016-3-2 00:39:18", "double": 2.4, "double_str": "2.4", "array": [1, 2, 3]}
{"foo": null, "bool": false, "bool_str": "false", "int": 42, "int_str": "42", "time": "2016-3-2 00:39:18", "double": 2.4, "double_str": "2.4", "array": [{"inner": "hoge"}, {"inner": 1.5}]}

https://github.com/shun0102/embulk-parser-jsonl/blob/master/example/sample.json

In short, I want to take the first element out of the `array` array here and map it to a column.
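To make the goal concrete, here is a minimal Python sketch of the extraction itself, done outside Embulk. It is not part of embulk-parser-jsonl; the helper name `first_elements` is hypothetical, and the sample records are trimmed to the keys that matter.

```python
import json

# The two records from the linked sample.json, trimmed to the relevant keys.
SAMPLE_JSONL = """\
{"foo": "bar", "array": [1, 2, 3]}
{"foo": null, "array": [{"inner": "hoge"}, {"inner": 1.5}]}
"""


def first_elements(jsonl_text, key="array"):
    """Return the first element of `key` for each non-empty JSONL line.

    Hypothetical helper for illustration; yields None when the key is
    missing, is not a list, or is an empty list.
    """
    result = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        values = record.get(key)
        if isinstance(values, list) and values:
            result.append(values[0])
        else:
            result.append(None)
    return result


print(first_elements(SAMPLE_JSONL))
```

Note that the first element is a number in one record and an object in the other, so a single typed column would still need a decision about how to store it.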

masayuki14 commented 6 years ago

https://github.com/shun0102/embulk-parser-jsonl/blob/master/example/example.yml


in:
  type: file
  path_prefix: "example/sample"
  parser:
    type: jsonl
    # default_typecast: true # default: true
    columns:
      - {name: "foo", type: "string"}
      - {name: "bool", type: "boolean"}
      - {name: "bool_str", type: "boolean"}
      - {name: "int", type: "long"}
      - {name: "int_str", type: "long"}
      - {name: "time", type: "timestamp", format: '%Y-%m-%d %H:%M:%S'}
      - {name: "double", type: "double"}
      - {name: "double_str", type: "double"}
      - {name: "array", type: "json"}

As in the example, with {name: "array", type: "json"} the data is stored as-is, provided the column type on the table side is json.

masayuki14 commented 6 years ago

The actual config:

      - { name: images,           type: json } #, path: "images[1]" }
-- excerpt

      Column      |            Type             | Modifiers
------------------+-----------------------------+-----------
 image            | text                        |
 images           | json                        |
-- select result

images           | ["https://media-cdn.tripadvisor.com/media/photo-s/02/da/78/10/caption.jpg","https://media-cdn.tripadvisor.com/media/photo-w/0a/2c/9b/5c/caption.jpg"]

So the JSON goes in as-is, like this.

masayuki14 commented 6 years ago

But what I actually want is to put the images[0] value into the image text column.

masayuki14 commented 6 years ago

With the json plugin,

- { name: "image", type: "string", path: "images[0]"}

does that for me.
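One possible workaround, sketched here as an assumption rather than anything the jsonl plugin provides, is to preprocess the JSONL file before loading so that each record already carries an `image` key copied from `images[0]`. The function names `add_first_image` and `preprocess` are hypothetical, and the URLs in the usage lines are placeholders.

```python
import json


def add_first_image(line, source_key="images", target_key="image"):
    """Copy the first element of `source_key` into `target_key` for one JSONL line.

    Hypothetical helper: sets the target to None when the source is
    missing, is not a list, or is empty.
    """
    record = json.loads(line)
    images = record.get(source_key)
    if isinstance(images, list) and images:
        record[target_key] = images[0]
    else:
        record[target_key] = None
    return json.dumps(record, ensure_ascii=False)


def preprocess(lines):
    """Apply add_first_image to every non-blank line."""
    return [add_first_image(line) for line in lines if line.strip()]


# Usage with placeholder URLs (not taken from the real data).
rows = preprocess([
    '{"images": ["https://example.com/a.jpg", "https://example.com/b.jpg"]}',
    '{"images": []}',
])
for row in rows:
    print(row)
```

The rewritten file could then presumably be read with a plain string column for `image` next to the existing json column for `images`, at the cost of an extra step before Embulk runs.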

masayuki14 commented 6 years ago

For now, I'll just load the data as raw JSON.

masayuki14 commented 6 years ago

This is one of those things that isn't worth agonizing over.