treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0

Array values cannot be referenced in a guess file #223

Closed: yokatsuki closed this issue 8 years ago

yokatsuki commented 8 years ago

Hello, I am trying to run embulk guess for multiple tables from a single digdag workflow.

_export:
  tables: [table_a, table_b, table_c]

+repeat:
  loop>: ${tables.length}
  _do:
    +guess:
      sh>: embulk guess guess.yml.liquid -o config-import-${tables[i]}.yml

The array value is correctly included in the name of the generated config file. However, the same value cannot be read inside the guess file:

in:
  type: s3
  bucket: <BUCKET>
  path_prefix: {{env.tables[i]}}
  access_key_id: <ACCESS_KEY>
  secret_access_key: <SECRET_ACCESS_KEY>
out:
  type: redshift
  host: <HOST>
  user: <USER>
  password: <PASSWORD>
  database: <DATABASE>
  table: {{env.tables[i]}}
  access_key_id: <ACCESS_KEY>
  secret_access_key: <SECRET_ACCESS_KEY>
  s3_bucket: <BUCKET>
  s3_key_prefix: <PREFIX>
  mode: <MODE>

The guess step then fails with:

2016-08-18 11:59:34.219 +0000 [INFO] (0001:guess): Loaded plugin embulk-input-s3 (0.2.8)
org.embulk.config.ConfigException: com.fasterxml.jackson.databind.JsonMappingException: Setting null to a task field is not allowed. Use Optional<T> (com.google.common.base.Optional) to represent null.

How should I write the guess file so that it can read an array defined in the dig file?

hiroyuki-sato commented 8 years ago

@yokatsuki

Please try the following dig file.

_export:
  tables: [table_a, table_b, table_c]

+repeat:
  loop>: ${tables.length}
  _do:
    +guess:
      sh>: env table_name=${tables[i]} embulk guess hoge/seed.yml.liquid -o ${tables[i]}.yml

hoge/seed.yml.liquid

in:
  type: file
  path_prefix: "/private/tmp/xxx/hoge/csv/sample_"
out:
  type: {{ env.table_name }} # this is just a test

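Digdag exports the whole array to the shell as a single environment variable holding a JSON string, tables=["table_a","table_b","table_c"]. In the Embulk config file, env.tables is therefore just that string, and the template has no variable i, so {{env.tables[i]}} cannot pick out a table name. The expression ends up empty, which is consistent with the null-field error above.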
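As a rough illustration (plain Python, not digdag or Embulk code), this sketch assumes the child process receives the value shown in the `env` output later in this thread. It only demonstrates that the exported value is text: indexing it yields a character, and getting a table name requires an explicit JSON parse, which the guess template does not do.

```python
import json

# Assumption: the environment value digdag exports for the array, copied
# from the `env` output in this thread.
exported_env = {"tables": '["table_a","table_b","table_c"]'}

raw = exported_env["tables"]
# To the child process this is a plain string, not a list.
print(type(raw).__name__)  # str
print(raw[0])              # [  (first character, not the first table)

# Only after explicit JSON parsing does indexing return a table name.
tables = json.loads(raw)
print(tables[0])           # table_a
```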

You can check the exported variables with the following workflow:

_export:
  tables: [table_a, table_b, table_c]

+repeat:
  loop>: ${tables.length}
  _do:
    +guess:
      sh>: env

Then run it with:

digdag run test --rerun

The output looks like this:

project_id=1
...
tables=["table_a","table_b","table_c"]

So in each loop iteration I set a scalar environment variable, table_name=XXX, right before running embulk guess, and the template reads it as {{ env.table_name }}.
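To show why a scalar variable works, here is a minimal Python sketch of the per-iteration substitution. The `render` function is a hypothetical stand-in, not Embulk's Liquid engine: it only understands `{{ env.NAME }}`, and the template is a trimmed version of the S3/Redshift guess file from the question.

```python
import re

# Hypothetical stand-in for Embulk's Liquid rendering: only {{ env.NAME }}
# is supported, and values come from the given environment mapping.
ENV_REF = re.compile(r"\{\{\s*env\.(\w+)\s*\}\}")


def render(template: str, env: dict) -> str:
    """Replace each {{ env.NAME }} with env[NAME]; raises KeyError if unset."""
    return ENV_REF.sub(lambda m: env[m.group(1)], template)


# Trimmed guess file from the question, rewritten to use a scalar variable.
SEED = """in:
  type: s3
  path_prefix: {{ env.table_name }}
out:
  type: redshift
  table: {{ env.table_name }}
"""

tables = ["table_a", "table_b", "table_c"]

# Mirrors the digdag loop: each iteration runs with its own table_name.
configs = {name: render(SEED, {"table_name": name}) for name in tables}

for name, config in configs.items():
    print(f"--- config-import-{name}.yml")
    print(config)
```

Unlike real Liquid, which renders an undefined variable as empty, this sketch raises `KeyError` so that a missing variable fails loudly instead of producing a null config field.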

yokatsuki commented 8 years ago

Thank you for the quick answer. Following your instructions, three config files were created, and each one contains the guessed table definition. I'll close the issue since it is fully resolved. Thanks again.