kamatama41 / embulk-filter-hash

MIT License

Hash long type columns #15

Closed jhettler closed 6 years ago

jhettler commented 6 years ago

Hi, have you tried hashing long-type columns? Suppose the columns are defined in the JDBC input plugin like this:

in:
   type: db2
   driver_path: ../airflow/extra/db2jcc4.jar
   host: db2
   port: 50001
   user: 
   password: 
   database: G;
   fetch_rows: 10000
   table: CREDIT_ACCOUNT
   default_timezone: CET
   select: CREDIT_ACCOUNT_ID as CREDIT_ACCOUNT_ID,NAME as NAME,COMMENT as COMMENT,COMPANY_G2ID as COMPANY_G2ID
   column_options:
      CREDIT_ACCOUNT_ID: {value_type: string}
      NAME: {value_type: string}
      COMMENT: {value_type: string}
      COMPANY_G2ID: {value_type: long}

I then define the hash filter like this, so that COMPANY_G2ID is hashed as a long-type column

filters:
  - type: hash
    columns:
      - {name: NAME, algorithm: SHA-256}
      - {name: COMMENT, algorithm: SHA-256}
      - {name: COMPANY_G2ID, algorithm: SHA-256}

but I always get

Caused by: java.lang.IllegalStateException: Not reach here
    at org.embulk.spi.PageBuilder$AbstractColumnValue.setLong(PageBuilder.java:348)
    at org.embulk.spi.PageBuilder$Row.setLong(PageBuilder.java:291)
    at org.embulk.spi.PageBuilder$Row.access$300(PageBuilder.java:248)
    at org.embulk.spi.PageBuilder.setLong(PageBuilder.java:83)
    at org.embulk.spi.PageBuilder.setLong(PageBuilder.java:79)
    at org.embulk.filter.hash.HashFilterPlugin$open$1.setValue(HashFilterPlugin.kt:93)
    at org.embulk.filter.hash.HashFilterPlugin$open$1.add(HashFilterPlugin.kt:69)
    at org.embulk.exec.LocalExecutorPlugin$ScatterTransactionalPageOutput$OutputWorker.call(LocalExecutorPlugin.java:353)
    at org.embulk.exec.LocalExecutorPlugin$ScatterTransactionalPageOutput$OutputWorker.call(LocalExecutorPlugin.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.lang.IllegalStateException: Not reach here
        ... 13 more

Error: java.lang.IllegalStateException: Not reach here

Any ideas on how to solve this? Everything works fine if I change the type of COMPANY_G2ID to string.
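
For reference, the workaround mentioned above amounts to a change along these lines in the input's column_options. This is only a sketch of the relevant fragment, with the other input keys assumed to stay exactly as in the config above; note that the column then reaches the filter, and any output, as a string rather than a long.

```yaml
in:
   type: db2
   # ... other input settings unchanged ...
   column_options:
      CREDIT_ACCOUNT_ID: {value_type: string}
      NAME: {value_type: string}
      COMMENT: {value_type: string}
      # Workaround: read the column as string instead of long
      COMPANY_G2ID: {value_type: string}
```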

Thank you very much!

kamatama41 commented 6 years ago

@jhettler Thank you for reporting :) I will take a look at it.

P.S. Please let me know which Embulk and Java versions you are using.

kamatama41 commented 6 years ago

I was able to reproduce the error with the latest Embulk (0.9.7). I will investigate further.

kamatama41 commented 6 years ago

Hi @jhettler, this was caused by a regression with Embulk v0.9, and I have fixed it. Please try embulk-filter-hash v0.4.0. Thank you.

jhettler commented 6 years ago

Hi @kamatama41, excellent, thank you for your time and the fix!