teragrep / pth_10

Data Processing Language (DPL) translator for Apache Spark
GNU Affero General Public License v3.0

bloom operations don't work with regexextract #379

Open elliVM opened 2 weeks ago

elliVM commented 2 weeks ago

Describe the bug
Running bloom create on tokens saved by the regexextract command fails because the tokens cannot be read correctly. The aggregator expects bytes but receives String values, which produces the error: java.lang.String cannot be cast to [B

Expected behavior
bloom create and bloom update should run successfully after regexextract has been used to save tokens.

How to reproduce
Run a bloom create pipeline, replacing the tokenizer with regexextract.

Software version
tg7 pth_10 version: 8.0.1

elliVM commented 2 weeks ago

The dpf-03 BloomFilterAggregator expects an array of byte[]. As a fix, convert the list of String tokens to a list of byte[] before the aggregator step.
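
A minimal sketch of the conversion described above, not the actual patch. The class and method names (TokenBytes, toByteArrays) are hypothetical, and UTF-8 is an assumption: the encoding used by the existing tokenizer is not stated in this issue and would need to match for the filter contents to be comparable.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of pth_10 or dpf-03.
final class TokenBytes {
    private TokenBytes() {
    }

    // Encodes each String token as a byte[] so that a consumer
    // expecting byte[] elements does not hit a ClassCastException.
    // Assumption: UTF-8 is the encoding the aggregator side expects.
    static List<byte[]> toByteArrays(List<String> tokens) {
        List<byte[]> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            out.add(token.getBytes(StandardCharsets.UTF_8));
        }
        return out;
    }
}
```

In the pipeline this would run on the regexextract output column before it reaches the aggregator, so that both the tokenizer path and the regexextract path hand over the same element type.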