teragrep / pth_10

Data Processing Language (DPL) translator for Apache Spark
GNU Affero General Public License v3.0
If clause should support mismatch of datatypes #234

Closed: eemhu closed this issue 5 months ago

eemhu commented 5 months ago

Describe the bug

%dpl
index=xyz earliest="01/01/2017:02:34:19"
| rex4j field=_raw "customerId=\"(?<customerId>([a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}))\""
| eval ID= split(customerId,"-")
| eval customerID=if(ID="", "null", ID)

This results in:

org.apache.spark.sql.AnalysisException: cannot resolve 'CASE WHEN UDF:EvalOperation(ID, 796, ) THEN 'null' ELSE `ID` END' due to data type mismatch: THEN and ELSE expressions should all be same type or coercible to a common type;;
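Judging from the workaround below, `split` makes `ID` an array of strings. The generated `CASE WHEN` therefore has a string literal `'null'` in its THEN branch and an array in its ELSE branch, and Spark finds no common type for the two. The sketch below is a simplified, hypothetical model of that check. The names `StringT`, `ArrayT`, `common_type` and `check_case_when` are illustrative only and are not Spark's actual coercion code.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical stand-ins for Spark SQL types, for illustration only.
@dataclass(frozen=True)
class StringT:
    pass


@dataclass(frozen=True)
class ArrayT:
    element: "SqlType"


SqlType = Union[StringT, ArrayT]


def common_type(a: SqlType, b: SqlType) -> Optional[SqlType]:
    """Return a type both branches can be coerced to, or None.

    Simplified rule: identical types match, and arrays match if their
    element types match. A scalar and an array never match.
    """
    if a == b:
        return a
    if isinstance(a, ArrayT) and isinstance(b, ArrayT):
        inner = common_type(a.element, b.element)
        return ArrayT(inner) if inner is not None else None
    return None


def check_case_when(then_t: SqlType, else_t: SqlType) -> SqlType:
    """Mimic the analyzer check: THEN and ELSE need a common type."""
    result = common_type(then_t, else_t)
    if result is None:
        raise TypeError(f"data type mismatch: THEN {then_t} vs ELSE {else_t}")
    return result
```

With this model, `check_case_when(StringT(), ArrayT(StringT()))` raises, which corresponds to the reported failure. `check_case_when(ArrayT(StringT()), ArrayT(StringT()))` succeeds, which corresponds to the `mvappend` workaround.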

Expected behavior

It should work and generate "null" for each empty split. A workaround is to use

| eval customerID=if(ID=mvappend(""), mvappend("null"), ID)

because the types then match (mvappend produces an array type).
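One way to support the mismatch is to promote a scalar branch to a single-element array whenever the other branch is multivalued. This is exactly what `mvappend` does by hand in the workaround. The Python sketch below models that idea under stated assumptions. `dpl_if`, `as_multivalue` and `split` are hypothetical helpers, the condition mirrors the workaround's `ID=mvappend("")` comparison, and none of this is the actual pth_10 fix.

```python
from typing import List, Union

Value = Union[str, List[str]]


def as_multivalue(v: Value) -> List[str]:
    """Wrap a scalar in a one-element list, as mvappend("x") does in the workaround."""
    return v if isinstance(v, list) else [v]


def dpl_if(condition: bool, then_v: Value, else_v: Value) -> Value:
    """Hypothetical type-tolerant if(): when exactly one branch is
    multivalued, promote the other so both branches are lists."""
    if isinstance(then_v, list) != isinstance(else_v, list):
        then_v, else_v = as_multivalue(then_v), as_multivalue(else_v)
    return then_v if condition else else_v


def split(s: str, sep: str) -> List[str]:
    """Hypothetical stand-in for DPL split(); Python's "".split("-") gives [""]."""
    return s.split(sep)


ID = split("", "-")
print(dpl_if(ID == [""], "null", ID))  # → ['null']
```

The design choice here is to lift to the wider (array) type rather than flatten the array to a string, because flattening would lose values when the split yields several elements.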

How to reproduce

Run the query above.

Software version

4.17.0


eemhu commented 5 months ago

Internal PR merged.