trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.22k stars 2.95k forks source link

How to Use UNNEST For Arrays Of Arrays in BigQuery connector #3884

Closed Asturias-sam closed 1 year ago

Asturias-sam commented 4 years ago

I am running below query in Presto:

SELECT location.zone
FROM table 
CROSS JOIN UNNEST(items) as t(item,  quantity, location);

I am getting the following error:

Presto error: Unhandled type for Block:

array(row("item" row("departmentNumber" bigint,"itemNumber" bigint,"shelfCapacity" row("qty" double,"precision" bigint)),

 "quantity" row("qty" double,"precision" bigint),

 "location" row("zone" varchar,"aisle" bigint)))

As you can see for item is having an array shelfCapacity inside it, how to unnest this type of array where you have array inside array ?

martint commented 4 years ago

Can you paste the complete stacktrace and the schema of your table?

Asturias-sam commented 4 years ago

@martint On Presto Cli just getting the above error using the mentioned query. I used this command show columns from table; below is items description

items 
 array(row(item   row(departmentNumber bigint, itemNumber bigint, shelfCapacity row(qty double, precision   bigint, uom varchar)), quantity row(qty double, precision bigint), location row(zone varchar, aisle bigint)))

-- | --

Stack Trace:

io.prestosql.spi.PrestoException: Unhandled type for Block: array(row("item" row("departmentNumber" bigint,"itemNumber" bigint,"shelfCapacity" row("qty" double,"precision" bigint,"uom" varchar)),"quantity" row("qty" double,"precision" bigint),"location" row("zone" varchar,"aisle" bigint))
    at io.prestosql.plugin.bigquery.BigQueryResultPageSource.writeBlock(BigQueryResultPageSource.java:254)
    at io.prestosql.plugin.bigquery.BigQueryResultPageSource.appendTo(BigQueryResultPageSource.java:192)
    at io.prestosql.plugin.bigquery.BigQueryResultPageSource.getNextPage(BigQueryResultPageSource.java:141)
    at io.prestosql.operator.TableScanOperator.getOutput(TableScanOperator.java:302)
    at io.prestosql.operator.Driver.processInternal(Driver.java:379)
    at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
    at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
    at io.prestosql.operator.Driver.processFor(Driver.java:276)
    at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
    at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
    at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
    at io.prestosql.$gen.Presto_333____20200520_073328_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Asturias-sam commented 4 years ago

@martint i have updated the stack trace but this query run fine in big query:

select 
item_detail.location.zone, 
FROM table 
CROSS JOIN UNNEST(items) as item_detail;
martint commented 4 years ago

Thanks. That appears to be a bug in the BigQuery connector.

@davidrabinowitz, is this something you can look into?

davidrabinowitz commented 4 years ago

@som2014 Can you please add the schema of the BigQuery table?

Asturias-sam commented 4 years ago

Below are the details:

Field name Type Mode Policy tags Description
items RECORD REPEATED    
items. item RECORD REQUIRED    
items.item. itemNumber INTEGER NULLABLE    
items.item. shelfCapacity RECORD NULLABLE    
items.item.shelfCapacity. qty FLOAT REQUIRED  
items.item.shelfCapacity. precision INTEGER REQUIRED  
items.item.shelfCapacity. uom STRING REQUIRED  
items. quantity RECORD NULLABLE    
items.quantity. qty FLOAT REQUIRED  
items.quantity. precision INTEGER REQUIRED  
items. location RECORD NULLABLE    
items.location. zone STRING NULLABLE    
items.location. aisle INTEGER NULLABLE    
Asturias-sam commented 4 years ago

@davidrabinowitz is this a bug any work around ?

ebyhr commented 4 years ago

@som2014 Can you share the simple steps to reproduce? (e.g. CREATE TABLE and INSERT statement) I couldn't reproduce the issue.

I sent an invitation for the community Slack so that we can talk quickly: https://prestosql.io/slack.html

davidrabinowitz commented 4 years ago

@som2014 I'm checking this out. Will update when I have more details.

davidrabinowitz commented 4 years ago

Hi @som2014 I'm afraid I cannot reproduce the issue. Based on the schema you have given I've created the following table:

Field name                         Type    Mode     Policy tags  Description
items                              RECORD  REPEATED 
items.item                         RECORD  REQUIRED 
items.item.itemNumber              INTEGER NULLABLE 
items.item.shelfCapacity           RECORD  NULLABLE 
items.item.shelfCapacity.qty       FLOAT   REQUIRED 
items.item.shelfCapacity.precision INTEGER REQUIRED 
items.item.shelfCapacity.uom       STRING  REQUIRED 
items.quantity                     RECORD  NULLABLE 
items.quantity.qty                 FLOAT   REQUIRED 
items.quantity.precision           INTEGER REQUIRED 
items.location                     RECORD  NULLABLE 
items.location.zone                STRING  NULLABLE 
items.location.aisle               INTEGER NULLABLE 

I've added couple of record to it and ran a simple select *:

                                                                                                                                   items                                                                                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 [{item={itemNumber=3, shelfCapacity={qty=13.0, precision=3, uom=idk}}, quantity={qty=13.0, precision=3}, location={zone=c, aisle=3}}, {item={itemNumber=4, shelfCapacity={qty=14.0, precision=4, uom=rdk}}, quantity={qty=14.0, precision=4}, location={zone=d, aisle=4}}] 
 [{item={itemNumber=1, shelfCapacity={qty=10.0, precision=2, uom=idk}}, quantity={qty=10.0, precision=2}, location={zone=a, aisle=1}}, {item={itemNumber=2, shelfCapacity={qty=11.0, precision=3, uom=rdk}}, quantity={qty=11.0, precision=3}, location={zone=b, aisle=2}}] 
(2 rows)

I then ran your query,

SELECT location.zone
FROM table 
CROSS JOIN UNNEST(items) as t(item,  quantity, location);

and got this result:

 zone 
------
 a    
 b    
 c    
 d    
(4 rows)

BTW, please note that this schema does not contain an array of arrays, but an array of structs, containing other structs. Can you provide more details? Which Presto version are you using? Is it possible to generate sample data that reproduces the problem?

Asturias-sam commented 4 years ago

Current Presto Version : 333

Sample Data which i am getting the above error :

[[[91, "VOILA FS FAJITA CHKN", 554268888, 554268888, null, "00014500015136", 14185212, [8.0, 0, "EACH"]], [8.0, 0, "EACH"], ["A", 6, 5, null]]]

[[[95, "GSHR MEGA PAK 20CT", 573338692, 573338692, null, "00016000147096", 15688504, [10.0, 0, "EACH"]], [10.0, 0, "EACH"], ["A", 22, 12, null]]]
davidrabinowitz commented 4 years ago

Thanks for that. It seems that the schema is a bit different, for example location has 4 fields rather than 2 . Can you please send the full schema so I can test this?

Asturias-sam commented 4 years ago

@davidrabinowitz sorry for that i thought the problem was having nested structure inside the structure so i simplify the schema , will share it soon.

Asturias-sam commented 4 years ago

Below is schema :

items RECORD REPEATED    
items. item RECORD REQUIRED    
items.item. departmentNumber INTEGER NULLABLE    
items.item. description STRING NULLABLE    
items.item. itemNumber INTEGER NULLABLE    
items.item. primeItemNumber INTEGER NULLABLE    
items.item. gtin STRING NULLABLE    
items.item. primeGtin STRING NULLABLE    
items.item. replenishmentGroupNumber INTEGER NULLABLE    
items.item. shelfCapacity RECORD NULLABLE    
items.item.shelfCapacity. qty FLOAT REQUIRED  
items.item.shelfCapacity. precision INTEGER REQUIRED  
items.item.shelfCapacity. uom STRING REQUIRED  
items. quantity RECORD NULLABLE    
items.quantity. qty FLOAT REQUIRED  
items.quantity. precision INTEGER REQUIRED  
items.quantity. uom STRING REQUIRED  
items. location RECORD NULLABLE    
items.location. zone STRING NULLABLE    
items.location. aisle INTEGER NULLABLE    
items.location. section INTEGER NULLABLE    
items.location. sgln STRING NULLABLE  
davidrabinowitz commented 4 years ago

I've taken ran Prestosql 333 with a table using your data. Unfortunately I couldn'r reproduce this bug:

> select * from p3884a;
                                                                                                                                                                  items                                                                                                                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 [{item={departmentNumber=91, description=VOILA FS FAJITA CHKN, itemNumber=554268888, primeItemNumber=554268888, gtin=NULL, primeGtin=00014500015136, replenishmentGroupNumber=14185212, shelfCapacity={qty=8.0, precision=0, uom=EACH}}, quantity={qty=8.0, precision=0, uom=EACH}, location={zone=A, aisle=6, section=5, sgln=NULL}}]   
 [{item={departmentNumber=95, description=GSHR MEGA PAK 20CT, itemNumber=573338692, primeItemNumber=573338692, gtin=NULL, primeGtin=00016000147096, replenishmentGroupNumber=15688504, shelfCapacity={qty=10.0, precision=0, uom=EACH}}, quantity={qty=10.0, precision=0, uom=EACH}, location={zone=A, aisle=22, section=12, sgln=NULL}}] 
(2 rows)

And the second query:

> SELECT location.zone FROM p3884a CROSS JOIN UNNEST(items) as t(item,  quantity, location);
 zone 
------
 A    
 A    
(2 rows)

@martint any idea on how to proceed?

martint commented 4 years ago

@davidrabinowitz, from the stacktrace, the flow would fall through to that case if the type is ArrayType but the value is not a List:

    if (type instanceof ArrayType && value instanceof List<?>) {

Under what conditions would that happen? Are we reading the wrong column?

davidrabinowitz commented 4 years ago

Perhaps - as mentioned I was not able to reproduce this issue locally. If you can debug this query and copy the state of the PageSource - which type is it, what is the value - type and contents. Any other variables may help as well.

ebyhr commented 1 year ago

Let me close this issue as we can't reproduce it. This might be temporary issue in BigQuery side. Please feel free to reopen or file a new issue if you still face the issue.