trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0

Flaky TestHiveAlluxioCacheFileOperations.testCacheFileOperations #22861

Closed sopel39 closed 1 month ago

sopel39 commented 3 months ago

https://github.com/trinodb/trino/actions/runs/10146256016/job/28054908427?pr=22827

 Error:  io.trino.plugin.hive.TestHiveAlluxioCacheFileOperations.testCacheFileOperations -- Time elapsed: 0.314 s <<< ERROR!
io.trino.testing.QueryFailedException: Could not read table schema
    at io.trino.testing.AbstractTestingTrinoClient.execute(AbstractTestingTrinoClient.java:134)
    at io.trino.testing.DistributedQueryRunner.executeInternal(DistributedQueryRunner.java:565)
    at io.trino.testing.DistributedQueryRunner.executeWithPlan(DistributedQueryRunner.java:554)
    at io.trino.testing.QueryAssertions.assertDistributedUpdate(QueryAssertions.java:108)
    at io.trino.testing.QueryAssertions.assertUpdate(QueryAssertions.java:62)

mosabua commented 3 months ago

FYI @JiamingMai @jja725, can you maybe help?

jja725 commented 3 months ago

Checking. @sopel39, is this test flaky only on GitHub, or is it flaky on your local laptop as well? Knowing that would help with debugging.

sopel39 commented 3 months ago

Locally it passes for me

JiamingMai commented 3 months ago

I can run the test successfully in my local environment.

ebyhr commented 1 month ago

https://github.com/trinodb/trino/actions/runs/10910090412/job/30279798392

ebyhr commented 1 month ago

@jkylling Can you take a look at this issue? This test seems very flaky.

ebyhr commented 1 month ago

https://github.com/trinodb/trino/actions/runs/10919967042/job/30310600934

dekimir commented 1 month ago

The underlying cause seems to be the same as in #21121, which affects TestHiveConnectorTest.

pajaks commented 1 month ago

This is very likely caused by a cache key collision. After adding some logging, it is visible that we try to read the metadata file from the cache using a smaller size (the size of the new file), so the cached content is cut short and the JSON cannot be parsed. File read from cache:

{
 "writerVersion" : "testversion",
 "owner" : "hive",
 "tableType" : "MANAGED_TABLE",
 "dataColumns" : [ {
   "name" : "data",
   "type" : "string",
   "properties" : { }
 } ],
 "partitionColumns" : [ {
   "name" : "key",
   "type" : "string",
   "properties" : { }
 } ],
 "parameters" : {
   "trino_version" : "testversion",
   "trino_query_id" : "20240927_114936_00001_nvcjj",
   "transactional" : "false",
   "auto.purge" : "false",
   "numFiles" : "-1",
   "totalSize" : "-1"
 },
 "storageFormat" : "PARQUET",
 "serde

File read directly from storage:

{
  "writerVersion" : "testversion",
  "owner" : "hive",
  "tableType" : "MANAGED_TABLE",
  "dataColumns" : [ {
    "name" : "data",
    "type" : "string",
    "properties" : { }
  } ],
  "partitionColumns" : [ {
    "name" : "key",
    "type" : "string",
    "properties" : { }
  } ],
  "parameters" : {
    "trino_version" : "testversion",
    "trino_query_id" : "20240927_114644_00001_smc5a",
    "transactional" : "false",
    "auto.purge" : "false"
  },
  "storageFormat" : "PARQUET",
  "serdeParameters" : { },
  "columnStatistics" : { }
}

Notice the lack of:

"numFiles" : "-1",
"totalSize" : "-1"

in the updated file. The old file is 592 bytes long and the new one is 545 bytes. And 545 is exactly the place where the old file is cut. Length of file to read from cache is taken from URIStatus which is taken directly from current file AFAIU.
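
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not Trino or Alluxio code: `PathKeyedCache`, its methods, and the example path are hypothetical names chosen for illustration, and the cache is assumed to be a plain map keyed by path only. The read length comes from the current file's status, so a stale 592-byte entry read with the new 545-byte length comes back cut at byte 545.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Trino or Alluxio.
class PathKeyedCache {
    // Cache keyed by path alone, so a rewritten file maps to the stale entry.
    private final Map<String, byte[]> entries = new HashMap<>();

    void put(String path, byte[] content) {
        entries.put(path, content.clone());
    }

    // Reads at most currentLength bytes, where currentLength is the length of the
    // file as it exists in storage right now (the role URIStatus plays in the thread).
    byte[] read(String path, int currentLength) {
        byte[] cached = entries.get(path);
        if (cached == null) {
            return null; // cache miss
        }
        return Arrays.copyOf(cached, Math.min(cached.length, currentLength));
    }

    // Builds a dummy file body of the given length.
    static byte[] fileOfLength(int length, char fill) {
        byte[] bytes = new byte[length];
        Arrays.fill(bytes, (byte) fill);
        return bytes;
    }

    public static void main(String[] args) {
        PathKeyedCache cache = new PathKeyedCache();
        String path = "/tmp/example/schema.json";  // hypothetical path
        cache.put(path, fileOfLength(592, 'o'));   // old version gets cached

        // The file is rewritten in storage with 545 bytes; the cache is not invalidated.
        byte[] result = cache.read(path, 545);
        System.out.println(result.length);         // prints 545
        System.out.println((char) result[0]);      // prints o (stale content)
    }
}
```

The returned bytes are the first 545 bytes of the old content, which matches the truncated JSON shown above.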

sopel39 commented 1 month ago

And 545 is exactly the place where the old file is cut. Length of file to read from cache is taken from URIStatus which is taken directly from current file AFAIU.

So file size is not part of cache key? cc @raunaqmorarka

pajaks commented 1 month ago

And 545 is exactly the place where the old file is cut. Length of file to read from cache is taken from URIStatus which is taken directly from current file AFAIU.

So file size is not part of cache key? cc @raunaqmorarka

It's not; it will be added in https://github.com/trinodb/trino/pull/23605
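
To illustrate what making file size part of the cache key means, here is a minimal sketch. It is not the change made in the linked pull request, whose details are not shown in this thread; `SizeAwareCache`, `CacheKey`, and the example path are hypothetical names. The assumption is simply that the key is the pair (path, length), so a rewritten file with a different length misses the cache instead of hitting the stale entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the code from the linked pull request.
class SizeAwareCache {
    // The key combines path and file length, so a rewritten file with a
    // different length no longer collides with the stale entry.
    record CacheKey(String path, long length) {}

    private final Map<CacheKey, byte[]> entries = new HashMap<>();

    void put(String path, byte[] content) {
        entries.put(new CacheKey(path, content.length), content.clone());
    }

    // Returns the cached bytes, or null on a miss so the caller reads from storage.
    byte[] read(String path, long currentLength) {
        byte[] cached = entries.get(new CacheKey(path, currentLength));
        return cached == null ? null : cached.clone();
    }

    public static void main(String[] args) {
        SizeAwareCache cache = new SizeAwareCache();
        String path = "/tmp/example/schema.json";  // hypothetical path
        cache.put(path, new byte[592]);            // old 592-byte version is cached

        System.out.println(cache.read(path, 545) == null);  // prints true (cache miss)
        System.out.println(cache.read(path, 592).length);   // prints 592 (old key still hits)
    }
}
```

Note that in this sketch a rewrite that happens to keep the same length would still hit the stale entry, so size alone narrows the collision rather than ruling it out.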