Closed sopel39 closed 1 month ago
FYI @JiamingMai @jja725, can you maybe help?
Checking. @sopel39, is this error flaky only in GitHub CI, or is it flaky on a local laptop as well? Knowing that would help with debugging.
Locally it passes for me
I can run the test successfully in my local environment.
@jkylling Can you take a look at this issue? This test seems very flaky.
The underlying cause seems to be the same as in #21121, which affects TestHiveConnectorTest.
This is very likely caused by a cache key collision. After adding some logging, it's visible that we try to read the old metadata file from the cache using a smaller size (the size of the new file), so the JSON is cut short and cannot be parsed. File read from cache:
{
"writerVersion" : "testversion",
"owner" : "hive",
"tableType" : "MANAGED_TABLE",
"dataColumns" : [ {
"name" : "data",
"type" : "string",
"properties" : { }
} ],
"partitionColumns" : [ {
"name" : "key",
"type" : "string",
"properties" : { }
} ],
"parameters" : {
"trino_version" : "testversion",
"trino_query_id" : "20240927_114936_00001_nvcjj",
"transactional" : "false",
"auto.purge" : "false",
"numFiles" : "-1",
"totalSize" : "-1"
},
"storageFormat" : "PARQUET",
"serde
File read directly from storage:
{
"writerVersion" : "testversion",
"owner" : "hive",
"tableType" : "MANAGED_TABLE",
"dataColumns" : [ {
"name" : "data",
"type" : "string",
"properties" : { }
} ],
"partitionColumns" : [ {
"name" : "key",
"type" : "string",
"properties" : { }
} ],
"parameters" : {
"trino_version" : "testversion",
"trino_query_id" : "20240927_114644_00001_smc5a",
"transactional" : "false",
"auto.purge" : "false"
},
"storageFormat" : "PARQUET",
"serdeParameters" : { },
"columnStatistics" : { }
}
Notice the lack of:
"numFiles" : "-1",
"totalSize" : "-1"
in the updated file. The length of the old file is 592 and the length of the new file is 545, and 545 is exactly the offset at which the old file is cut. The length to read from the cache is taken from URIStatus, which AFAIU is populated directly from the current file.
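To make the suspected failure mode concrete, here is a minimal Python sketch. It is not the Trino or Alluxio code: the `cache` dict, `read_through_cache`, and the path are hypothetical stand-ins, and it only assumes what is described above, namely that the cache is keyed by path alone while the read length comes from the current file.

```python
import json

# Hypothetical stand-in for a file cache keyed by path only.
cache = {}

def read_through_cache(path, storage, current_length):
    """Return current_length characters for path, preferring cached content."""
    if path not in cache:
        cache[path] = storage[path]
    # The length comes from the current file, not from the cached entry.
    return cache[path][:current_length]

def parses(text):
    """Report whether text is a complete JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Shortened stand-ins for the old and new metadata files.
old_file = json.dumps({
    "parameters": {"auto.purge": "false", "numFiles": "-1", "totalSize": "-1"},
    "storageFormat": "PARQUET",
})
new_file = json.dumps({
    "parameters": {"auto.purge": "false"},
    "storageFormat": "PARQUET",
})

path = "table/schema.json"  # hypothetical path
storage = {path: old_file}

# First read populates the cache with the old file.
first = read_through_cache(path, storage, len(old_file))

# The file is rewritten at the same path with shorter content.
storage[path] = new_file

# Second read hits the stale entry and cuts it at the new length.
stale = read_through_cache(path, storage, len(new_file))
```

In this sketch `stale` is the old content cut at the new file's length, which matches the truncated JSON shown in the cached read above.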
So file size is not part of the cache key? cc @raunaqmorarka
It's not, it will be added in https://github.com/trinodb/trino/pull/23605
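For illustration only, a minimal Python sketch of what making the file size part of the cache key could look like. This is not the code from the linked PR; `cache`, `cache_key`, `read_through_cache`, and the path are hypothetical names, and the file contents are shortened stand-ins.

```python
import json

# Hypothetical stand-in for a file cache; the key now includes the file length.
cache = {}

def cache_key(path, length):
    """Build a key from the path and the current file length."""
    return (path, length)

def read_through_cache(path, storage, current_length):
    """Return current_length characters for path, preferring cached content."""
    key = cache_key(path, current_length)
    if key not in cache:
        cache[key] = storage[path]
    return cache[key][:current_length]

old_file = json.dumps({"parameters": {"numFiles": "-1", "totalSize": "-1"}})
new_file = json.dumps({"parameters": {}})

path = "table/schema.json"  # hypothetical path
storage = {path: old_file}

# First read caches the old file under (path, len(old_file)).
first = read_through_cache(path, storage, len(old_file))

# Rewriting the file with a different length produces a different key,
# so the old entry is no longer returned for the new file.
storage[path] = new_file
second = read_through_cache(path, storage, len(new_file))
```

Under these assumptions the second read misses the old entry and returns the new content. A key built only from path and size would still collide if a file were rewritten with different content of the same length.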
https://github.com/trinodb/trino/actions/runs/10146256016/job/28054908427?pr=22827