trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.36k stars 2.98k forks source link

Product test suite-2 and suite-3 are much slower on hdp3 #4176

Open findepi opened 4 years ago

findepi commented 4 years ago

suite-1

it's faster on hdp3

suite-2

suite-3

findepi commented 4 years ago

It might be that we simply run more tests on HDP3. e.g. TestHiveTransactionalTable. Does it take so much more time?

MiguelWeezardo commented 4 years ago

In suite-2 executed on non-HDP3 environment, TestHiveTransactionalTable.testReadFullAcid and TestHiveTransactionalTable.testReadInsertOnly are skipped. Due to these tests being in storage_formats test group as well as hive_transactional group, they are being executed in the second, third and fourth run-launcher invocation in suite-2. Each of these invocations runs each of the two tests in 8 different configurations of bucketing type and partitioning. This adds about 10 minutes to each run-launcher invocation.

In suite-3 the cause is the same - transactional tests are skipped in non-HDP3 envs. However, this suite only executes the storage_formats test group once, so the difference isn't as significant.

I suggest removing these tests from the storage_formats test group.

Alternatively we can add hive_transactional to the excluded groups in these invocations, but in this solution the same problem might still happen in other test suites.

MiguelWeezardo commented 4 years ago

Another, but smaller difference I noticed was increased HDP3 environment setup time - prestodev/hdp3.1-hive:28 docker image needed 4m53s to start vs. 2m20s for prestodev/hdp2.6-hive:28

This difference adds up with each run-launcher invocation.

findepi commented 4 years ago

thanks @MiguelWeezardo for the analysis.

Let's

@losipiuk @wendigo please verify this proposal

sopel39 commented 4 years ago

@findepi keep in mind that Caching tests transactions as part of storage_formats group. We need to continue testing caching with transactions.

losipiuk commented 4 years ago

@findepi keep in mind that Caching tests transactions as part of storage_formats group. We need to continue testing caching with transactions.

We can explicitly add HIVE_TRANSACTIONAL for caching related test run

losipiuk commented 4 years ago

thanks @MiguelWeezardo for the analysis.

Let's

  • remove all TestHiveTransactionalTable tests from storage_formats.
  • add a new test method in TestHiveTransactionalTable that will be storage_formats and will exercise a full ACID partitioned table with default bucketing (just one configuration) -- this is necessary to maintain test coverage for ACID tables with kerberized Metastore and/or HDFS, with impersonation off/on.
  • let's reduce or remove matrix usage from TestHiveTransactionalTable. While we definitely must be testing partitioned and non-partitioned and partitioned tables, as well as all bucketing type options, we do not ned to test all combinations thereof. It should be sufficient to have:

    • { full acid | insert only } x { partitioned | not partitioned } x { no bucketing | default bucketing }
    • { full acid } x { not partitioned } x { bucketing v1 | bucketing v2 }

@losipiuk @wendigo please verify this proposal

I would go with: { full acid } x { partitioned } x { bucketing v1 | bucketing v2 } - as this is a bit more nasty usecase.

The other way to approach it would be to drop storage_formats from runs on:

And instead tag only one of tests in TestHiveTransactionTable (as well as all relevant tests which were in storage_formats in different classes) with hdfs_impersonation and hdfs_no_impersonation. Ideally we would only use hdfs_impersonation/hdfs_no_impersonation for above runs. That way we would be sure that we do not get any tests duplication and that only relevant tests are run. Though it would require more care when assigning groups for new tests.

findepi commented 4 years ago

The other way to approach it would be to drop storage_formats from runs on:

  • singlenode-kerberos-hdfs-no-impersonation
  • singlenode-hdfs-impersonation
  • singlenode-kerberos-hdfs-impersonation

And instead tag only one of tests in TestHiveTransactionTable (as well as all relevant tests which were in storage_formats in different classes) with hdfs_impersonation and hdfs_no_impersonation. Ideally we would only use hdfs_impersonation/hdfs_no_impersonation for above runs. That way we would be sure that we do not get any tests duplication and that only relevant tests are run. Though it would require more care when assigning groups for new tests.

That's good idea, but I want this to be considered a follow-up. The reason why kerberized envs have storage_formats run is because storage formats exercise different code paths, and this is the way to identify all the missing doAs's.

@losipiuk please create an issue to track this.

I would go with: { full acid } x { partitioned } x { bucketing v1 | bucketing v2 } - as this is a bit more nasty usecase.

@losipiuk you mean this instead of all i proposed? or instead of part of it?

We can explicitly add HIVE_TRANSACTIONAL for caching related test run

Good idea. However, I think it should be enough to add hive_caching to the one test method I singled out that should be kept in storage_formats:

add a new test method in TestHiveTransactionalTable that will be storage_formats

findepi commented 4 years ago

I can take the action here.

findepi commented 4 years ago

I would go with: { full acid } x { partitioned } x { bucketing v1 | bucketing v2 } - as this is a bit more nasty usecase.

@losipiuk you mean this instead of all i proposed? or instead of part of it?

i see now. This was instead of { full acid } x { partitioned } x { bucketing v1 | bucketing v2 }

The reason why i think non-partitioned is enough, is that default bucketing is generally "v2" (hive 3+). So we actually test the partitioned v2 case.

I want to test remaining bucketing options in a non-partitioned case, becasue i expect this be faster, without decrease of coverage.

findepi commented 4 years ago

I've coded my proposal -- https://github.com/prestosql/presto/pull/4332

We could do more, however:

homar commented 1 year ago

Not sure if related but just got a timeout related failure here https://github.com/trinodb/trino/pull/16205/checks?check_run_id=12768456819