postgrespro / pg_wait_sampling

Sampling based statistics of wait events
Other
147 stars 35 forks source link

Integrate isolation tests for more granular testing #50

Open maksm90 opened 2 years ago

maksm90 commented 2 years ago

Many buggy cases require for their reproducing non-trivial scenarios with multiple sessions. The isolation test utility is a good choice to emulate these scenarios. Additionally isolation tests, in general, allow to perform more granular verification required to extension.

Current patch adds two isolation test cases: one is for base case with queryId, another as bug fix verification for not working case of calculation queryId for relation lock waiting. Also Makefile and README were changed to account the possibility to run isolation tests.

maksm90 commented 2 years ago

Hmm, unfortunately it seems for PG 12 and 13 in docker container Travis CI job doesn't find pg_isolation_regress tool

maksm90 commented 2 years ago

Hmm, unfortunately it seems for PG 12 and 13 in docker container Travis CI job doesn't find pg_isolation_regress tool

This utility appeared in 14th version of postgres-alpine docker image. It seems for previous versions we have to build pg_isolation_regress manually(

maksm90 commented 2 years ago

This utility appeared in 14th version of postgres-alpine docker image.

Yes, distribution of pg_isolation_regress and dependent isolationtester utilities have started from PG14, in particular, in commit https://github.com/postgres/postgres/commit/2203ede9ae85b6423f850466122606275ea09b17 . But pgxs.mk have made available to run isolation tests starting from PG12 - in commit https://github.com/postgres/postgres/commit/d3c09b9b1307e022883801000ae36bcb5eef71e8 .

shinderuk commented 2 years ago

My main concern is that these tests depend on timing of events. If the machine is temporarily lagging, they could fail. And this is totally possible in our buildfarm. In fact, we may not get even a single sample during that one second, if the collector isn't scheduled. Unlikely, but still possible. Can we come up with more stable tests?

maksm90 commented 2 years ago

My main concern is that these tests depend on timing of events. If the machine is temporarily lagging, they could fail. And this is totally possible in our buildfarm. In fact, we may not get even a single sample during that one second, if the collector isn't scheduled. Unlikely, but still possible. Can we come up with more stable tests?

Hmm, all work of collector is tied up in timing. Another option might be using some framework that allows controllable running of processes (like gp_inject_fault in Greenplum) to stop and resume processes using sql commands. But it will require to integrate some infrastructure to enable proper functionality. We might to think about in future, moreover there was some proposal of this feature to postgres core. But now I propose to leave current scheme of testing, maybe not estimating sample counts as within some confidence range but that counter was positive, i.e., collector at least would take one sample for one second waiting.

shinderuk commented 2 years ago

Another option might be using some framework that allows controllable running of processes...

Thanks for the links, good to know.

...collector at least would take one sample for one second waiting.

I'm afraid we woud get intermittent failures in the buildfarm.

How about polling pg_wait_sampling_profile until we get a reasonable count? At least we can sleep as long as necessary. Something like this:

setup
{
    create table s2_pid (pid int);
    create table a1 ();
}

teardown
{
    drop table s2_pid;
    drop table a1;
}

session s1

step s1begin    { begin; }
step s1lock     { lock table a1; }
step s1poll
{
do $$
    declare
        i int = 0;
        cnt int;
    begin
        loop
            select count into cnt
                from pg_wait_sampling_profile
                where pid = (select pid from s2_pid)
                and event_type = 'Lock' and event = 'relation';
            exit when cnt > 50;
            if i < 10 then
                perform pg_sleep(1);
            else
                raise 'timed out';
            end if;
            i = i + 1;
        end loop;
    end;
$$;
}
step s1commit   { commit; }

session s2
step s2snitch   { insert into s2_pid select pg_backend_pid(); }
step s2begin    { begin; }
step s2lock     { lock table a1; }
step s2commit   { commit; }

permutation s2snitch s1begin s1lock s2begin s2lock s1poll s1commit s2commit
maksm90 commented 2 years ago

How about polling pg_wait_sampling_profile until we get a reasonable count? At least we can sleep as long as necessary.

I've rewritten isolation tests according your proposed manner. Please, review their.

maksm90 commented 2 years ago

All tests have passed. @shinderuk please review this PR

shinderuk commented 2 years ago

Regarding isolation tests with PostgreSQL < 14. Previously we discussed an option of building PostgreSQL from source and manually installing pg_isolation_regress and isolationtester in a CI build. After thinking some more, I'm pretty sure this would break the Debian build, which is done against postgresql-MAJOR and postgresql-server-dev-MAJOR Debian packages, and would similarly break builds for other Linux distributions. Likewise, anyone building pg_wait_sampling out of tree wouldn't be able to build against binary packages, and if building against PostgreSQL source, would have to manually install pg_isolation_regress anyway. Overall, this sounds like a bad idea.

maksm90 commented 2 years ago

Previously we discussed an option of building PostgreSQL from source and manually installing pg_isolation_regress and isolationtester in a CI build.

not PostgreSQL but just tools for isolation tests.

After thinking some more, I'm pretty sure this would break the Debian build, which is done against postgresql-MAJOR and postgresql-server-dev-MAJOR Debian packages, and would similarly break builds for other Linux distributions. Likewise, anyone building pg_wait_sampling out of tree wouldn't be able to build against binary packages, and if building against PostgreSQL source, would have to manually install pg_isolation_regress anyway.

You're confused that make installcheck command certainly requires tools for isolation tests (that are not presented in distributions of PG13 and below), aren't you?

shinderuk commented 2 years ago

We use postgres:13-alpine image from docker hub for building and testing in Travis. In this image, there are no pg_isolation_regress and isolationtester binaries. To run isolation tests we need to build them somehow. So I assume that we have to build PostgreSQL from source. Maybe we could build just these two binaries from source, but that doesn't make much difference.

On Debian I can install postgresql-13 and postgresql-server-dev-13 packages (from the Debian or PGDG repos). This should be enough to build and test extensions. In fact, our Debian build configuration (in the debian directory) does just that and successfully runs make installcheck.

If we required pg_isolation_regress and isolation tester to run our tests with PostgreSQL 13, then we would break the Debian build (which runs installcheck) and manual building and testing against the standard development package. And if one built PostgreSQL 13 from source, and then tried to build pg_wait_sampling and run tests out of tree (USE_PGXS=1), then make installcheck would fail for the lack of pg_isolation_regress and isolationtester.

All in all, if we use isolation tests, they will only run in Travis, because of our special arrangement, or only against PostgreSQL 14+, where pg_isolation_regress and isolationtester are available for out of tree extensions. That breaks the usual workflow for building and testing of extensions.

maksm90 commented 2 years ago

Maybe we could build just these two binaries from source, but that doesn't make much difference.

It takes much less time.

On Debian I can install postgresql-13 and postgresql-server-dev-13 packages (from the Debian or PGDG repos). This should be enough to build and test extensions. In fact, our Debian build configuration (in the debian directory) does just that and successfully runs make installcheck.

Got you: your main concern regards to broken make installcheck command for PG12-13 when we use regular distribution of postgres utilities. Well, another option might be to try to enable isolation test just for PG14+ via conditional operator in Makefile. How about this?

maksm90 commented 2 years ago

@shinderuk I've enabled isolation tests just for pg14+ when extension is built separately from postgres contrib directory. PR is ready for review