jepsen-io / jepsen

A framework for distributed systems verification, with fault injection

Is it normal for the PostgreSQL test to get stuck and never finish? #468

Open winddd opened 4 years ago

winddd commented 4 years ago

I ran the first version of the PostgreSQL test, i.e. the snapshot at the commit "Add stolon/postgres test". After a series of read and append operations, the terminal was already showing "could not serialize" errors. After that, the process got stuck and did not finish for a long time.

Here is a screenshot of where it got stuck:

Is this normal? If not, I guess there is something I haven't set up correctly.

aphyr commented 4 years ago

What does "a long time" mean here? Longer than the time limit you set for the test?

winddd commented 4 years ago

> What does "a long time" mean here? Longer than the time limit you set for the test?

Yes. The time-limit is 120 seconds, but the program was stuck for over an hour.

aphyr commented 4 years ago

Huh. That shouldn't happen! It looks like Jepsen's in the middle of making some requests to the Postgres node, and presumably it's not answering. There should be timeouts here, but maybe they're not working correctly?
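To make the idea of a client-side timeout concrete, here is a minimal Python sketch of wrapping a blocking request so the caller gives up after a deadline. It is an illustration only, not Jepsen's implementation (Jepsen is written in Clojure); `call_with_timeout` and `hypothetical_request` are made-up names, and the `"info"` label is just borrowed from the way Jepsen marks operations whose outcome is unknown.

```python
import concurrent.futures
import time


def call_with_timeout(fn, timeout_s, *args):
    """Run fn(*args) in a worker thread and wait at most timeout_s seconds.

    Returns ("ok", result) on success, or ("info", None) if the deadline
    passes and the outcome is unknown.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args)
    try:
        return ("ok", future.result(timeout=timeout_s))
    except concurrent.futures.TimeoutError:
        return ("info", None)
    finally:
        # Do not block on the worker here. Note that the worker thread is
        # not killed: it keeps running until fn itself returns.
        pool.shutdown(wait=False)


def hypothetical_request(delay_s):
    """Stand-in for a database request that takes delay_s seconds to answer."""
    time.sleep(delay_s)
    return "row"


if __name__ == "__main__":
    print(call_with_timeout(hypothetical_request, 1.0, 0.01))
    print(call_with_timeout(hypothetical_request, 0.05, 0.5))
```

A wrapper like this only frees the caller. If the underlying call never returns (for example, a socket read with no timeout of its own), the worker stays stuck, which is one way a test could hang even though timeouts nominally exist.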

winddd commented 4 years ago

> Huh. That shouldn't happen! It looks like Jepsen's in the middle of making some requests to the Postgres node, and presumably it's not answering. There should be timeouts here, but maybe they're not working correctly?

I read stolon's keeper.log on the db node. It says "too many clients already". I'm not sure whether this is the cause of the hang. I used the same command-line arguments as the README.md in the stolon test.

I also read sentinel.log. It says the master db has failed, just as you suggested. Is it possible for too many clients to cause the master db node to fail?
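For context, "too many clients already" is the error PostgreSQL returns when a new connection would exceed its `max_connections` setting (100 by default). The toy Python model below sketches one way a client can reach that limit: reconnecting without closing the old connection. It is not a diagnosis of this test. `ToyServer` and `reconnects_until_refused` are hypothetical names, and nothing here talks to a real database.

```python
class ToyServer:
    """Stand-in for a database with a fixed connection limit (hypothetical)."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.open_connections = 0

    def connect(self):
        # Refuse new connections once the limit is reached.
        if self.open_connections >= self.max_connections:
            raise ConnectionError("too many clients already")
        self.open_connections += 1

    def close(self):
        self.open_connections -= 1


def reconnects_until_refused(server, close_before_reconnect, attempts):
    """Reconnect up to `attempts` times; return how many connects succeeded."""
    succeeded = 0
    for _ in range(attempts):
        try:
            server.connect()
        except ConnectionError:
            break
        succeeded += 1
        if close_before_reconnect:
            server.close()
    return succeeded


if __name__ == "__main__":
    leaky = reconnects_until_refused(ToyServer(100), False, 500)
    clean = reconnects_until_refused(ToyServer(100), True, 500)
    print(leaky, clean)
```

In this model the leaking client is refused after its 100th connection, while the client that closes each connection first can reconnect indefinitely.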

aphyr commented 4 years ago

Oh! Yeah, that could be a thing. I haven't actually finished the Stolon test--I found bugs in single-node Postgres deployments and worked on those instead, so this test is very much unfinished. I mean, it ran for me once, but that clearly doesn't mean much haha. ;-)