kagkarlsson / db-scheduler

Persistent cluster-friendly scheduler for Java
Apache License 2.0

Add documentation for building source and minimum requirements/spec #388

Closed Agorguy closed 1 year ago

Agorguy commented 1 year ago

Hello,

We tried running your project and discovered that it contains some flaky tests (i.e., tests that nondeterministically pass and fail). We found these tests to fail more frequently when running them on certain machines of ours.

To keep others from running this project and its tests on machines that may produce flaky results, we suggest adding information to the README.md file stating the minimum resource configuration needed to run the tests without flakiness.

When we run this project on a machine with 1 CPU and 500 MB of RAM, we observe flaky tests. We did not observe any flaky tests when we ran it on machines with 2 CPUs and 2 GB of RAM.

Here are the tests we have identified as likely to fail on a system with less than the recommended 2 CPUs and 2 GB of RAM:

  1. com.github.kagkarlsson.scheduler.WaiterTest#should_wait_until_woken
  2. com.github.kagkarlsson.scheduler.compatibility.HsqlCompatibilityTest#test_compatibility

Please let me know if you would like us to create a pull request on this matter (possibly to the readme of this project).

Thank you for your attention to this matter. We hope that our recommendations will be helpful in improving the quality and performance of your project, especially for others to use.

Reproducing

FROM ubuntu:20.04

ARG DEBIAN_FRONTEND=noninteractive

# install docker git and java
RUN apt-get update && apt-get install -y \
    docker docker.io openjdk-11-jdk wget git

# install maven
RUN wget https://downloads.apache.org/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz && \
    tar -xvzf apache-maven-3.6.3-bin.tar.gz && \
    mv apache-maven-3.6.3 /opt/maven

ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
ENV M2_HOME=/opt/maven
ENV PATH=${M2_HOME}/bin:${PATH}

WORKDIR /home/

RUN git clone https://github.com/kagkarlsson/db-scheduler  && \
  cd db-scheduler && \
  git checkout cc67408a52a251c85454d0c5b52724f65b96921a

WORKDIR /home/db-scheduler

RUN mvn install -DskipTests 

ENTRYPOINT ["mvn", "test", "-fn"]

Build the image:

$> mkdir tmp
$> cp Dockerfile tmp
$> cd tmp
$> docker build -t db-scheduler .  # estimated time of build 2m

Running:

# This configuration likely prevents flakiness (no flaky failures in 10 runs). The -v mount is needed to run Docker inside the container.
$> docker run -v /var/run/docker.sock:/var/run/docker.sock --rm --memory=2g --cpus=4 db-scheduler | tee output.txt
$> grep "Failures:" output.txt
# This configuration, otherwise the same as the previous one, does not prevent flaky tests (observed over 10 runs).
$> docker run -v /var/run/docker.sock:/var/run/docker.sock --rm --memory=500mb --cpus=1 db-scheduler | tee output2.txt
$> grep "Failures:" output2.txt
kagkarlsson commented 1 year ago

Yes, there are some flaky tests due to the tricky nature of testing concurrent code. Additionally, there are some issues with Instant truncation and flaky testcontainers. I try to fix the painful ones. Typically I run the tests on a multicore machine; I have never tried a single-core one. Not sure why I would 🤔

I would rather we fixed the tests than add that type of disclaimer to the readme, though.

Pull-requests making tests more stable would be awesome. However, not by just adding Thread.sleeps :)
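As an illustration of stabilizing a concurrent test without `Thread.sleep`, here is a minimal sketch using a `CountDownLatch`. It is not taken from db-scheduler: the class `LatchInsteadOfSleep` and its methods `startWorker` and `awaitDone` are hypothetical names chosen for this example. The test thread blocks until the worker explicitly signals completion, so it does not depend on how quickly a slow or single-core machine schedules the worker; the timeout only bounds the wait when something is actually broken.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical example class, not part of db-scheduler.
public class LatchInsteadOfSleep {

    /** Starts a worker thread and returns a latch that is counted down when the work ends. */
    static CountDownLatch startWorker(Runnable work) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                work.run();
            } finally {
                done.countDown(); // signal completion even if the work throws
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive for a stuck worker
        worker.start();
        return done;
    }

    /** Waits for the signal up to timeoutSeconds; returns false on timeout or interrupt. */
    static boolean awaitDone(CountDownLatch done, long timeoutSeconds) {
        try {
            return done.await(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch done = startWorker(() -> { /* simulated work */ });
        System.out.println("worker finished in time: " + awaitDone(done, 10));
    }
}
```

A test would assert on the return value of `awaitDone` with a generous timeout. Unlike a fixed sleep, the wait ends as soon as the worker signals, so a large timeout does not slow down passing runs.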

Agorguy commented 1 year ago

@kagkarlsson I wanted to provide additional context about my efforts to fix the tests. I have invested considerable time and effort in trying to resolve the issues. However, due to the project's complexity, finding a simple solution has been challenging. Despite exploring different approaches, I have been unsuccessful in achieving the desired stability.

This is the error message: expected: <true> but was: <false> on line 39 (link).

You mentioned running the tests on a multicore machine, but not on a single-core machine. While this may have worked for you, it's important to consider that users with different hardware configurations may encounter problems. By specifying the minimum system requirements in the project documentation, we can ensure that all users are aware of the baseline configuration necessary for stable test execution. This approach can prevent unexpected test failures and save developers the trouble of debugging unrelated issues.
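A minimum CPU requirement could also be checked at runtime rather than only documented. The following is a minimal sketch, not something this thread proposed or adopted: the class `ResourceCheck`, its methods, and the `MIN_CPUS` constant are hypothetical, with the threshold of 2 taken from the CPU count reported above. It relies on `Runtime.availableProcessors()`, which on recent JDKs is expected to reflect container CPU limits such as `--cpus=1`. A memory check is left out because the JVM does not expose total RAM as directly.

```java
// Hypothetical example class, not part of db-scheduler.
public class ResourceCheck {

    // Assumption: threshold taken from the 2-CPU configuration reported in this issue.
    static final int MIN_CPUS = 2;

    /** Returns true when the given CPU count meets the required minimum. */
    static boolean hasEnoughCpus(int availableCpus, int minCpus) {
        return availableCpus >= minCpus;
    }

    /** Builds a human-readable summary of the check for logs or assumption messages. */
    static String describe(int availableCpus, int minCpus) {
        String verdict = hasEnoughCpus(availableCpus, minCpus)
                ? "meets the minimum"
                : "is below the minimum; timing-sensitive tests may be flaky";
        return "available CPUs: " + availableCpus + ", required: " + minCpus + " (" + verdict + ")";
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(describe(cpus, MIN_CPUS));
    }
}
```

Assuming the tests use JUnit 5, the boolean could be passed to `Assumptions.assumeTrue` so that timing-sensitive tests are skipped, with the message from `describe`, instead of failing on under-provisioned machines.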

kagkarlsson commented 1 year ago

I see your point. But how many would read that doc, and how many of those are building on single-core machines? I would guess not many. But I agree that it might be useful to add a section in the Readme on how to build the source, and possibly add this hint there.

Agorguy commented 1 year ago

Hello @kagkarlsson can you take a look at this PR?

I made it based on what we discussed above and your last comment. Let me know if you think something can be improved.

kagkarlsson commented 1 year ago

🎉 This issue has been resolved in v12.4.0 (Release Notes)