Local experiment issue : ERRO[0612] error waiting for container: unexpected EOF - [SOLVED]

Microsvuln commented 3 years ago

Hi.

I have a problem running local experiments. I get the following error while building benchmarks after running this command:

(.venv) arash@fuzzbench-scale1:~/new/fuzzbench$ PYTHONPATH=. python3 experiment/run_experiment.py --experiment-config experiment-config.yaml --benchmarks bloaty_fuzz_target harfbuzz-1.3.2 libjpeg-turbo-07-2017 libpcap_fuzz_both libpng-1.2.56 libxml2-v2.9.2 --experiment-name $EXPERIMENT_NAME --fuzzers afl aflql

And the error log (summarized) :

INFO:root:Building using (<function build_measurer at 0x7f41267d5430>): [('bloaty_fuzz_target',), ('harfbuzz-1.3.2',), ('libjpeg-turbo-07-2017',), ('libpcap_fuzz_both',), ('libpng-1.2.56',), ('libxml2-v2.9.2',)]
INFO:root:Building measurer for benchmark: bloaty_fuzz_target.
INFO:root:Building measurer for benchmark: harfbuzz-1.3.2.
INFO:root:Building measurer for benchmark: libjpeg-turbo-07-2017.
INFO:root:Building measurer for benchmark: libpcap_fuzz_both.
INFO:root:Building measurer for benchmark: libpng-1.2.56.
INFO:root:Building measurer for benchmark: libxml2-v2.9.2.
INFO:root:Done building measurer for benchmark: libpcap_fuzz_both.
INFO:root:Done building measurer for benchmark: bloaty_fuzz_target.
INFO:root:Done building measurer for benchmark: libpng-1.2.56.
INFO:root:Done building measurer for benchmark: libjpeg-turbo-07-2017.
INFO:root:Done building measurer for benchmark: harfbuzz-1.3.2.
INFO:root:Done building measurer for benchmark: libxml2-v2.9.2.
INFO:root:Build successes: [('bloaty_fuzz_target',), ('harfbuzz-1.3.2',), ('libjpeg-turbo-07-2017',), ('libpcap_fuzz_both',), ('libpng-1.2.56',), ('libxml2-v2.9.2',)]
INFO:root:Done building measurers.
INFO:root:Building all fuzzer benchmarks.
INFO:root:Building using (<function build_fuzzer_benchmark at 0x7f41267d5670>): [('afl', 'bloaty_fuzz_target'), ('afl', 'harfbuzz-1.3.2'), ('afl', 'libjpeg-turbo-07-2017'), ('afl', 'libpcap_fuzz_both'), ('afl', 'libpng-1.2.56'), ('afl', 'libxml2-v2.9.2'), ('aflql', 'bloaty_fuzz_target'), ('aflql', 'harfbuzz-1.3.2'), ('aflql', 'libjpeg-turbo-07-2017'), ('aflql', 'libpcap_fuzz_both'), ('aflql', 'libpng-1.2.56'), ('aflql', 'libxml2-v2.9.2')]
INFO:root:Building benchmark: bloaty_fuzz_target, fuzzer: afl.
INFO:root:Building benchmark: harfbuzz-1.3.2, fuzzer: afl.
INFO:root:Building benchmark: libjpeg-turbo-07-2017, fuzzer: afl.
INFO:root:Building benchmark: libpcap_fuzz_both, fuzzer: afl.
INFO:root:Building benchmark: libpng-1.2.56, fuzzer: afl.
INFO:root:Building benchmark: libxml2-v2.9.2, fuzzer: afl.
INFO:root:Building benchmark: bloaty_fuzz_target, fuzzer: aflql.
INFO:root:Building benchmark: harfbuzz-1.3.2, fuzzer: aflql.
INFO:root:Building benchmark: libjpeg-turbo-07-2017, fuzzer: aflql.
INFO:root:Building benchmark: libpcap_fuzz_both, fuzzer: aflql.
INFO:root:Building benchmark: libpng-1.2.56, fuzzer: aflql.
INFO:root:Building benchmark: libxml2-v2.9.2, fuzzer: aflql.
ERRO[0585] error waiting for container: unexpected EOF  
ERROR:root:Executed command: "docker run -ti --rm -v /var/run/docker.sock:/var/run/docker.sock -v /home/arash/test/experiment-data-zhest2:/home/arash/test/experiment-data-zhest2 -v /home/arash/test/report-data-zhest2:/home/arash/test/report-data-zhest2 -e INSTANCE_NAME=d-zhest2 -e EXPERIMENT=zhest2 -e SQL_DATABASE_URL=sqlite:////home/arash/test/experiment-data-zhest2/local.db?check_same_thread=False -e EXPERIMENT_FILESTORE=/home/arash/test/experiment-data-zhest2 -e REPORT_FILESTORE=/home/arash/test/report-data-zhest2 -e DOCKER_REGISTRY=gcr.io/fuzzbench -e LOCAL_EXPERIMENT=True --cap-add=SYS_PTRACE --cap-add=SYS_NICE --name=dispatcher-container gcr.io/fuzzbench/dispatcher-image /bin/bash -c rsync -r "${EXPERIMENT_FILESTORE}/${EXPERIMENT}/input/" ${WORK} && mkdir ${WORK}/src && tar -xvzf ${WORK}/src.tar.gz -C ${WORK}/src && PYTHONPATH=${WORK}/src python3 ${WORK}/src/experiment/dispatcher.py || /bin/bash" returned: 125.
Traceback (most recent call last):
  File "experiment/run_experiment.py", line 524, in <module>
    sys.exit(main())
  File "experiment/run_experiment.py", line 512, in main
    start_experiment(args.experiment_name,
  File "experiment/run_experiment.py", line 237, in start_experiment
    start_dispatcher(config, CONFIG_DIR)
  File "experiment/run_experiment.py", line 246, in start_dispatcher
    dispatcher.start()
  File "experiment/run_experiment.py", line 396, in start
    return new_process.execute(command, write_to_stdout=True)
  File "/home/arash/new/fuzzbench/common/new_process.py", line 124, in execute
    raise subprocess.CalledProcessError(retcode, command)
subprocess.CalledProcessError: Command '['docker', 'run', '-ti', '--rm', '-v', '/var/run/docker.sock:/var/run/docker.sock', '-v', '/home/arash/test/experiment-data-zhest2:/home/arash/test/experiment-data-zhest2', '-v', '/home/arash/test/report-data-zhest2:/home/arash/test/report-data-zhest2', '-e', 'INSTANCE_NAME=d-zhest2', '-e', 'EXPERIMENT=zhest2', '-e', 'SQL_DATABASE_URL=sqlite:////home/arash/test/experiment-data-zhest2/local.db?check_same_thread=False', '-e', 'EXPERIMENT_FILESTORE=/home/arash/test/experiment-data-zhest2', '-e', 'REPORT_FILESTORE=/home/arash/test/report-data-zhest2', '-e', 'DOCKER_REGISTRY=gcr.io/fuzzbench', '-e', 'LOCAL_EXPERIMENT=True', '--cap-add=SYS_PTRACE', '--cap-add=SYS_NICE', '--name=dispatcher-container', 'gcr.io/fuzzbench/dispatcher-image', '/bin/bash', '-c', 'rsync -r "${EXPERIMENT_FILESTORE}/${EXPERIMENT}/input/" ${WORK} && mkdir ${WORK}/src && tar -xvzf ${WORK}/src.tar.gz -C ${WORK}/src && PYTHONPATH=${WORK}/src python3 ${WORK}/src/experiment/dispatcher.py || /bin/bash']' returned non-zero exit status 125.

I don't know why this is happening. This run uses only 2 fuzzers and 6 benchmarks, and I want to run many more fuzzers and benchmarks.

Any solution to this?

Thanks.

jonathanmetzman commented 3 years ago

What are the specs of your machine? How many cores and how much RAM? I think this issue is https://github.com/moby/moby/issues/36324. If this is the case, adding some resource control to local experiments should help.

Microsvuln commented 3 years ago

@jonathanmetzman

96 CPU cores and 64 GB of RAM.

I had no problem with this machine before, so I don't know whether fuzzbench updates affect this.

Will try this and let you know about it.

jonathanmetzman commented 3 years ago

Hmmm...Maybe not then. One possible issue with builds on a single machine could be that each docker build might use as many cores as possible (e.g. make -j $NPROC). Maybe we need to limit this?
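
The idea of limiting the cores each build uses can be sketched as simple arithmetic. This is only an illustration, not FuzzBench code: the helper name `jobs_per_build` is hypothetical, and it assumes all 12 fuzzer-benchmark builds from the log run at the same time on the 96-core machine.

```python
def jobs_per_build(total_cores: int, concurrent_builds: int) -> int:
    """Split the host's cores evenly across builds that run at the same time.

    Hypothetical helper for illustration; FuzzBench does not provide it.
    """
    if total_cores < 1 or concurrent_builds < 1:
        raise ValueError("total_cores and concurrent_builds must be positive")
    # Never go below one job, even when builds outnumber cores.
    return max(1, total_cores // concurrent_builds)


# 96 cores shared by the 12 fuzzer-benchmark builds listed in the log above
# (assuming they all build concurrently).
print(jobs_per_build(96, 12))  # prints 8
```

The result could then be passed to each build as its job count (for example as the value after `make -j`) instead of letting every build use all cores.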

Microsvuln commented 3 years ago

@jonathanmetzman

Nope. I can confirm that I found how to reproduce this problem and how to solve it.

Description

A fuzzbench local experiment will not run successfully and will raise the ERRO[0585] error waiting for container: unexpected EOF error if the following condition is met:

You add your private fuzzer to the fuzzbench fuzzer directory BEFORE you install fuzzbench using the following commands:

$ make install-dependencies
$ source .venv/bin/activate
$ make presubmit

Here we assume that the Fuzzbench user has a customized/private fuzzer and aims to add their fuzzer to fuzzbench and run a local experiment.

Solution

Clone the fuzzbench GitHub repository.
Make sure that you have not added anything to fuzzbench before installing it.
Install fuzzbench as described in the documentation.
After fuzzbench is installed successfully, add your fuzzer directory to the fuzzer directory of fuzzbench and check it using the following commands:

$ make format
$ make presubmit

The experiment should now run successfully.

P.S. 1: Please make sure that your fuzzer is added to fuzzbench AFTER the fuzzbench installation process.

P.S. 2: I managed to solve most of the local experiment issues I faced. If you like, I can open a troubleshooting section in the fuzzbench documentation and add them along with their solutions, so that fuzzbench users can find a solution more quickly.

Problem solved, issue closed.

Thanks.

inferno-chromium commented 3 years ago

I need to understand what happened here, so reopening.

chenju2k6 commented 3 years ago

I can see the same issue on my side with this commit (ed0dba78f150373955e835866ed8c84d7669dae1) when I use the run script to start the experiment. But the command "make-debug--" runs fine.

ERRO[0591] error waiting for container: unexpected EOF
ERROR:root:Executed command: "docker run -ti --rm -v /var/run/docker.sock:/var/run/docker.sock -v /home/cju/experiment-data:/home/cju/experiment-data -v /home/cju/report-data:/home/cju/report-data -e INSTANCE_NAME=d-kirenenko -e EXPERIMENT=kirenenko -e SQL_DATABASE_URL=sqlite:////home/cju/experiment-data/local.db?check_same_thread=False -e EXPERIMENT_FILESTORE=/home/cju/experiment-data -e REPORT_FILESTORE=/home/cju/report-data -e DOCKER_REGISTRY=gcr.io/fuzzbench -e LOCAL_EXPERIMENT=True --cap-add=SYS_PTRACE --cap-add=SYS_NICE --name=dispatcher-container gcr.io/fuzzbench/dispatcher-image /bin/bash -c rsync -r "${EXPERIMENT_FILESTORE}/${EXPERIMENT}/input/" ${WORK} && mkdir ${WORK}/src && tar -xvzf ${WORK}/src.tar.gz -C ${WORK}/src && PYTHONPATH=${WORK}/src python3 ${WORK}/src/experiment/dispatcher.py || /bin/bash" returned: 125.
Traceback (most recent call last):
  File "experiment/run_experiment.py", line 524, in <module>
    sys.exit(main())
  File "experiment/run_experiment.py", line 512, in main
    start_experiment(args.experiment_name,
  File "experiment/run_experiment.py", line 237, in start_experiment
    start_dispatcher(config, CONFIG_DIR)
  File "experiment/run_experiment.py", line 246, in start_dispatcher
    dispatcher.start()
  File "experiment/run_experiment.py", line 396, in start
    return new_process.execute(command, write_to_stdout=True)
  File "/home/cju/tmp/fuzzbench/common/new_process.py", line 124, in execute
    raise subprocess.CalledProcessError(retcode, command)
subprocess.CalledProcessError: Command '['docker', 'run', '-ti', '--rm', '-v', '/var/run/docker.sock:/var/run/docker.sock', '-v', '/home/cju/experiment-data:/home/cju/experiment-data', '-v', '/home/cju/report-data:/home/cju/report-data', '-e', 'INSTANCE_NAME=d-kirenenko', '-e', 'EXPERIMENT=kirenenko', '-e', 'SQL_DATABASE_URL=sqlite:////home/cju/experiment-data/local.db?check_same_thread=False', '-e', 'EXPERIMENT_FILESTORE=/home/cju/experiment-data', '-e', 'REPORT_FILESTORE=/home/cju/report-data', '-e', 'DOCKER_REGISTRY=gcr.io/fuzzbench', '-e', 'LOCAL_EXPERIMENT=True', '--cap-add=SYS_PTRACE', '--cap-add=SYS_NICE', '--name=dispatcher-container', 'gcr.io/fuzzbench/dispatcher-image', '/bin/bash', '-c', 'rsync -r "${EXPERIMENT_FILESTORE}/${EXPERIMENT}/input/" ${WORK} && mkdir ${WORK}/src && tar -xvzf ${WORK}/src.tar.gz -C ${WORK}/src && PYTHONPATH=${WORK}/src python3 ${WORK}/src/experiment/dispatcher.py || /bin/bash']' returned non-zero exit status 125.

inferno-chromium commented 3 years ago

@chenju2k6 - can you please provide steps to reproduce.

chenju2k6 commented 3 years ago

Sure. This error happens if I add customized new fuzzers: exp_fuzzer1 and exp_fuzzer2. However, the unit test "make-exp_fuzzer1-xxx" runs fine. I am wondering if there are any other sanity checks we can run before we schedule an experiment?

cd fuzzbench
make install-dependencies
source .venv/bin/activate
./run.sh

run.sh

EXPERIMENT_NAME=exp_long
PYTHONPATH=. python3 experiment/run_experiment.py \
--experiment-config run.yaml \
--benchmarks libxml2-v2.9.2 libpng-1.2.56 libjpeg-turbo-07-2017 sqlite3_ossfuzz \
--experiment-name $EXPERIMENT_NAME \
--fuzzers honggfuzz weizz_qemu aflplusplus_optimal exp_fuzzer1 exp_fuzzer2

run.yaml

# The number of trials of a fuzzer-benchmark pair.
trials: 5

# The amount of time in seconds that each trial is run for.
# 1 day = 24 * 60 * 60 = 86400
max_total_time: 43200

# The location of the docker registry.
# FIXME: Support custom docker registry.
# See https://github.com/google/fuzzbench/issues/777
docker_registry: gcr.io/fuzzbench

# The local experiment folder that will store most of the experiment data.
# Please use an absolute path.
experiment_filestore: /home/cju/experiment-data

# The local report folder where HTML reports and summary data will be stored.
# Please use an absolute path.
report_filestore: /home/cju/report-data

# Flag that indicates this is a local experiment.
local_experiment: true

EliaGeretto commented 3 years ago

Any news on this? I am experiencing the same problem and I am willing to help debug it. I have tried the procedure suggested above, but it did not help. What does help is reducing the number of benchmarks being targeted.

anfedotoff commented 3 years ago

I have the same problem.

ERRO[0413] error waiting for container: unexpected EOF  
ERROR:root:Executed command: "docker run -ti --rm -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/experiment-data:/tmp/experiment-data -v /tmp/report-data:/tmp/report-data -e INSTANCE_NAME=d-test-local-fuzzbench -e EXPERIMENT=test-local-fuzzbench -e SQL_DATABASE_URL=sqlite:////tmp/experiment-data/local.db?check_same_thread=False -e EXPERIMENT_FILESTORE=/tmp/experiment-data -e REPORT_FILESTORE=/tmp/report-data -e DOCKER_REGISTRY=gcr.io/fuzzbench -e LOCAL_EXPERIMENT=True --cap-add=SYS_PTRACE --cap-add=SYS_NICE --name=dispatcher-container gcr.io/fuzzbench/dispatcher-image /bin/bash -c rsync -r "${EXPERIMENT_FILESTORE}/${EXPERIMENT}/input/" ${WORK} && mkdir ${WORK}/src && tar -xvzf ${WORK}/src.tar.gz -C ${WORK}/src && PYTHONPATH=${WORK}/src python3 ${WORK}/src/experiment/dispatcher.py || /bin/bash" returned: 125.
Traceback (most recent call last):
  File "experiment/run_experiment.py", line 553, in <module>
    sys.exit(main())
  File "experiment/run_experiment.py", line 540, in main
    start_experiment(args.experiment_name,
  File "experiment/run_experiment.py", line 242, in start_experiment
    return start_experiment_from_full_config(config)
  File "experiment/run_experiment.py", line 257, in start_experiment_from_full_config
    start_dispatcher(config, experiment_utils.CONFIG_DIR)
  File "experiment/run_experiment.py", line 266, in start_dispatcher
    dispatcher.start()
  File "experiment/run_experiment.py", line 414, in start
    return new_process.execute(command, write_to_stdout=True)
  File "/home/fedotoff/fuzzbench/common/new_process.py", line 124, in execute
    raise subprocess.CalledProcessError(retcode, command)
subprocess.CalledProcessError: Command '['docker', 'run', '-ti', '--rm', '-v', '/var/run/docker.sock:/var/run/docker.sock', '-v', '/tmp/experiment-data:/tmp/experiment-data', '-v', '/tmp/report-data:/tmp/report-data', '-e', 'INSTANCE_NAME=d-test-local-fuzzbench', '-e', 'EXPERIMENT=test-local-fuzzbench', '-e', 'SQL_DATABASE_URL=sqlite:////tmp/experiment-data/local.db?check_same_thread=False', '-e', 'EXPERIMENT_FILESTORE=/tmp/experiment-data', '-e', 'REPORT_FILESTORE=/tmp/report-data', '-e', 'DOCKER_REGISTRY=gcr.io/fuzzbench', '-e', 'LOCAL_EXPERIMENT=True', '--cap-add=SYS_PTRACE', '--cap-add=SYS_NICE', '--name=dispatcher-container', 'gcr.io/fuzzbench/dispatcher-image', '/bin/bash', '-c', 'rsync -r "${EXPERIMENT_FILESTORE}/${EXPERIMENT}/input/" ${WORK} && mkdir ${WORK}/src && tar -xvzf ${WORK}/src.tar.gz -C ${WORK}/src && PYTHONPATH=${WORK}/src python3 ${WORK}/src/experiment/dispatcher.py || /bin/bash']' returned non-zero exit status 125.

It depends on how many benchmarks and fuzzers I choose. I don't use custom fuzzers, only supported ones. For example, this command fails:

PYTHONPATH=. python3 experiment/run_experiment.py \
--experiment-config ../experiment-config.yaml \
--benchmarks lcms-2017-03-21 freetype2-2017 libpng-1.2.56 \
--experiment-name test-local-fuzzbench \
--fuzzers aflplusplus libfuzzer fuzzolic_aflplusplus_z3

But this command works fine (one benchmark removed):

PYTHONPATH=. python3 experiment/run_experiment.py \
--experiment-config ../experiment-config.yaml \
--benchmarks lcms-2017-03-21 freetype2-2017  \
--experiment-name test-local-fuzzbench \
--fuzzers aflplusplus libfuzzer fuzzolic_aflplusplus_z3

My machine has 128 CPUs and 500 GB of memory.
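
To make the difference between the two invocations above concrete, here is a small sketch that counts the fuzzer-benchmark pairs each one asks to build. The helper `build_pairs` is hypothetical and only mirrors the pair list printed in the "Building using" log line earlier in this thread. It does not imply any fixed threshold at which the error appears.

```python
from itertools import product

fuzzers = ["aflplusplus", "libfuzzer", "fuzzolic_aflplusplus_z3"]
failing_benchmarks = ["lcms-2017-03-21", "freetype2-2017", "libpng-1.2.56"]
working_benchmarks = ["lcms-2017-03-21", "freetype2-2017"]


def build_pairs(fuzzers, benchmarks):
    """Return every (fuzzer, benchmark) combination that would be built."""
    return list(product(fuzzers, benchmarks))


print(len(build_pairs(fuzzers, failing_benchmarks)))  # prints 9
print(len(build_pairs(fuzzers, working_benchmarks)))  # prints 6
```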

wtdcode commented 1 year ago

Same here, any update?

jonathanmetzman commented 1 year ago

No update from me at least. Unfortunately we don't use this feature much because we run all of our experiments in the cloud, so it's unlikely to get a ton of attention.

mvanotti commented 1 year ago

Is it possible that this is related to running multiple containers at the same time?

I was hitting this issue consistently (with 30 concurrent builders and almost all benchmarks), and then added a random sleep before launching each docker container and everything started to work.
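
A minimal sketch of that kind of workaround, assuming the containers are launched from Python through `subprocess`. The function name `run_with_jitter` and the 30-second default are hypothetical choices for illustration; this is not the actual change that was made.

```python
import random
import subprocess
import time


def run_with_jitter(command, max_delay_seconds=30.0, sleep=time.sleep,
                    runner=subprocess.run):
    """Wait a random amount of time, then run the command.

    Spreading out start times means many containers are not launched in
    the same instant. `sleep` and `runner` are injectable for testing.
    """
    delay = random.uniform(0.0, max_delay_seconds)
    sleep(delay)
    # check=True raises CalledProcessError on a non-zero exit status.
    return runner(command, check=True)
```

Each builder would call `run_with_jitter` with its own `docker` command list in place of a direct `subprocess.run` call.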
