scanner-research / scanner

Efficient video analysis at scale
https://scanner-research.github.io/
Apache License 2.0

Walkthrough fails on CPU Docker instance #270

Closed: spaulaus closed this issue 5 years ago

spaulaus commented 5 years ago

Background

I'm trying to run the grayscale conversion walkthrough in the CPU Docker container. I haven't made any changes to the image. I've also tried passing the input file as a positional argument and got the same result.

Steps to reproduce:

  1. wget https://raw.githubusercontent.com/scanner-research/scanner/master/docker/docker-compose.yml
  2. docker-compose run --service-ports cpu /bin/bash
  3. cd /opt/scanner/examples/apps/walkthroughs
  4. wget https://storage.googleapis.com/scanner-data/public/sample-clip.mp4
  5. python3 grayscale_conversion.py

The error

root@abe0ef57dfa3:/opt/scanner/examples/apps/walkthroughs# python3 grayscale_conversion.py 
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '(null)':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf57.71.100
  Duration: 00:01:00.02, start: 0.000000, bitrate: 2037 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1902 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 129 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
1 worker failed at 10:35AM +0000 on Apr 30, 2019
  0%|                                                      | 0/3 [01:37<?, ?it/s, jobs=2, tasks=3, workers=0]
E0430 10:36:46.240423  1134 master.cpp:1739] No workers but have unfinished work after 30 seconds
100%|██████████████████████████████████████████████| 3/3 [01:38<00:00, 32.95s/it, jobs=2, tasks=3, workers=0]
Traceback (most recent call last):
  File "grayscale_conversion.py", line 37, in <module>
    main()
  File "grayscale_conversion.py", line 27, in main
    sc.run(output, sp.PerfParams.manual(50, 250), cache_mode=sp.CacheMode.Overwrite)
  File "/usr/local/lib/python3.5/dist-packages/scannerpy/client.py", line 1586, in run
    raise ScannerException(job_status.result.msg)
scannerpy.common.ScannerException: No workers but have unfinished work after 30 seconds
E0430 10:45:08.041167  1189 worker.cpp:662] Worker did not receive heartbeat in 300000ms. Shutting down.

Additional Information

I ran ctest from the build directory (/opt/scanner/build) and the Python tests failed. Running pytest in the tests directory shows many failing tests, and the run ends with a core dump.

Test Results:

============================================ test session starts ============================================
platform linux -- Python 3.5.2, pytest-3.0.6, py-1.8.0, pluggy-0.4.0
rootdir: /opt/scanner/tests, inifile: pytest.ini
collected 37 items

py_test.py s.FF..FFF.FFFFFFFFFFsFFFFFFFFFFF.FFF
Aborted (core dumped)

willcrichton commented 5 years ago

Thanks for the error report. I just double-checked and this all works for me, so I'm guessing it's either your machine (maybe not enough memory?) or an environment setup issue.

First, to double-check, run docker-compose pull cpu to make sure you have the latest image.

Next, can you run the tests with pytest tests -vvs and paste the log output into a gist?

spaulaus commented 5 years ago

Yep! I'll do that first thing tomorrow. Thanks! The host machine should have plenty of resources (64 GB RAM and 12 cores), and I confirmed that the container's resource limits are set to "unlimited".

spaulaus commented 5 years ago

@willcrichton: I've uploaded the results to a gist and confirmed that the Docker image is up to date. Here's the image the container was built from:

scannerresearch/scannertools:cpu-latest@sha256:6ff1e13a3e2a3877ab3543eff07c71c97d61a335a1901816113e68e227efc416

willcrichton commented 5 years ago

Thanks @spaulaus. Does this machine have any kind of proxy or firewall?

spaulaus commented 5 years ago

@willcrichton: It does; the machine is behind a proxy. I verified that I can connect through the proxy: wget and git pull both work without issues. I'll also check whether there's a certificate issue.
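Before changing anything, it may help to see exactly which proxy settings the process inside the container inherits. A minimal sketch, assuming the proxy is configured through the usual environment variables; the helper name proxy_settings and the list of variable names are illustrative, not part of Scanner. Python's standard urllib.request.getproxies() is printed alongside as a cross-check:

```python
import os
import urllib.request

# Variable names that proxy-aware tools commonly read (lowercase and uppercase forms).
# This list is an assumption; extend it if your environment uses others.
PROXY_VARS = ("http_proxy", "https_proxy", "grpc_proxy", "no_proxy",
              "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def proxy_settings(env=None):
    """Return the proxy-related variables that are set in env (default: os.environ)."""
    env = os.environ if env is None else env
    return {name: env[name] for name in PROXY_VARS if name in env}


if __name__ == "__main__":
    print("environment:", proxy_settings())
    # urllib's own view of the proxies, derived from the same environment variables.
    print("urllib:", urllib.request.getproxies())
```

Running this with python3 inside the container (rather than on the host) shows what Scanner's processes would actually see.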

willcrichton commented 5 years ago

We use gRPC to connect Scanner's processes (the master and the workers), and we've seen gRPC run into issues with proxies in the past. If I remember correctly, try running unset http_proxy and then re-running the example.
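unset http_proxy only affects the current shell session. If you'd rather not depend on shell state, a minimal sketch of the same idea from inside the walkthrough script, assuming the variables are removed before the script creates its Scanner client so that the gRPC channels (and any child processes) start without a proxy. The helper name clear_proxy_env is hypothetical, and also removing https_proxy and grpc_proxy goes beyond the single variable suggested above:

```python
import os

# Proxy variables that gRPC and most other HTTP clients may pick up (assumed list).
PROXY_VARS = ("http_proxy", "https_proxy", "grpc_proxy",
              "HTTP_PROXY", "HTTPS_PROXY")


def clear_proxy_env(env=None):
    """Remove proxy variables from env (default: os.environ).

    Deleting from os.environ also unsets the variable at the C level, so
    native libraries in this process and any child processes no longer see it.
    Returns the removed values so they can be restored later if needed.
    """
    env = os.environ if env is None else env
    removed = {}
    for name in PROXY_VARS:
        if name in env:
            removed[name] = env.pop(name)
    return removed


# Hypothetical usage: call this at the top of grayscale_conversion.py,
# before the Scanner client (the `sc` object in the traceback) is created.
#     clear_proxy_env()
```

Returning the removed values makes it easy to put the proxy back (os.environ.update(removed)) for later steps that do need network access.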

spaulaus commented 5 years ago

@willcrichton: I confirmed that clearing the proxy environment variables makes the grayscale conversion succeed. However, it has the side effect of causing all the unit tests to fail, which is outside the scope of the original problem.
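One possible way around that side effect, not tested in this thread: keep the proxy for outbound traffic (in case the tests need it to reach the network) and exempt only local addresses through no_proxy. This is a minimal sketch, assuming the gRPC build in the image honors no_proxy and that Scanner's master and workers talk over localhost/127.0.0.1; the helper name bypass_proxy_for_localhost is hypothetical:

```python
import os

# Hosts that local master/worker traffic is assumed to use. If Scanner connects
# via the container's hostname instead, add that name here as well.
LOCAL_HOSTS = ("localhost", "127.0.0.1")


def bypass_proxy_for_localhost(env=None):
    """Append LOCAL_HOSTS to no_proxy and NO_PROXY, keeping existing entries."""
    env = os.environ if env is None else env
    for var in ("no_proxy", "NO_PROXY"):
        # Split the comma-separated list, dropping blanks and surrounding spaces.
        entries = [h.strip() for h in env.get(var, "").split(",") if h.strip()]
        for host in LOCAL_HOSTS:
            if host not in entries:
                entries.append(host)
        env[var] = ",".join(entries)
    return env
```

The function is idempotent, so calling it more than once (for example, from both the walkthrough and a test fixture) does not duplicate entries.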

Thanks for your assistance!