webrecorder / browsertrix-old

Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System
Apache License 2.0

Using a remote shepherd? #39

Closed (jswrenn closed this issue 4 years ago)

jswrenn commented 4 years ago

I'm running browsertrix on a VPS and have confirmed I can access http://XXX.XXX.XXX.XXX:8000 and http://XXX.XXX.XXX.XXX:9020 from my laptop's web browser. I can launch scrapes from the web frontend without issue, and I can list crawls using the CLI from my laptop without issue.

From my laptop, I then attempt to create a new browser profile:

browsertrix \
  --server http://XXX.XXX.XXX.XXX:8000 \
  --shepherd http://XXX.XXX.XXX.XXX:9020 \
  profile create

...but that produces this error:

Traceback (most recent call last):
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/http/client.py", line 1262, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/http/client.py", line 1308, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/http/client.py", line 1257, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/http/client.py", line 974, in send
    self.connect()
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/docker-4.2.1-py3.6.egg/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/urllib3/util/retry.py", line 367, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/urllib3/packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/http/client.py", line 1262, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/http/client.py", line 1308, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/http/client.py", line 1257, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/http/client.py", line 974, in send
    self.connect()
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/docker-4.2.1-py3.6.egg/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/docker-4.2.1-py3.6.egg/docker/api/client.py", line 207, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/docker-4.2.1-py3.6.egg/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/docker-4.2.1-py3.6.egg/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/docker-4.2.1-py3.6.egg/docker/api/client.py", line 230, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/requests/sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/john/.pyenv/versions/3.6.10/bin/browsertrix", line 11, in <module>
    load_entry_point('browsertrix-cli==0.1.0.dev0', 'console_scripts', 'browsertrix')()
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/click-7.1.2-py3.6.egg/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/click-7.1.2-py3.6.egg/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/click-7.1.2-py3.6.egg/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/click-7.1.2-py3.6.egg/click/core.py", line 1256, in invoke
    Command.invoke(self, ctx)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/click-7.1.2-py3.6.egg/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/click-7.1.2-py3.6.egg/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/browsertrix_cli-0.1.0.dev0-py3.6.egg/browsertrix_cli/profile.py", line 42, in profile
    docker_api = docker.from_env(version='auto')
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/docker-4.2.1-py3.6.egg/docker/client.py", line 85, in from_env
    timeout=timeout, version=version, **kwargs_from_env(**kwargs)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/docker-4.2.1-py3.6.egg/docker/client.py", line 40, in __init__
    self.api = APIClient(*args, **kwargs)
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/docker-4.2.1-py3.6.egg/docker/api/client.py", line 190, in __init__
    self._version = self._retrieve_server_version()
  File "/home/john/.pyenv/versions/3.6.10/lib/python3.6/site-packages/docker-4.2.1-py3.6.egg/docker/api/client.py", line 215, in _retrieve_server_version
    'Error while fetching server API version: {0}'.format(e)
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

The 'No such file or directory' message stands out to me; is it trying to connect via a unix socket?

Are there additional steps I need to take to run and control browsertrix remotely?

jswrenn commented 4 years ago

To fix this error, first enable the remote API for dockerd on the VPS. Then set the DOCKER_HOST environment variable when invoking the browsertrix CLI, so that it talks to the remote daemon instead of looking for a local unix socket; e.g.:

DOCKER_HOST=tcp://XXX.XXX.XXX.XXX:2376 browsertrix \
  --server http://XXX.XXX.XXX.XXX:8000 \
  --shepherd http://XXX.XXX.XXX.XXX:9020 \
  profile create
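
For anyone hitting the same traceback, here is a minimal pre-flight sketch that reports which Docker endpoint the CLI is likely to use and whether it looks reachable. It uses only the Python standard library and is not part of browsertrix. The helper names (`docker_endpoint`, `describe_endpoint`, `is_reachable`) are hypothetical. The default socket path and the fallback port 2375 are assumptions based on common Docker conventions. The TCP check only confirms that the port accepts connections; it does not verify TLS or the Docker API itself.

```python
import os
import socket
from urllib.parse import urlparse

# Assumption: with DOCKER_HOST unset, the Docker SDK for Python falls back to
# the local unix socket at this path (the usual default on Linux and macOS).
DEFAULT_UNIX_SOCKET = "unix:///var/run/docker.sock"


def docker_endpoint(environ=None):
    """Return DOCKER_HOST if it is set, otherwise the assumed local default."""
    environ = os.environ if environ is None else environ
    return environ.get("DOCKER_HOST") or DEFAULT_UNIX_SOCKET


def describe_endpoint(endpoint):
    """Classify an endpoint as ('unix', path) or ('tcp', (host, port))."""
    parsed = urlparse(endpoint)
    if parsed.scheme == "unix":
        return "unix", parsed.path
    if parsed.scheme == "tcp":
        # Assumption: 2375 is the conventional plain-TCP port when none is given.
        return "tcp", (parsed.hostname, parsed.port or 2375)
    raise ValueError("unsupported Docker endpoint: {!r}".format(endpoint))


def is_reachable(endpoint, timeout=3.0):
    """Best-effort check: socket file exists, or the TCP port accepts a connection."""
    kind, target = describe_endpoint(endpoint)
    if kind == "unix":
        return os.path.exists(target)
    try:
        with socket.create_connection(target, timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_reachable(docker_endpoint())` on the laptop before `profile create` should return False in the situation described above (no DOCKER_HOST and no local Docker socket), which matches the FileNotFoundError raised from unixconn.py.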