openaustralia / morph

Take the hassle out of web scraping
https://morph.io
GNU Affero General Public License v3.0

Scrapers can't connect to the internet with mitmproxy enabled #1135

Closed henare closed 6 years ago

henare commented 7 years ago

Scrapers can't connect to the internet when mitmproxy is enabled. I think it's because the newish dedicated morph network (192.168.0.0/16) conflicts with the virtualbox bridge network (192.168.11.0/24).
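A quick way to confirm that the two ranges really do overlap is Python's standard `ipaddress` module. This is only a sanity check of the suspicion above, not a diagnosis of the routing problem itself; the helper name `networks_conflict` is made up for illustration.

```python
import ipaddress

# The two networks named in the comment above.
morph_net = ipaddress.ip_network("192.168.0.0/16")   # dedicated morph network
vbox_net = ipaddress.ip_network("192.168.11.0/24")   # virtualbox bridge network


def networks_conflict(a, b):
    """Return True if the two networks share at least one address."""
    return a.overlaps(b)


if __name__ == "__main__":
    print("overlap:", networks_conflict(morph_net, vbox_net))
    # subnet_of tells us the /24 sits entirely inside the /16.
    print("vbox inside morph:", vbox_net.subnet_of(morph_net))
```

Since the /24 lies entirely within the /16, any address on the virtualbox bridge is also a valid address on the morph network, which is consistent with the conflict suspected here.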

henare commented 7 years ago

Sigh. I'm seeing it on production now too. I'm guessing it's the updates we did earlier.

henare commented 7 years ago

Here's the docker upgrade:

docker-engine:amd64 (1.13.0-0~ubuntu-precise, 17.03.0~ce-0~ubuntu-precise)

and the full upgrade log for debugging:

Upgrade: libc-bin:amd64 (2.15-0ubuntu10.15, 2.15-0ubuntu10.16), bind9-host:amd64 (9.8.1.dfsg.P1-4ubuntu0.20, 9.8.1.dfsg.P1-4ubuntu0.21), libicu48:amd64 (4.8.1.1-3ubuntu0.6, 4.8.1.1-3ubuntu0.7), percona-xtrabackup:amd64 (2.3.5-1.precise, 2.3.7-2.precise), dnsutils:amd64 (9.8.1.dfsg.P1-4ubuntu0.20, 9.8.1.dfsg.P1-4ubuntu0.21), openjdk-7-jdk:amd64 (7u121-2.6.8-1ubuntu0.12.04.1, 7u121-2.6.8-1ubuntu0.12.04.3), openjdk-7-jre:amd64 (7u121-2.6.8-1ubuntu0.12.04.1, 7u121-2.6.8-1ubuntu0.12.04.3), passenger-dev:amd64 (5.1.1-1~precise1, 5.1.2-1~precise1), passenger-doc:amd64 (5.1.1-1~precise1, 5.1.2-1~precise1), docker-engine:amd64 (1.13.0-0~ubuntu-precise, 17.03.0~ce-0~ubuntu-precise), libdns81:amd64 (9.8.1.dfsg.P1-4ubuntu0.20, 9.8.1.dfsg.P1-4ubuntu0.21), libgnutls26:amd64 (2.12.14-5ubuntu3.12, 2.12.14-5ubuntu3.14), libfreetype6:amd64 (2.4.8-1ubuntu2.3, 2.4.8-1ubuntu2.4), libisccc80:amd64 (9.8.1.dfsg.P1-4ubuntu0.20, 9.8.1.dfsg.P1-4ubuntu0.21), tcpdump:amd64 (4.2.1-1ubuntu2.2, 4.9.0-1ubuntu1~ubuntu12.04.1), liblwres80:amd64 (9.8.1.dfsg.P1-4ubuntu0.20, 9.8.1.dfsg.P1-4ubuntu0.21), nginx-common:amd64 (1.10.2-8.5.1.1~precise1, 1.10.2-8.5.1.2~precise1), multiarch-support:amd64 (2.15-0ubuntu10.15, 2.15-0ubuntu10.16), passenger:amd64 (5.1.1-1~precise1, 5.1.2-1~precise1), libssl-dev:amd64 (1.0.1-4ubuntu5.38, 1.0.1-4ubuntu5.39), libssl-doc:amd64 (1.0.1-4ubuntu5.38, 1.0.1-4ubuntu5.39), w3m:amd64 (0.5.3-5ubuntu1.1, 0.5.3-5ubuntu1.2), libxml2:amd64 (2.7.8.dfsg-5.1ubuntu4.15, 2.7.8.dfsg-5.1ubuntu4.17), libbind9-80:amd64 (9.8.1.dfsg.P1-4ubuntu0.20, 9.8.1.dfsg.P1-4ubuntu0.21), nginx-extras:amd64 (1.10.2-8.5.1.1~precise1, 1.10.2-8.5.1.2~precise1), libxml2-dev:amd64 (2.7.8.dfsg-5.1ubuntu4.15, 2.7.8.dfsg-5.1ubuntu4.17), libxpm4:amd64 (3.5.9-4, 3.5.9-4ubuntu0.1), linux-image-generic-lts-trusty:amd64 (3.13.0.107.98, 3.13.0.113.104), linux-tools-common:amd64 (3.2.0-120.163, 3.2.0-124.167), libisccfg82:amd64 (9.8.1.dfsg.P1-4ubuntu0.20, 9.8.1.dfsg.P1-4ubuntu0.21), libc6-dev:amd64 (2.15-0ubuntu10.15, 2.15-0ubuntu10.16), linux-headers-generic-lts-trusty:amd64 (3.13.0.107.98, 3.13.0.113.104), libevent-2.0-5:amd64 (2.0.16-stable-1ubuntu0.1, 2.0.16-stable-1ubuntu0.2), openssl:amd64 (1.0.1-4ubuntu5.38, 1.0.1-4ubuntu5.39), libgc1c2:amd64 (7.1-8ubuntu0.12.04.1, 7.1-8ubuntu0.12.04.3), linux-libc-dev:amd64 (3.2.0-120.163, 3.2.0-124.167), libc-dev-bin:amd64 (2.15-0ubuntu10.15, 2.15-0ubuntu10.16), libisc83:amd64 (9.8.1.dfsg.P1-4ubuntu0.20, 9.8.1.dfsg.P1-4ubuntu0.21), libc6:amd64 (2.15-0ubuntu10.15, 2.15-0ubuntu10.16), openjdk-7-jre-headless:amd64 (7u121-2.6.8-1ubuntu0.12.04.1, 7u121-2.6.8-1ubuntu0.12.04.3), libssl1.0.0:amd64 (1.0.1-4ubuntu5.38, 1.0.1-4ubuntu5.39), libgd2-noxpm:amd64 (2.0.36~rc1~dfsg-6ubuntu2.3, 2.0.36~rc1~dfsg-6ubuntu2.4)

auxesis commented 7 years ago

@henare is mitmproxy meant to be running on the production Morph box?

We just did another nuke from orbit per #1104, and restarted mitmproxy as part of the run sheet. Hope that doesn't muck anything up.

henare commented 7 years ago

> @henare is mitmproxy meant to be running on the production Morph box?

@auxesis Yes, it's what intercepts the scraper network traffic to record what sites they've scraped.

> We just did another nuke from orbit per #1104, and restarted mitmproxy as part of the run sheet. Hope that doesn't muck anything up.

It's disabled by removing the iptables rules with: https://github.com/openaustralia/morph/blob/master/provisioning/roles/morph-app/files/iptables-morph-remove

So restarting the mitmproxy container didn't muck anything up. However, those rules are added on boot, so a restart of the server will muck things up. We'll need to manually run that command after a restart until this issue is fixed.

auxesis commented 7 years ago

> We'll need to manually run that command after a restart until this issue is fixed.

How about we throw the execution of that script into /etc/rc.local, so we don't have to run it?

henare commented 7 years ago

We deliberately add those rules on boot. Shouldn't we just fix the problem?

henare commented 7 years ago

After doing routine package updates just now, I noticed that once Docker had updated and restarted, the mitmproxy container was restarting over and over. Attaching to the container showed this backtrace:

Traceback (most recent call last):
  File "/usr/local/bin/mitmdump", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3036, in <module>
    @_call_aside
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3020, in _call_aside
    f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3049, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 654, in _build_master
    ws.require(__requires__)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 968, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 854, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'pyasn1>0.1.2' distribution was not found and is required by mitmproxy

I've got no idea why the container is suddenly broken. I even tried downloading it again but got the same result. I've killed the container for now so it doesn't keep restarting over and over.
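One way to narrow down a `DistributionNotFound` error like the one above is to ask the interpreter which distributions it can actually see. The sketch below uses the standard-library `importlib.metadata`, so it assumes Python 3.8 or later; the container in the traceback runs Python 2.7 with `pkg_resources`, so this would need adapting before it could run there. The helper name `installed_version` is made up for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The two distributions named in the error message above.
    for name in ("mitmproxy", "pyasn1"):
        version = installed_version(name)
        print(name, "->", version if version is not None else "not installed")
```

If `pyasn1` comes back as missing, or with a version that does not satisfy the `pyasn1>0.1.2` requirement from the error message, that would point at the image's installed packages rather than at Docker itself.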

equivalentideas commented 7 years ago

I rebooted the service, but forgot about this stuff.

Scrapers were getting connection errors when trying to connect to sites.

I then ran:

root@li421-88:~# bash /var/www/current/provisioning/roles/morph-app/files/iptables-morph-remove

And now all seems well in the land of morph 🌄

henare commented 7 years ago

When this is fixed we should revert #1148 so that the statistic about pages scraped is shown on the home page once again.

mlandauer commented 6 years ago

Fixed by f9970b0e5cc0b238d9d1758f9846e854d5777600 and a709e5e3db38dccaf454452bb693bef14a210e5d.

I'm about to reinstate mitmproxy in production now.

mlandauer commented 6 years ago

When that's up and running I'll revert #1148.