Open BenWu opened 3 years ago
It would be good to just retry the requests after a timeout instead of failing immediately.
Another occurrence:
[2020-11-20 20:08:58,939] {pod_launcher.py:156} INFO - b'Traceback (most recent call last):\n'
[2020-11-20 20:08:58,941] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 436, in _error_catcher\n'
[2020-11-20 20:08:58,954] {pod_launcher.py:156} INFO - b' yield\n'
[2020-11-20 20:08:58,954] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 518, in read\n'
[2020-11-20 20:08:58,959] {pod_launcher.py:156} INFO - b' data = self._fp.read(amt) if not fp_closed else b""\n'
[2020-11-20 20:08:58,959] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/http/client.py", line 458, in read\n'
[2020-11-20 20:08:59,006] {pod_launcher.py:156} INFO - b' n = self.readinto(b)\n'
[2020-11-20 20:08:59,007] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/http/client.py", line 502, in readinto\n'
[2020-11-20 20:08:59,008] {pod_launcher.py:156} INFO - b' n = self.fp.readinto(b)\n'
[2020-11-20 20:08:59,010] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/socket.py", line 704, in readinto\n'
[2020-11-20 20:08:59,044] {pod_launcher.py:156} INFO - b' return self._sock.recv_into(b)\n'
[2020-11-20 20:08:59,045] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/ssl.py", line 1241, in recv_into\n'
[2020-11-20 20:08:59,074] {pod_launcher.py:156} INFO - b' return self.read(nbytes, buffer)\n'
[2020-11-20 20:08:59,074] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/ssl.py", line 1099, in read\n'
[2020-11-20 20:08:59,078] {pod_launcher.py:156} INFO - b' return self._sslobj.read(len, buffer)\n'
[2020-11-20 20:08:59,078] {pod_launcher.py:156} INFO - b'ConnectionResetError: [Errno 104] Connection reset by peer\n'
[2020-11-20 20:08:59,078] {pod_launcher.py:156} INFO - b'\n'
[2020-11-20 20:08:59,078] {pod_launcher.py:156} INFO - b'During handling of the above exception, another exception occurred:\n'
[2020-11-20 20:08:59,079] {pod_launcher.py:156} INFO - b'\n'
[2020-11-20 20:08:59,079] {pod_launcher.py:156} INFO - b'Traceback (most recent call last):\n'
[2020-11-20 20:08:59,079] {pod_launcher.py:156} INFO - b' File "/usr/local/bin/leanplum-data-export", line 33, in <module>\n'
[2020-11-20 20:08:59,079] {pod_launcher.py:156} INFO - b" sys.exit(load_entry_point('leanplum-data-export', 'console_scripts', 'leanplum-data-export')())\n"
[2020-11-20 20:08:59,079] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/click/core.py", line 829, in __call__\n'
[2020-11-20 20:08:59,111] {pod_launcher.py:156} INFO - b' return self.main(*args, **kwargs)\n'
[2020-11-20 20:08:59,112] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/click/core.py", line 782, in main\n'
[2020-11-20 20:08:59,117] {pod_launcher.py:156} INFO - b' rv = self.invoke(ctx)\n'
[2020-11-20 20:08:59,117] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke\n'
[2020-11-20 20:08:59,117] {pod_launcher.py:156} INFO - b' return _process_result(sub_ctx.command.invoke(sub_ctx))\n'
[2020-11-20 20:08:59,117] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke\n'
[2020-11-20 20:08:59,118] {pod_launcher.py:156} INFO - b' return ctx.invoke(self.callback, **ctx.params)\n'
[2020-11-20 20:08:59,118] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/click/core.py", line 610, in invoke\n'
[2020-11-20 20:08:59,119] {pod_launcher.py:156} INFO - b' return callback(*args, **kwargs)\n'
[2020-11-20 20:08:59,119] {pod_launcher.py:156} INFO - b' File "/app/leanplum_data_export/leanplum_data_export/__main__.py", line 31, in export_leanplum\n'
[2020-11-20 20:08:59,119] {pod_launcher.py:156} INFO - b' exporter.export(date, s3_bucket, bucket, prefix, bq_dataset, table_prefix, version, clean)\n'
[2020-11-20 20:08:59,119] {pod_launcher.py:156} INFO - b' File "/app/leanplum_data_export/leanplum_data_export/export.py", line 53, in export\n'
[2020-11-20 20:08:59,120] {pod_launcher.py:156} INFO - b' csv_file_paths = self.transform_data_file(key, schemas, data_dir, s3_bucket)\n'
[2020-11-20 20:08:59,120] {pod_launcher.py:156} INFO - b' File "/app/leanplum_data_export/leanplum_data_export/export.py", line 164, in transform_data_file\n'
[2020-11-20 20:08:59,120] {pod_launcher.py:156} INFO - b' self.s3_client.download_file(bucket, data_file_key, os.path.join(data_dir, "data.ndjson"))\n'
[2020-11-20 20:08:59,120] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/boto3/s3/inject.py", line 170, in download_file\n'
[2020-11-20 20:08:59,136] {pod_launcher.py:156} INFO - b' return transfer.download_file(\n'
[2020-11-20 20:08:59,136] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/boto3/s3/transfer.py", line 307, in download_file\n'
[2020-11-20 20:08:59,140] {pod_launcher.py:156} INFO - b' future.result()\n'
[2020-11-20 20:08:59,140] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/s3transfer/futures.py", line 106, in result\n'
[2020-11-20 20:08:59,178] {pod_launcher.py:156} INFO - b' return self._coordinator.result()\n'
[2020-11-20 20:08:59,179] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/s3transfer/futures.py", line 265, in result\n'
[2020-11-20 20:08:59,179] {pod_launcher.py:156} INFO - b' raise self._exception\n'
[2020-11-20 20:08:59,180] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/s3transfer/tasks.py", line 126, in __call__\n'
[2020-11-20 20:08:59,195] {pod_launcher.py:156} INFO - b' return self._execute_main(kwargs)\n'
[2020-11-20 20:08:59,196] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/s3transfer/tasks.py", line 150, in _execute_main\n'
[2020-11-20 20:08:59,196] {pod_launcher.py:156} INFO - b' return_value = self._main(**kwargs)\n'
[2020-11-20 20:08:59,196] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/s3transfer/download.py", line 521, in _main\n'
[2020-11-20 20:08:59,214] {pod_launcher.py:156} INFO - b' for chunk in chunks:\n'
[2020-11-20 20:08:59,214] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/s3transfer/download.py", line 649, in __next__\n'
[2020-11-20 20:08:59,218] {pod_launcher.py:156} INFO - b' chunk = self._body.read(self._chunksize)\n'
[2020-11-20 20:08:59,218] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/s3transfer/utils.py", line 551, in read\n'
[2020-11-20 20:08:59,219] {pod_launcher.py:156} INFO - b' value = self._stream.read(*args, **kwargs)\n'
[2020-11-20 20:08:59,219] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/botocore/response.py", line 77, in read\n'
[2020-11-20 20:08:59,238] {pod_launcher.py:156} INFO - b' chunk = self._raw_stream.read(amt)\n'
[2020-11-20 20:08:59,238] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 540, in read\n'
[2020-11-20 20:08:59,238] {pod_launcher.py:156} INFO - b' raise IncompleteRead(self._fp_bytes_read, self.length_remaining)\n'
[2020-11-20 20:08:59,238] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/contextlib.py", line 135, in __exit__\n'
[2020-11-20 20:08:59,257] {pod_launcher.py:156} INFO - b' self.gen.throw(type, value, traceback)\n'
[2020-11-20 20:08:59,258] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 454, in _error_catcher\n'
[2020-11-20 20:08:59,258] {pod_launcher.py:156} INFO - b' raise ProtocolError("Connection broken: %r" % e, e)\n'
[2020-11-20 20:08:59,259] {pod_launcher.py:156} INFO - b'urllib3.exceptions.ProtocolError: ("Connection broken: ConnectionResetError(104, \'Connection reset by peer\')", ConnectionResetError(104, \'Connection reset by peer\'))\n'
Errors are still happening frequently; recently it has been almost every day. This is the log for one error, but there may be different ones. Airflow logs get purged too quickly, so they're hard to catch.
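For reference, here is a rough sketch of what the retry suggested at the top of this issue could look like. It is only an illustration, not the project's actual code: `retry_call` and its parameters (`retry_on`, `attempts`, `base_delay`, `sleep`) are hypothetical names, and the set of exceptions worth retrying is an assumption based on the single traceback above.

```python
import time


def retry_call(func, *args, retry_on=(ConnectionError,), attempts=3,
               base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying on the given exception types.

    Hypothetical helper, not part of leanplum-data-export or boto3.
    Waits base_delay * 2**n seconds after the n-th failed attempt and
    re-raises the last exception once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except retry_on:
            # Out of attempts: let the original exception propagate.
            if attempt == attempts - 1:
                raise
            # Exponential backoff before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

In `transform_data_file`, the failing call could then be wrapped as something like `retry_call(self.s3_client.download_file, bucket, data_file_key, path, retry_on=(urllib3.exceptions.ProtocolError, ConnectionError))`. Note that `urllib3.exceptions.ProtocolError` has to be listed explicitly, since the built-in `ConnectionError` alone would not catch the exception that actually reaches `export.py` in the log above.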