pangeo-data / pangeo-tutorial-gallery

Repo to house pangeo-tutorial notebooks for pangeo-gallery
MIT License
10 stars, 13 forks

Notebook not compiling #2

Closed: salvis2 closed this 4 years ago

salvis2 commented 4 years ago

The notebook is currently not compiling. The current build logs are here, and they show the following error:

/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/binderbot/cli.py:14: DeprecationWarning: "@coroutine" decorator is deprecated since Python 3.8, use "async def" instead
  f = asyncio.coroutine(f)
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/binderbot/cli.py", line 95, in <module>
    sys.exit(main())  # pragma: no cover
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/binderbot/cli.py", line 17, in wrapper
    return loop.run_until_complete(f(*args, **kwargs))
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete

This is different from the last time I ran this build, which failed with an error suggesting the notebook had called an API too many times and exceeded its rate limit. I'm unsure why the errors differ. I will push a minor change to trigger a new build and get fresh logs to examine, which should help identify the problem.

salvis2 commented 4 years ago

I think someone re-ran the Action last night; this is the stack trace it shows now.

2020-07-02 01:58.10 Code Execute: Error            action=code-execute phase=error traceback=---------------------------------------------------------------------------
CellExecutionError                        Traceback (most recent call last)
<ipython-input-4-1449ab302133> in <module>
      7 with open("xarray.ipynb") as f:
      8     nb = nbformat.read(f, as_version=4)
----> 9 ep.preprocess(nb, dict())
     10 print("OK")
     11 print("Saving xarray.ipynb")

/srv/conda/envs/notebook/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py in preprocess(self, nb, resources, km)
    403         with self.setup_preprocessor(nb, resources, km=km):
    404             self.log.info("Executing notebook with kernel: %s" % self.kernel_name)
--> 405             nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
    406             info_msg = self._wait_for_reply(self.kc.kernel_info())
    407             nb.metadata['language_info'] = info_msg['content']['language_info']
/srv/conda/envs/notebook/lib/python3.7/site-packages/nbconvert/preprocessors/base.py in preprocess(self, nb, resources)
     67         """
     68         for index, cell in enumerate(nb.cells):
---> 69             nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
     70         return nb, resources
     71 

/srv/conda/envs/notebook/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py in preprocess_cell(self, cell, resources, cell_index, store_history)
    446             for out in cell.outputs:
    447                 if out.output_type == 'error':
--> 448                     raise CellExecutionError.from_cell_and_msg(cell, out)
    449             if (reply is not None) and reply['content']['status'] == 'error':
    450                 raise CellExecutionError.from_cell_and_msg(cell, reply['content'])

CellExecutionError: An error occurred while executing the following cell:
------------------
download_file()
------------------

---------------------------------------------------------------------------
ConnectionRefusedError                    Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/urllib3/connection.py in _new_conn(self)
    159             conn = connection.create_connection(
--> 160                 (self._dns_host, self.port), self.timeout, **extra_kw
    161             )
/srv/conda/envs/notebook/lib/python3.7/site-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
     83     if err is not None:
---> 84         raise err
     85 

/srv/conda/envs/notebook/lib/python3.7/site-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
     73                 sock.bind(source_address)
---> 74             sock.connect(sa)
     75             return sock

ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    676                 headers=headers,
--> 677                 chunked=chunked,
    678             )
/srv/conda/envs/notebook/lib/python3.7/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    391         else:
--> 392             conn.request(method, url, **httplib_request_kw)
    393 

/srv/conda/envs/notebook/lib/python3.7/http/client.py in request(self, method, url, body, headers, encode_chunked)
   1251         """Send a complete request to the server."""
-> 1252         self._send_request(method, url, body, headers, encode_chunked)
   1253 

/srv/conda/envs/notebook/lib/python3.7/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
   1297             body = _encode(body, 'body')
-> 1298         self.endheaders(body, encode_chunked=encode_chunked)
   1299 

/srv/conda/envs/notebook/lib/python3.7/http/client.py in endheaders(self, message_body, encode_chunked)
   1246             raise CannotSendHeader()
-> 1247         self._send_output(message_body, encode_chunked=encode_chunked)
   1248 
/srv/conda/envs/notebook/lib/python3.7/http/client.py in _send_output(self, message_body, encode_chunked)
   1025         del self._buffer[:]
-> 1026         self.send(msg)
   1027 

/srv/conda/envs/notebook/lib/python3.7/http/client.py in send(self, data)
    965             if self.auto_open:
--> 966                 self.connect()
    967             else:

/srv/conda/envs/notebook/lib/python3.7/site-packages/urllib3/connection.py in connect(self)
    186     def connect(self):
--> 187         conn = self._new_conn()
    188         self._prepare_conn(conn)

/srv/conda/envs/notebook/lib/python3.7/site-packages/urllib3/connection.py in _new_conn(self)
    171             raise NewConnectionError(
--> 172                 self, "Failed to establish a new connection: %s" % e
    173             )

NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f1f064350d0>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:
MaxRetryError                             Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    448                     retries=self.max_retries,
--> 449                     timeout=timeout
    450                 )

/srv/conda/envs/notebook/lib/python3.7/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    724             retries = retries.increment(
--> 725                 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
    726             )

/srv/conda/envs/notebook/lib/python3.7/site-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    438         if new_retry.is_exhausted():
--> 439             raise MaxRetryError(_pool, url, error or ResponseError(cause))
    440 

MaxRetryError: HTTPConnectionPool(host='ldeo.columbia.edu', port=80): Max retries exceeded with url: /~rpa/sst.tar.gz (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1f064350d0>: Failed to establish a new connection: [Errno 111] Connection refused'))

salvis2 commented 4 years ago

I think the main part is the MaxRetryError at the end:

Max retries exceeded with url: /~rpa/sst.tar.gz (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1f064350d0>: Failed to establish a new connection: [Errno 111] Connection refused'))

I've seen this error before; usually I just wait a while and then re-run the Action. I should find a small change I can push to trigger a rebuild and produce a new build log.
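
Since the refused connection usually clears up after waiting, one option is to have the notebook itself retry with growing delays rather than re-running the whole Action. Below is a minimal standard-library sketch, assuming the failure is transient; `retry_with_backoff` and its parameters are hypothetical names, not part of binderbot or requests. It retries on `OSError` by default, which covers the built-in `ConnectionRefusedError` and, as far as I know, `requests.exceptions.ConnectionError` too (requests' exception classes derive from `IOError`, an alias of `OSError`).

```python
import functools
import time


def retry_with_backoff(attempts=5, base_delay=2.0, exceptions=(OSError,)):
    """Hypothetical helper: retry the wrapped function with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between tries and
    re-raises the last exception once all attempts are used up.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the real error
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator
```

For example, `retry_with_backoff(attempts=5, base_delay=30)(download_file)()` would wait 30 s, 60 s, 120 s and 240 s between its five tries before giving up. Exponential rather than fixed delays keep the notebook from hammering a server that is already refusing connections.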

salvis2 commented 4 years ago

Updated the notebook titles in https://github.com/pangeo-data/pangeo-tutorial-gallery/commit/d06f74210d91299dc15f7041b1f627f0f4d9858c. The rebuild logs show the same MaxRetryError as above.

salvis2 commented 4 years ago

@rabernat @jhamman do you have any ideas about this MaxRetryError? It happens in the xarray notebook, in the download_file() function, which was adapted from get_sst_data() in the original pangeo-tutorial notebook:

import os
import shutil
import tarfile

import requests


def download_file(data_dir=None):
    # Make the data directory (defaults to ~/.xarray_tutorial_data)
    if data_dir is None:
        data_dir = os.path.join(os.path.expanduser('~'), '.xarray_tutorial_data')
    os.makedirs(data_dir, exist_ok=True)

    # Data to download; build absolute paths instead of os.chdir(), so a failed
    # download can't leave the notebook in the wrong working directory
    url = 'http://ldeo.columbia.edu/~rpa/sst.tar.gz'
    local_filename = os.path.join(data_dir, url.split('/')[-1])

    # Download the data, raising on HTTP errors instead of saving an error page
    with requests.get(url, stream=True, timeout=60) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)

    # Extract the tar.gz archive into data_dir
    with tarfile.open(local_filename, 'r:gz') as archive:
        archive.extractall(path=data_dir)

    # Remove the tar.gz file
    os.remove(local_filename)

    print(f'\nsst data is in {data_dir}/sst')

The code is trying to download the SST data archive from http://ldeo.columbia.edu/~rpa/sst.tar.gz, and the connection to that server is being refused.

scottyhq commented 4 years ago

@salvis2 I know it's not best practice to keep data in the GitHub repo, but in this case the files are only 43 MB, so it might be easiest to create a data/ folder in this repo and modify the notebook so it doesn't pull the data from the Columbia server.
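
One way to stop depending on the Columbia server, sketched under assumptions: the notebook first looks for the SST files in a directory shipped with the repo and only falls back to downloading when none are found. `locate_sst_dir`, its `download` callback, and the `*.nc` check are illustrative names and choices, not code from the notebook.

```python
import glob
import os


def locate_sst_dir(local_dir, fallback_dir, download):
    """Hypothetical helper: return a directory that contains SST .nc files.

    Prefers local_dir (e.g. a data/ folder committed to the repo). If it has no
    .nc files, calls download(), which is expected to populate fallback_dir,
    and returns fallback_dir instead.
    """
    if glob.glob(os.path.join(local_dir, '*.nc')):
        return local_dir
    download()
    if not glob.glob(os.path.join(fallback_dir, '*.nc')):
        raise FileNotFoundError(f'no .nc files in {local_dir} or {fallback_dir}')
    return fallback_dir
```

With the notebook's function this might read `locate_sst_dir('data/sst', os.path.expanduser('~/.xarray_tutorial_data/sst'), download_file)`, assuming the archive unpacks into an `sst/` subdirectory as the final print in download_file() implies.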

salvis2 commented 4 years ago

The tricky thing is that binderbot doesn't currently copy any non-notebook files over; see https://github.com/pangeo-gallery/pangeo-gallery/issues/22#issuecomment-642024262. A 20-line file is easy enough to embed in a notebook, but a 43 MB file would be really ugly to embed, even in a hidden cell. I guess I could try that.

I'm tempted to try adding that functionality to binderbot myself, since it really should copy the entire repo into the BinderHub session to mimic common workflows and repo setups.

salvis2 commented 4 years ago

The same error as in #3 is present. On the line

ds_all = xr.open_mfdataset('/home/jovyan/.xarray_tutorial_data/sst/*nc', combine='by_coords')

we get the error

OSError: no files to open
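
This `OSError` is what `open_mfdataset` raises when its glob pattern matches nothing, and the message doesn't say where it looked. A minimal sketch, assuming all files sit in one directory, that expands the pattern up front and reports the resolved path; `find_sst_files` is a hypothetical helper. `xr.open_mfdataset` also accepts an explicit list of paths, so its result can be passed straight in.

```python
import glob
import os


def find_sst_files(sst_dir, pattern='*nc'):
    """Hypothetical helper: return sorted SST files in sst_dir or fail loudly."""
    full_pattern = os.path.join(sst_dir, pattern)
    files = sorted(glob.glob(full_pattern))
    if not files:
        # Show the resolved pattern and cwd so a wrong base path is obvious in CI logs
        raise FileNotFoundError(
            f'no files match {full_pattern!r} (cwd: {os.getcwd()})'
        )
    return files
```

Usage would be along the lines of `ds_all = xr.open_mfdataset(find_sst_files('./tutorial-data/sst'), combine='by_coords')`.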

scottyhq commented 4 years ago

Looks like I only changed the path here: ds = xr.open_dataset('./tutorial-data/sst/NOAA_NCDC_ERSST_v3b_SST-1960.nc'). It should be easy to just add a path variable to use for opening the SST data: sstDir = './tutorial-data/sst'
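
A small sketch of the single-path-variable idea: define `sstDir` once and derive both the single-year file and the `open_mfdataset` glob from it, so the two calls can't drift apart again. `SST_GLOB` and `sst_path` are hypothetical names, and `sst_path` assumes every year's file follows the same naming as the 1960 file.

```python
import os

# Single source of truth for the data location (name kept from the thread)
sstDir = './tutorial-data/sst'

# Glob pattern for xr.open_mfdataset(), keeping the notebook's '*nc' suffix
SST_GLOB = os.path.join(sstDir, '*nc')


def sst_path(year):
    """Path to one year's file; assumes all years use the 1960 file's naming."""
    return os.path.join(sstDir, f'NOAA_NCDC_ERSST_v3b_SST-{year}.nc')
```

The notebook would then use `ds = xr.open_dataset(sst_path(1960))` and `ds_all = xr.open_mfdataset(SST_GLOB, combine='by_coords')`.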

salvis2 commented 4 years ago

I see. So it does look like the git clone worked for the first xr.open_dataset() call!

I've updated this in commit 758035743a380a74355a7dda537633541f83f407.

salvis2 commented 4 years ago

After putting in the correct path string, it looks like the notebook built! https://github.com/pangeo-data/pangeo-tutorial-gallery/runs/855379907?check_suite_focus=true

salvis2 commented 4 years ago

Thanks for your help @scottyhq !