sagemathinc / cocalc

CoCalc: Collaborative Calculation in the Cloud
https://CoCalc.com

implement an "Open in CoCalc" badge/link #4525

Closed williamstein closed 3 weeks ago

williamstein commented 4 years ago

Here's what it should basically do, but in some standard way:

Yes, though we haven't really advertised it as such yet.

  1. Create some content that is shared on https://share.cocalc.com/share/
  2. Open it and you'll see a big green button labeled "Open in CoCalc with one click!"
  3. Right click on the button and click copy link address, and get something like
    https://share.cocalc.com/app?anonymous=true&launch=share/aa502d8f25af3b6026cc1f90e5645fc3372ee0e7/Classical.sagews
  4. That link is very similar to the link Colab provides, but for cocalc. It's actually technically even better, since the user does NOT have to have a cocalc account at all, whereas with colab you must be signed in with a google account.
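As a rough illustration of the link format in step 3, the launch URL can be assembled from a share ID and a file path. This is only a sketch based on the single example URL above; the function name `launch_link` is hypothetical, and the assumption that every shared file follows the same `share/<id>/<path>` pattern is not confirmed anywhere in this thread.

```python
from urllib.parse import quote

# Base taken from the example link above; treated here as a fixed prefix.
SHARE_APP = "https://share.cocalc.com/app"


def launch_link(share_id: str, path: str) -> str:
    """Build an anonymous launch link for a file on the share server.

    Assumes the pattern seen in the example: launch=share/<share_id>/<path>.
    """
    # quote() leaves "/" unescaped by default, so nested paths stay intact.
    target = f"share/{share_id}/{quote(path)}"
    return f"{SHARE_APP}?anonymous=true&launch={target}"


print(launch_link("aa502d8f25af3b6026cc1f90e5645fc3372ee0e7", "Classical.sagews"))
```

Running this reproduces the example link from step 3; the share ID still has to come from content that was published manually.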

We have a little badge:
image

Internal link

nathancarter commented 4 years ago

For others who drop by here, this post was partially in response to an email I sent to help@cocalc.com. Thanks for the super-fast reply, William!

This answers a lot of my questions, but not 100%. For others who came here, the original question I asked was this:

I noticed today that Google Colab has a feature called “Open in Colab” that lets any Jupyter Notebook on GitHub be converted easily into a link for opening that notebook on Colab and running it. This works for only Python 2 and 3 notebooks, of course, since that’s what Colab supports. See here for details. Is there any similar feature for CoCalc?

It seems from what's said above that if I have a notebook on GitHub and want to launch it in CoCalc, my workflow should be:

  1. Transfer the content to share.cocalc.com/share. (I don't know for sure, but I think that is a manual process that involves creating a CoCalc project, uploading the file from GitHub, and then making that file public.)
  2. Find the file on share.cocalc.com/share and copy the link from there, which as far as I know is also a manual process.

I'm interested in a more automated process, if there is one. The process I described above for Colab is literally string replacement, into a URL of the form https://colab.research.google.com/github/USER/REPO/blob/master/PATH/FILENAME.ipynb. That is, Colab does the GitHub extraction for you as part of visiting that URL.
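To make the "string replacement" concrete, here is a minimal sketch of the conversion a site build step could perform. The function name `github_to_colab` is made up for illustration, and the validation only checks for the `USER/REPO/blob/...` layout described above.

```python
from urllib.parse import urlparse

COLAB_PREFIX = "https://colab.research.google.com/github"


def github_to_colab(github_url: str) -> str:
    """Turn a GitHub notebook URL into the matching "Open in Colab" URL."""
    parsed = urlparse(github_url)
    if parsed.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {github_url}")
    parts = parsed.path.strip("/").split("/")
    # Expected layout: USER/REPO/blob/BRANCH/PATH...
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub file URL: {github_url}")
    # The Colab URL is the same path appended to the Colab prefix.
    return COLAB_PREFIX + parsed.path


print(github_to_colab("https://github.com/USER/REPO/blob/master/PATH/FILENAME.ipynb"))
```

No network request is involved in building the link; Colab fetches the notebook from GitHub when the URL is visited.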

I ask because I'm creating a reference website with lots of code samples and it would be great to make code samples launchable as part of the website's build process. Right now I could do that with Colab but only for Python 2 and 3. If CoCalc had a similar functionality, I could support all the languages that CoCalc supports.

Thank you!

huonw commented 4 years ago

We are interested in a similar external-link process. For instance, we have dozens of demo jupyter notebooks, like https://github.com/stellargraph/stellargraph/blob/a310143ff/demos/node-classification/gcn/gcn-cora-node-classification-example.ipynb, and each of them has a pair of links at the top (and the bottom), to automatically open and execute the notebook from GitHub on Colab and also on Binder.

The source for those links is:

<table>
  <tr>
    <td>Run the master version of this notebook:</td>
    <td>
      <a href="https://mybinder.org/v2/gh/stellargraph/stellargraph/master?urlpath=lab/tree/demos/node-classification/gcn/gcn-cora-node-classification-example.ipynb" alt="Open In Binder" target="_parent"><img src="https://mybinder.org/badge_logo.svg"/></a>
    </td>
    <td>
      <a href="https://colab.research.google.com/github/stellargraph/stellargraph/blob/master/demos/node-classification/gcn/gcn-cora-node-classification-example.ipynb" alt="Open In Colab" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg"/></a>
    </td>
  </tr>
</table>
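For a repository with dozens of notebooks, links like these are presumably generated rather than written by hand. The sketch below shows one way that could look; the helper names `launch_urls` and `badge_row` are hypothetical, and the URL patterns are copied from the two `href` values above. Unlike the snippet above, it puts `alt` on the `<img>` element, which is where HTML defines that attribute.

```python
BINDER_BADGE = "https://mybinder.org/badge_logo.svg"
COLAB_BADGE = "https://colab.research.google.com/assets/colab-badge.svg"


def launch_urls(user: str, repo: str, branch: str, path: str) -> dict:
    """Return the Binder and Colab launch URLs for one notebook."""
    return {
        "Binder": f"https://mybinder.org/v2/gh/{user}/{repo}/{branch}"
                  f"?urlpath=lab/tree/{path}",
        "Colab": f"https://colab.research.google.com/github/{user}/{repo}"
                 f"/blob/{branch}/{path}",
    }


def badge_row(user: str, repo: str, branch: str, path: str) -> str:
    """Render one HTML table row holding a badge link per service."""
    urls = launch_urls(user, repo, branch, path)
    badges = {"Binder": BINDER_BADGE, "Colab": COLAB_BADGE}
    cells = [
        f'<td><a href="{urls[name]}" target="_parent">'
        f'<img src="{badges[name]}" alt="Open In {name}"/></a></td>'
        for name in urls
    ]
    return "<tr>" + "".join(cells) + "</tr>"


print(badge_row(
    "stellargraph", "stellargraph", "master",
    "demos/node-classification/gcn/gcn-cora-node-classification-example.ipynb",
))
```

A third entry for CoCalc would slot into the same dictionary once a stable URL pattern exists.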

As an additional constraint: we're working on a library which is required for those notebooks (as well as some other dependencies). We're happy to install when required, but we do need a way to detect this case. For Colab, we do this by checking for the google.colab module; is there currently a reliable way to detect executing-in-CoCalc? (Binder automatically "installs" the library, because it turns the whole repository into a docker image.)

Also, additional prior art close to this space is https://nbviewer.jupyter.org, which accepts an arbitrary URL and renders the notebook, in addition to special cases for GitHub (although rendering is not nearly as intricate as making available for execution, I'm sure).

Thanks for CoCalc, and we're excited for this if/when it is available!

haraldschilly commented 4 years ago

Hi @huonw from stellargraph. This looks quite cool. I used it as a test for our custom software image feature. It's not 100% done, but I think you can see the potential for your case. Basically it builds a docker image like mybinder does, adds some cocalc-specific extras, and packages this up like other projects. You can access it by creating a new project and selecting stellargraph as the underlying software image. It will populate it with your files. Below is a screenshot for this step and then another one where I ran a notebook.

is there currently a reliable way to detect executing-in-CoCalc

I think the best way is to check for environment variables. In particular, I think we'll never change the following one, because it's quite helpful and has a prefix in its name: 'COCALC_PROJECT_ID' in os.environ
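Put together with the `google.colab` check mentioned earlier, a notebook's first cell could do something like the sketch below. The function name `detect_platform` and the returned labels are made up for illustration; only the `COCALC_PROJECT_ID` variable and the `google.colab` module come from this thread.

```python
import importlib.util
import os


def detect_platform(environ=None) -> str:
    """Guess where the notebook is running: "cocalc", "colab", or "other"."""
    env = os.environ if environ is None else environ
    # CoCalc projects export COCALC_PROJECT_ID in the environment.
    if "COCALC_PROJECT_ID" in env:
        return "cocalc"
    try:
        # Look the module up without importing google.colab itself.
        if importlib.util.find_spec("google.colab") is not None:
            return "colab"
    except (ImportError, ValueError):
        # Raised when the parent "google" package is missing or half-loaded.
        pass
    return "other"


platform = detect_platform()
print(f"Running on: {platform}")
```

The result could then decide whether to run an install step before the first import of the library.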

Additional notes: I wasn't lucky with all demo notebooks. Some seemed to run for a long time? Maybe I got impatient after a minute. Another one – demos/spatio-temporal/gcn-lstm-LA.ipynb – failed in the middle with: TypeError: __init__() got an unexpected keyword argument 'lstm_layer_sizes' … and other ones are too heavy on the memory side.

screenshot-test cocalc com-2020 04 27-11_39_33

2020-04-27-stellargraph-test

huonw commented 4 years ago

This looks quite cool. I used it as a test for our custom software image feature. It's not 100% done, but I think you see the potential for your case. Basically it builds a docker image like mybinder does, adds some cocalc specific extras, and packages this up like other projects. You can access it by creating a new project and selecting stellargraph as the underlying software image. It will populate it with your files.

That's awesome!

That would be perfect for us: it seems closer to the Binder workflow, which we find nicer/smoother than the Colab one. We definitely hit all of the negatives mentioned in the "Compared to the MyBinder service" section, so being able to recommend CoCalc and see those benefits would be great.

It will populate it with your files

It looks like the current environment uses the in-development version of StellarGraph. It would make more sense for us to use the latest release, because the develop version may be somewhat broken. Is that something we, as an external project, can easily contribute to or control? (Including updating the environment when we do new releases?)

I think the best way is to check for environment variables. In particular, I think we'll never change the following one, because it's quite helpful and has a prefix in its name: 'COCALC_PROJECT_ID' in os.environ

Makes sense, although if the custom environment approach works it wouldn't be needed 👍

additional notes: I wasn't lucky with all demo notebooks. some seemed to run for long time? maybe I got impatient after a minute. Another one – demos/spatio-temporal/gcn-lstm-LA.ipynb – failed in the middle with: TypeError: __init__() got an unexpected keyword argument 'lstm_layer_sizes' … and other ones are too heavy on the memory side.

Yeah, some of them take a while, and that code is indeed broken at the moment. Thanks for trying them 😄

haraldschilly commented 4 years ago

thanks for testing!

so, this "custom software image" has a few differences. In particular, your changes to the files are stored persistently, just like with other cocalc projects. They're copied once when you create the project. Later on, there is a button to copy the files again (on top of what you already have in your project) … which picks up changes to the files if there are some. That doc page explains this a little bit.

In any case, it is possible to pin a specific version reference. I have to check, but it also makes sense to take only the files from the subdirectory with the notebooks rather than everything. Right now, I control the configuration and build the image. There is nothing automated. In particular, we want to avoid automatic builds that fail or introduce random delays. Rather, we build, check that the new image works, and then keep it stable. That was helpful for courses as well, i.e. exactly the same software and files throughout the semester.

So, in any case, specific releases are very good :-)

Adding some options to such a "magic link" to end up in a stellargraph project shouldn't be too hard, but we haven't gotten around to actually doing this yet.

AllenDowney commented 4 years ago

I would love this feature.

I am doing this now with Jupyter notebooks that are stored on GitHub and accessible from Colab. The work flow for me is good because I just push changes to GitHub. And it works well for users who can click a link and run the notebooks.

If you are working on this feature, I volunteer to help with testing.

williamstein commented 2 years ago

@AllenDowney

I would love this feature.

I am doing this now with Jupyter notebooks that are stored on GitHub and accessible from Colab. The work flow for me is good because I just push changes to GitHub. And it works well for users who can click a link and run the notebooks.

If you are working on this feature, I volunteer to help with testing.

If you're still around, this is now (PARTLY) implemented in cocalc! Please test. The URL functionality is implemented, same as nbviewer, but there's not a badge quite yet and it only works to edit individual files (not whole git repos).

https://user-images.githubusercontent.com/1276278/177587186-dfb8e8ea-08aa-4fcb-911a-96f0ae93425b.svg

Here's a working badge to browse the CoCalc GitHub repo itself in CoCalc:

which I made via

[<img src="https://user-images.githubusercontent.com/1276278/177587186-dfb8e8ea-08aa-4fcb-911a-96f0ae93425b.svg">](https://cocalc.com/github/sagemathinc/cocalc)
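For a site build step, the same Markdown can be generated per repository or per file. This is a sketch only: the helper name `cocalc_badge` is hypothetical, and the assumption that file links mirror GitHub's `/blob/BRANCH/PATH` layout under `cocalc.com/github/` is inferred from the nbviewer comparison rather than from any documented API.

```python
BADGE_SVG = (
    "https://user-images.githubusercontent.com/1276278/"
    "177587186-dfb8e8ea-08aa-4fcb-911a-96f0ae93425b.svg"
)


def cocalc_badge(user: str, repo: str, branch: str = "", path: str = "") -> str:
    """Return Markdown for a badge that opens a GitHub repo or file in CoCalc."""
    url = f"https://cocalc.com/github/{user}/{repo}"
    # Assumption: single files use the same /blob/BRANCH/PATH suffix as GitHub.
    if branch and path:
        url += f"/blob/{branch}/{path}"
    return f'[<img src="{BADGE_SVG}">]({url})'


print(cocalc_badge("sagemathinc", "cocalc"))
```

Called with only a user and repo, this reproduces the Markdown above.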

AllenDowney commented 2 years ago

@williamstein Good news! This looks really promising. As an example, I tried running this notebook:

https://cocalc.com/github/AllenDowney/ModSimPy/blob/master/chapters/chap01.ipynb

It loaded quickly and rendered the notebook well. When I pressed Edit, it prompted me to log in. For now I proceeded as an anonymous user.

[To me, there's a small conceptual mismatch here: I think of what I am doing as running the notebook rather than editing it. In this use case, where the notebook comes from GitHub, "edit" suggests that the user is editing my notebook and the changes would go back to GitHub]

Then it prompted me to create a project, which I did.

Then it created a panel where I can edit and run the notebook.

[I was a little surprised that the editor is in a panel -- it's too small to work in and the static view underneath is redundant. But then I went to full screen and that's better ]

Running the first few cells worked as expected. It took about 30 seconds to run the first cell, so I assume it was starting the runtime env.

The second code cell in this notebook downloads a file using the requests package. But that failed with the error below. I assume that's because as an anonymous user I don't have network access? It would be good to get an error message to explain that.

So I logged in and reloaded the notebook. I pressed the "Start Project" button and now that's been spinning for about a minute. Not sure what the problem is.

Thanks for the chance to check this out!

TimeoutError                              Traceback (most recent call last)
/usr/lib/python3.8/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
   1353             try:
-> 1354                 h.request(req.get_method(), req.selector, req.data, headers,
   1355                           encode_chunked=req.has_header('Transfer-encoding'))
/usr/lib/python3.8/http/client.py in request(self, method, url, body, headers, encode_chunked)
   1255         """Send a complete request to the server."""
-> 1256         self._send_request(method, url, body, headers, encode_chunked)
   1257 
/usr/lib/python3.8/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
   1301             body = _encode(body, 'body')
-> 1302         self.endheaders(body, encode_chunked=encode_chunked)
   1303 
/usr/lib/python3.8/http/client.py in endheaders(self, message_body, encode_chunked)
   1250             raise CannotSendHeader()
-> 1251         self._send_output(message_body, encode_chunked=encode_chunked)
   1252 
/usr/lib/python3.8/http/client.py in _send_output(self, message_body, encode_chunked)
   1010         del self._buffer[:]
-> 1011         self.send(msg)
   1012 
/usr/lib/python3.8/http/client.py in send(self, data)
    950             if self.auto_open:
--> 951                 self.connect()
    952             else:
/usr/lib/python3.8/http/client.py in connect(self)
   1417 
-> 1418             super().connect()
   1419 
/usr/lib/python3.8/http/client.py in connect(self)
    921         """Connect to the host and port specified in __init__."""
--> 922         self.sock = self._create_connection(
    923             (self.host,self.port), self.timeout, self.source_address)
/usr/lib/python3.8/socket.py in create_connection(address, timeout, source_address)
    807         try:
--> 808             raise err
    809         finally:
/usr/lib/python3.8/socket.py in create_connection(address, timeout, source_address)
    795                 sock.bind(source_address)
--> 796             sock.connect(sa)
    797             # Break explicitly a reference cycle
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:
URLError                                  Traceback (most recent call last)
/tmp/ipykernel_352/767900776.py in <cell line: 12>()
     10         print('Downloaded ' + local)
     11 
---> 12 download('https://github.com/AllenDowney/ModSimPy/raw/master/' +
     13          'modsim.py')
/tmp/ipykernel_352/767900776.py in download(url)
      7     if not exists(filename):
      8         from urllib.request import urlretrieve
----> 9         local, _ = urlretrieve(url, filename)
     10         print('Downloaded ' + local)
     11 
/usr/lib/python3.8/urllib/request.py in urlretrieve(url, filename, reporthook, data)
    245     url_type, path = _splittype(url)
    246 
--> 247     with contextlib.closing(urlopen(url, data)) as fp:
    248         headers = fp.info()
    249 
/usr/lib/python3.8/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):
/usr/lib/python3.8/urllib/request.py in open(self, fullurl, data, timeout)
    523 
    524         sys.audit('urllib.Request', req.full_url, req.data, req.headers, req.get_method())
--> 525         response = self._open(req, data)
    526 
    527         # post-process response
/usr/lib/python3.8/urllib/request.py in _open(self, req, data)
    540 
    541         protocol = req.type
--> 542         result = self._call_chain(self.handle_open, protocol, protocol +
    543                                   '_open', req)
    544         if result:
/usr/lib/python3.8/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    500         for handler in handlers:
    501             func = getattr(handler, meth_name)
--> 502             result = func(*args)
    503             if result is not None:
    504                 return result
/usr/lib/python3.8/urllib/request.py in https_open(self, req)
   1395 
   1396         def https_open(self, req):
-> 1397             return self.do_open(http.client.HTTPSConnection, req,
   1398                 context=self._context, check_hostname=self._check_hostname)
   1399 
/usr/lib/python3.8/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
   1355                           encode_chunked=req.has_header('Transfer-encoding'))
   1356             except OSError as err: # timeout error
-> 1357                 raise URLError(err)
   1358             r = h.getresponse()
   1359         except:
URLError: <urlopen error [Errno 110] Connection timed out>

williamstein commented 2 years ago

[To me, there's a small conceptual mismatch here: I think of what I am doing as running the notebook rather than editing it. In this use case, where the notebook comes from GitHub, "edit" suggests that the user is editing my notebook and the changes would go back to GitHub]

Conceptually this will be exactly the same as GitHub's edit button. However, as mentioned above, the implementation in CoCalc is not finished yet. Note that in the screenshot below I'm looking at your repo that I have no write permissions to:

image

Running the first few cells worked as expected. It took about 30 seconds to run the first cell, so I assume it was starting the runtime env.

No, the Docker container is already running. The only thing that happens is that the Python kernel is started and the code is run. If it really took 30s to run the first cell, that can only mean the filesystem and available CPU were very slow when you started the notebook, which could be the case when using CoCalc anonymously (where you have lower priority and are on a heavily loaded node).

The second code cell in this notebook downloads a file using the requests package. But that failed with the error below. I assume that's because as an anonymous user I don't have network access? It would be good to get an error message to explain that.

That's correct. There's a big banner at the top, but it appears to be disabled for anonymous projects (since people tend to ignore it or complain about it). We'll re-enable it.
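Independently of the banner, a notebook author could make the download cell fail with a more readable message. The sketch below is loosely modeled on the `download` helper visible in the traceback; deriving the filename with `basename` and the injectable `retrieve` parameter are assumptions added for illustration, not the notebook's actual code.

```python
from os.path import basename, exists
from urllib.error import URLError
from urllib.request import urlretrieve


def download(url: str, retrieve=urlretrieve) -> str:
    """Download url into the current directory unless the file already exists."""
    filename = basename(url)  # assumption: local name is the last URL segment
    if exists(filename):
        return filename
    try:
        local, _ = retrieve(url, filename)
    except URLError as err:
        # Replace the long urllib traceback with a one-line hint.
        raise RuntimeError(
            f"Could not download {url}: {err.reason}. "
            "If this is an anonymous CoCalc project, it may not have "
            "network access."
        ) from err
    print("Downloaded " + local)
    return local
```

The original exception stays attached through `from err`, so the full traceback remains available for debugging.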

So I logged in and reloaded the notebook. I pressed the "Start Project" button and now that's been spinning for about a minute. Not sure what the problem is.

It sounds like you signed in with your existing account then tried to load a project that belongs to that anonymous user? Due to security constraints, you should have no access to that project, and it shouldn't work. Instead, go to https://cocalc.com/github/AllenDowney/ModSimPy/blob/master/chapters/chap01.ipynb, then click "Edit" again to copy that file to one of the projects you have access to.

williamstein commented 3 weeks ago

My impression in 2024 is that this sort of functionality is basically a way to attract abuse, spam, and security problems and it doesn't help with good growth at all.