takluyver / nbopen

Open a Jupyter notebook in the best available server
BSD 3-Clause "New" or "Revised" License

404 : Not Found error on CloudStorage drives #81

Open charlesweir opened 1 year ago

charlesweir commented 1 year ago

The latest versions of macOS (I'm using 12.6.2 Monterey) keep all their cloud storage files cached in ~/Library/CloudStorage.

It appears that nbopen cannot open ipynb files in them. It's not a Jupyter problem: jupyter notebook notebookName.ipynb works fine in the appropriate directory.

But clicking directly on the notebookName.ipynb file brings up a browser window showing:

[Screenshot: browser window showing the "404 : Not Found" error]

I get the same results with both Microsoft OneDrive and Google Drive. The path in the browser looks correct, but the notebook doesn't seem to have been started.

Any suggestions?

charlesweir commented 1 year ago

As a workaround, it seems if there's already a notebook active for the particular type of cloud storage (OneDrive/Google Drive) nbopen does work. So if we open dummy notebooks for each storage type before using nbopen, we're OK.

takluyver commented 1 year ago

You mentioned that the path in the browser 'looks correct' - is it exactly the same between when you get a working notebook and when you get the 404?

The bit where it works if you've already opened a notebook nearby suggests that it depends on what notebook servers are already running. The logic nbopen uses is to try to find the 'closest' running notebook server - the one with the longest directory path which is a prefix of the file path you want:

https://github.com/takluyver/nbopen/blob/a7ea2de88a3005d0ec3a0d29b501336d5fdd1914/nbopen/nbopen.py#L13-L16
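A minimal sketch of the "closest server" rule described above, for illustration only. It assumes each running server is described by a dict with a `notebook_dir` key (the `servers` argument is a stand-in for whatever the real server listing returns), and it compares whole path components, which the linked lines may do differently.

```python
import os


def find_best_server(filename, servers):
    """Return the server whose directory is the longest prefix of filename.

    `servers` is a hypothetical list of dicts with a "notebook_dir" key.
    Returns None when no server directory contains the file.
    """
    filename = os.path.abspath(filename)
    candidates = []
    for info in servers:
        root = os.path.abspath(info["notebook_dir"])
        # Compare whole components, so /a/b does not match /a/bc/file.
        if os.path.commonpath([root, filename]) == root:
            candidates.append(info)
    # The longest directory path is the "closest" server to the file.
    return max(
        candidates,
        key=lambda info: len(os.path.abspath(info["notebook_dir"])),
        default=None,
    )
```

With servers rooted at a home directory and at a folder below it, a file inside that folder is matched to the deeper server, while a file in a sibling folder falls back to the home directory server.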

And if none is found, it will launch a new server, usually with your home directory as the notebook path:

https://github.com/takluyver/nbopen/blob/a7ea2de88a3005d0ec3a0d29b501336d5fdd1914/nbopen/nbopen.py#L36-L39
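As a rough illustration of the fallback, the sketch below picks a directory for a new server and builds the URL that would be opened. The function names are hypothetical, and using the file's own directory when it lies outside the home directory is an assumption, not something confirmed here.

```python
import os
from urllib.parse import quote


def choose_notebook_dir(filename, home_dir):
    """Pick the directory a newly launched server would serve (hypothetical helper)."""
    filename = os.path.abspath(filename)
    home_dir = os.path.abspath(home_dir)
    if os.path.commonpath([home_dir, filename]) == home_dir:
        return home_dir
    # Assumption: outside the home directory, serve the file's own folder.
    return os.path.dirname(filename)


def notebook_url(base_url, notebook_dir, filename):
    """Build the /notebooks/ URL for filename relative to the server directory."""
    rel = os.path.relpath(os.path.abspath(filename), start=notebook_dir)
    rel = rel.replace(os.sep, "/")
    return base_url.rstrip("/") + "/notebooks/" + quote(rel)
```

This is why a server rooted at the home directory produces URLs that start with `/notebooks/Library/...` for files under ~/Library, while a server started inside a folder produces much shorter paths.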

I'm not a Mac user, so I've no idea in what ways these CloudStorage folders are special.

charlesweir commented 1 year ago

Wow @takluyver, thanks for responding so quickly!

nbopen seems to use absolute paths; it decodes 'softlinks' before operating on files. So the following are all absolute paths.

If I invoke it on ~/Library/CloudStorage/OneDrive-LancasterUniversity/slam.ipynb, I get the 'not found' error for path: http://localhost:8888/notebooks/Library/CloudStorage/OneDrive-LancasterUniversity/slam.ipynb

After running:

(cd ~/Library/CloudStorage/OneDrive-LancasterUniversity/; jupyter notebook --no-browser) &

if I invoke nbopen on the same file, I see: http://localhost:8889/notebooks/slam.ipynb

Similarly on a folder below it, the following works: http://localhost:8889/notebooks/a/b/c/test.ipynb

But interestingly, the OneDrive 'SharedLibrary' folder is treated as a different drive. So even with the notebook above running I get 'not found' with the following file ~/Library/CloudStorage/OneDrive-SharedLibraries-LancasterUniversity/testOD.ipynb, with the browser path: http://localhost:8888/notebooks/Library/CloudStorage/OneDrive-SharedLibraries-LancasterUniversity/testOD.ipynb

I see also that I wasn't completely right about Google Drive. GD has a HARD link in my home directory (~/Google Drive is identical to /Users/charles/Library/CloudStorage/GoogleDrive-cafaweir@gmail.com/My Drive/ ). If I use the latter path I get the same problem as above. If I use ~/Google Drive/test.ipynb it works fine.

Guess I could just make the OneDrive softlinks in my home dir into hard links. But I expect then my search engine will index everything twice???

takluyver commented 1 year ago

nbopen seems to use absolute paths; it decodes 'softlinks' before operating on files.

It does work with absolute paths - it calls abspath() - but this doesn't automatically resolve symlinks (you'd need realpath() or readlink() for that). It's possible that macOS resolves symlinks when you click to open a file, though.
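The difference between the two calls can be seen with a throwaway symlink. This is only a small demonstration using the standard library; the directory and file names are made up for the example.

```python
import os
import tempfile


def compare_paths(path):
    """Return (abspath, realpath) for the same input path."""
    return os.path.abspath(path), os.path.realpath(path)


with tempfile.TemporaryDirectory() as tmp:
    # Resolve the temp dir first, since it may itself sit behind a symlink.
    tmp = os.path.realpath(tmp)
    target = os.path.join(tmp, "real_dir")
    os.mkdir(target)
    link = os.path.join(tmp, "link_dir")
    os.symlink(target, link)
    open(os.path.join(target, "slam.ipynb"), "w").close()

    abs_p, real_p = compare_paths(os.path.join(link, "slam.ipynb"))
    print(abs_p)   # still goes through link_dir
    print(real_p)  # points into real_dir
```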

even with the notebook above running I get 'not found' with the following file ~/Library/CloudStorage/OneDrive-SharedLibraries-LancasterUniversity/testOD.ipynb

It wouldn't use the server you started above for that path, regardless of anything special going on, because it's not under the OneDrive-LancasterUniversity folder where you started the notebook. The notebook server can only go down to subdirectories, not sideways to a sibling directory.

GD has a HARD link in my home directory (~/Google Drive is identical to /Users/charles/Library/CloudStorage/GoogleDrive-cafaweir@gmail.com/My Drive/ )

Oh, that's a complication. I'm used to Linux, where hard links to directories are impossible. On Mac, it appears that they are possible but with quite a lot of caveats: https://stackoverflow.com/questions/80875/what-is-the-unix-command-to-create-a-hardlink-to-a-directory-in-os-x

I think I have a guess, however. Is any level of your ~/Library/CloudStorage/OneDrive-LancasterUniversity a hidden folder? Jupyter by default refuses to serve things in hidden folders (this is meant to make it a little bit harder to leak things like SSH private keys). You can see the precise checks that it's doing for hidden files here:

https://github.com/jupyter/jupyter_core/blob/353635245dc8b8554c68713bbf438b1d778ad84d/jupyter_core/paths.py#L455-L506
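A simplified sketch of that kind of check, not the code linked above: it treats a path as hidden if any component below the server root starts with a dot or carries the `UF_HIDDEN` file flag (the flag macOS uses for folders hidden in the GUI). The function names are invented for the example, and the real checks cover more cases.

```python
import os
import stat


def component_is_hidden(path):
    """Check one path component: dot-prefixed name or the UF_HIDDEN flag."""
    if os.path.basename(path).startswith("."):
        return True
    try:
        st = os.stat(path)
    except OSError:
        return False
    # st_flags only exists on some platforms (e.g. macOS, BSD).
    return bool(getattr(st, "st_flags", 0) & stat.UF_HIDDEN)


def is_hidden_under(root, path):
    """True if any component of path below root is hidden.

    The root itself is not checked. Paths outside root start with ".."
    and are therefore reported as hidden.
    """
    root = os.path.abspath(root)
    rel = os.path.relpath(os.path.abspath(path), start=root)
    if rel == os.curdir:
        return False
    current = root
    for part in rel.split(os.sep):
        current = os.path.join(current, part)
        if component_is_hidden(current):
            return True
    return False
```

Only the components between the server root and the file are examined, so a server started inside a hidden folder can still serve the files below it, while a server started above that folder cannot.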

If that is it, there's a config option ContentsManager.allow_hidden which you can set to bypass this check. Obviously turning that on might make you a bit less secure, so you might want the links you suggested as an alternative.

charlesweir commented 1 year ago

That's it!

The ~/Library folder is always hidden. So running:

jupyter notebook --generate-config
open ~/.jupyter/jupyter_notebook_config.py

and then uncommenting ContentsManager.allow_hidden and changing it to True fixes the problem.
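For reference, the edited line in the config file would look roughly like the fragment below. It assumes the generated file exposes the usual `c` config object; the surrounding comments in a real generated file will differ.

```python
# In ~/.jupyter/jupyter_notebook_config.py
# Allow the notebook server to serve files inside hidden folders.
# This weakens a safety check, so enable it only if you accept that trade-off.
c.ContentsManager.allow_hidden = True
```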

I see the security impact of the change. However, the leak can only be to other users of the server, so it's no problem unless you're exposing the server on the LAN or there is a way for another user to log in to your machine.

Many thanks, @takluyver !

takluyver commented 1 year ago

No problem! I'll leave this open because I guess nbopen could check for hidden folders and not try to use a notebook server from which the file is hidden.
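One way such a check might look, purely as a sketch and not a proposed patch: skip any running server from whose root the file would appear hidden, then pick the closest remaining one. The `servers` list of dicts with a `notebook_dir` key and both function names are assumptions for the example, and the hidden test is deliberately simplified to dot-prefixed names plus the `UF_HIDDEN` flag.

```python
import os
import stat
from pathlib import Path


def hidden_between(root, target):
    """True if any component of target below root looks hidden."""
    current = root
    for part in target.relative_to(root).parts:
        current = current / part
        if part.startswith("."):
            return True
        try:
            flags = getattr(current.stat(), "st_flags", 0)
        except OSError:
            flags = 0
        if flags & stat.UF_HIDDEN:
            return True
    return False


def find_best_visible_server(filename, servers):
    """Closest server that contains the file and does not hide it, else None."""
    target = Path(os.path.abspath(filename))
    usable = []
    for info in servers:
        root = Path(os.path.abspath(info["notebook_dir"]))
        if root in target.parents and not hidden_between(root, target):
            usable.append(info)
    return max(usable, key=lambda info: len(info["notebook_dir"]), default=None)
```

When this returns None, a newly launched server would presumably need to start somewhere below the hidden folder (for instance in the file's own directory) rather than in the home directory, otherwise the same 404 would come back.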

(To be clear on the security thing: the risk doesn't necessarily need another user - e.g. if you open a malicious webpage in your browser, it could try to make an HTTP request to localhost to collect information and send it elsewhere. Browsers have mechanisms to protect against these 'cross origin' requests, so it shouldn't be able to do this. But browsers are enormously complicated and there are some ways in which localhost is a special case, so it doesn't hurt to have an extra line of defense.)