stephenslab / ipynb-website

Simple data science website using Jupyter notebooks.
https://tinyurl.com/yb7mal2m
MIT License
66 stars, 29 forks

Password Protection #5

Closed aabiddanda closed 6 years ago

aabiddanda commented 6 years ago

There may be cases where one wants password protection (e.g., a work in progress that one would only want to share with a couple of folks). Is there a way to set that up as part of the end product of this? For instance, one could borrow code from here.

pcarbo commented 6 years ago

@aabiddanda The simplest solution is to turn off GitHub Pages and browse the webpages locally by cloning or downloading the repository. See also this discussion for some other suggestions, including Beaker Browser.

gaow commented 6 years ago

@aabiddanda I'm wondering what the best implementation would be. If all pages are protected, it is then a pain to view them yourself. On the other hand, is it necessary to protect only index.html? Assuming you want to protect your work, you must have chosen a private repo; that way others will not even be able to deduce the proper URL of the github.io page ... Also, I assume these pages can still be found on Google and visited given a direct link?

I took a look at that link. To make it part of a pipeline we'll have to deploy the directory structure automatically. The question boils down to how to generate, from the command line, a SHA1 for a given password that agrees with the SHA1 generated by that JavaScript for the same password. Ideas?

Basically it involves loading that Sha1 JavaScript and running this line:

 var hash = Sha1.hash(secret)

It should not be hard.

gaow commented 6 years ago

Once we figured that out, then interface-wise maybe:

protect_pages: ['a.ipynb', ...]

Basically you need to say manually which pages you want to protect. Then the program will ask you to set a password, or maybe read it from a .password file that you keep outside the GitHub repo, and then create the password-protected pages. Any other proposals?
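
A minimal sketch of the password lookup proposed above, under stated assumptions: the helper name `read_password`, the choice to use only the first line of the file, and the fallback to an interactive prompt are all hypothetical and not part of the existing release.sos code.

```python
import getpass
from pathlib import Path


def read_password(password_file=".password"):
    """Return the password stored in `password_file`, else prompt for one.

    Hypothetical helper: the file name and the fallback behaviour are
    assumptions made for illustration only.
    """
    path = Path(password_file)
    if path.is_file():
        # Use only the first line, stripped of surrounding whitespace.
        lines = path.read_text(encoding="utf-8").splitlines()
        if lines and lines[0].strip():
            return lines[0].strip()
    # No usable file: ask interactively without echoing the input.
    return getpass.getpass("Password for protected pages: ")
```

Keeping the file outside the repository (or listing it in .gitignore) is what prevents the password from being published along with the site.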

pcarbo commented 6 years ago

@gaow This seems like an interesting implementation of password protection. However, how secure is it? I wonder if this approach is widely used; it may require some investigation.

To explore, I would recommend creating a new github repo (e.g., ipynb-website-protected), adding the necessary JavaScript and HTML code, and testing it.

gaow commented 6 years ago

However, how secure is it

I am not sure -- under the hood, instead of showing the page directly, it asks for a password, which gets converted to a hash that points to a "random" folder on the website; the wrapper page then computes that hash and points to the contents. So I assume it is still "google-searchable"? However, I believe it is also possible to add some configuration to that folder to prevent indexing. In other words, we can think of this as a starting point and add other levels of security to it. But for now, as I asked @aabiddanda in my first post, it is not really a secure solution as is.

gaow commented 6 years ago

To explore, I would recommend creating a new github repo (e.g., ipynb-website-protected), adding the necessary JavaScript and HTML code, and testing it.

Actually the demo from that repo kind of tests the interface already. But there is perhaps no way to test how google-able the webpages are when they sit in directories named after some random hash string?

I guess if we do it, maybe the first thing to resolve is still getting var hash = Sha1.hash(secret) in batch mode. After 2 min of research I realized that running JavaScript with the node executable on a desktop is not trivial, because it is an external program that has to be installed, even on my Debian! So maybe the first thing to do is to cook up an equivalent line in Python, considering SHA1 is a standard thing.

@aabiddanda I'd take a look when I'm less busy but discussions / suggestions are welcomed.

gaow commented 6 years ago

Ha actually it is super easy:

>>> import hashlib
>>> hashlib.sha1(b'password').hexdigest()
'5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8'

which matches what that demo uses here:

https://chrisssycollins.github.io/protected-github-pages/5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8/index.html

So @aabiddanda it should be relatively simple to implement. But as @pcarbo asked, do you believe it is secure enough? If you want extra security, maybe you need to search "how to make a folder or page not google searchable" and share tips here that we can possibly adopt.
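
As a rough sketch of how the demo layout could be reproduced in a pipeline, the helper below (a hypothetical name, `protected_path`) turns a password into the `<sha1>/index.html` path seen in the demo URL above. It assumes the JavaScript side hashes the UTF-8 bytes of the password, which matters only for non-ASCII passwords.

```python
import hashlib


def protected_path(password, page="index.html"):
    """Return '<sha1 of password>/<page>', mirroring the demo's directory layout."""
    # Assumption: the password is hashed as UTF-8 bytes.
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest()
    return f"{digest}/{page}"


print(protected_path("password"))
# 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8/index.html
```

Note that the hash only hides the location of the page; anyone who learns the resulting URL can open it without the password.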

aabiddanda commented 6 years ago

Thanks for taking such a detailed look into this! The solution above would definitely be secure enough for what I need to do (and I suspect would be enough for most people). The way I was envisioning it was that just the index.html file would need to be password protected and that anyone who has the password would be able to take a look at the site. @gaow I think that your quick solution up above is exactly what I am looking for

gaow commented 6 years ago

@aabiddanda Okay, in that case, as I said, I'd like to go for a slightly more general implementation, i.e., allow protecting any page. Implementation of a neat pipeline interface / tests should take a couple of hours. I'll get to it when I can spare that chunk of time, hopefully by the end of this week.

@pcarbo as far as security is concerned, isn't there a meta tag for HTML that tells Google or other search engines not to index a page? If so, we can easily slide that line in. At that point we'd have some medium level of security, I guess.

pcarbo commented 6 years ago

@gaow @aabiddanda Update: see here for a suggested solution that I believe could work reasonably well without having to make any changes to the existing release.sos code.

gaow commented 6 years ago

I don't think the solution given above is any safer than generating a random SHA1 and hiding the webpages within a subdirectory of that name.

Thanks @pcarbo I agree with your assessment! The password stuff is just eye candy. Without that eye candy there is indeed no need to do any additional coding. Except that there will be only one SHA1 protecting the entire site, while the proposed eye candy protects any given page with a unique SHA1.

However my real concern is this statement:

it would be extremely difficult (not impossible) for anyone who did not have this address to discover the protected webpage

True for humans, but how true is it for Google? How about adding a meta tag?

https://css-tricks.com/snippets/html/meta-tag-to-prevent-search-engine-bots/
https://support.google.com/webmasters/answer/79812?hl=en
https://support.google.com/webmasters/answer/93710?hl=en

Is that safer?

Since this eye-candy can be added with perhaps slightly over 30min of coding i might just do it later this weekend to close this ticket. Adding the meta tag will be additional work, but we can do it if it truly can be trusted! I am not 100% comfortable in this case because there is no way I can think of to test it, and we have to take their word for it.
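
A minimal sketch of how the meta tag could be slid into a generated page, assuming the HTML is available as a string. The helper name `add_noindex_meta` is hypothetical; the tag itself is the standard robots `noindex` directive described in the links above. It only asks compliant crawlers not to index the page and does not restrict access.

```python
import re

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def add_noindex_meta(html):
    """Insert a robots noindex meta tag right after the opening <head> tag.

    Returns the HTML unchanged if the tag is already present or if no
    <head> tag is found.
    """
    if NOINDEX_TAG in html:
        return html
    # Match the opening <head> tag, with or without attributes.
    match = re.search(r"<head(\s[^>]*)?>", html, flags=re.IGNORECASE)
    if match is None:
        return html
    end = match.end()
    return html[:end] + "\n" + NOINDEX_TAG + html[end:]
```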

pcarbo commented 6 years ago

@gaow I agree it would be safer to add tags to prevent search engine (e.g., Google) indexing.

Since this eye-candy can be added with perhaps slightly over 30min of coding i might just do it later this weekend to close this ticket.

@gaow Sounds good!

gaow commented 6 years ago

Okay, just got a chance to play with it ... so far I have found 2 limitations:

  1. A password-protected page cannot be viewed locally (unless a direct link is given) because JavaScript (at least in Google Chrome) does not like the file: protocol.
  2. The protected page will lose all the CSS / JS / files it points to.

I cannot fix 1, so I added a more informative error message to the JavaScript for when the file: protocol is seen. For 2, we could potentially use absolute paths for CSS/JS but not for files -- we cannot force whoever writes the notebook to keep that in mind ...

@pcarbo I ended up using <hash>.html instead of <hash>/index.html. The hash is based on both the password and the relative file path. That solves the path issue, but I am not sure whether it compromises security in any way. Note that I did add the meta tag as previously suggested, to block search engines.
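
To illustrate the idea of hashing both the password and the relative file path, here is a sketch only. The helper name `protected_filename` and the way the two inputs are joined (with a newline) are assumptions; the actual combination used in jnbinder may differ.

```python
import hashlib


def protected_filename(password, relative_path):
    """Return '<hash>.html', hashing the password together with the page path.

    Assumption: the inputs are joined with a newline before hashing.
    """
    payload = f"{password}\n{relative_path}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest() + ".html"
```

Because the path is part of the hash input, two pages protected with the same password still get different file names, and the hashed file can stay in the same directory as the original page so that relative links to CSS / JS / files keep working.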

https://stephenslab.github.io/ipynb-website/protected.html (password is protect_folder)
https://stephenslab.github.io/ipynb-website/protected/protected_page.html (password is protect_page)

To use it, you add this line to your configuration, which points to this file that lists the folders or pages you'd like to protect. For a protected folder, listing either its index page or just the folder name should work.

@aabiddanda see above. But I suggest we wait for Peter's comments, and a new SoS release before upgrading it.

gaow commented 6 years ago

A new version of SoS has been released that has fixed a bug I noticed when implementing this feature. I've bumped SoS version requirement. To upgrade and use this password protect feature:

./release.sos upgrade-jnbinder
./release.sos upgrade-jnbinder # yes, twice!
./release.sos upgrade-sos
rm -rf .sos; rm -rf ~/.sos/.runtime

Then you can use the new features.

gaow commented 6 years ago

@pcarbo I just updated jnbinder to reflect changes we made on dsc2 website:

  1. Add a search bar to the sidebar table of contents
  2. Use the title or abbreviated title in the sidebar table of contents
  3. Support Rmd files

Along with this ticket of password protection I think we need to make another release. Do you agree? I am closing this ticket now since things seem to work.

gaow commented 5 years ago

I recently ran into staticrypt which looks neater and is certainly safer than my existing solution in jnbinder. I wrote a separate script that converts ipynb to html using dockerized staticrypt:

https://github.com/gaow/ipynb-html-encrypted

I'm quite happy with how it works. It is not difficult to adapt to our application here, but it is one more software dependency (software written in JavaScript); also it might take a couple of hours that I do not want to spare now. Just making a note here.