aabiddanda closed this issue 6 years ago
@aabiddanda The simplest solution is to turn off GitHub Pages and browse the webpages locally by cloning or downloading the repository. See also this discussion for some other suggestions, including Beaker Browser.
@aabiddanda I'm wondering what the best implementation for this is. If all pages are protected, it is a pain to view the site yourself. On the other hand, is it necessary to protect only `index.html`? Assuming you want to protect your work, you must have chosen a private repo; that way others will not even be able to deduce the proper URL of the github.io page ... also, I assume these pages can still be found on Google or visited given a direct link?
I took a look at that link. To make it work inside a pipeline we'll have to deploy the directory structure automatically. The question boils down to how to generate, from the command line, a sha1 for an input password that agrees with the sha1 that javascript produces for a given keyword. Ideas? Basically it involves loading that Sha1 javascript and running this line: `var hash = Sha1.hash(secret)` -- should not be hard.
Once we figure that out, then interface-wise maybe: `protect_pages: ['a.ipynb', ...]`
Basically you manually list which pages you want to protect. Then the program will ask you to set a password, or maybe read it from a `.password` file that you keep outside the github repo, then create the password-protected pages. Any other proposals?
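A minimal sketch of what the password-reading step might look like, assuming the `.password` file is plain text kept outside the repo (the function name and file-handling details here are hypothetical, not jnbinder code):

```python
import hashlib
from pathlib import Path

def sha1_from_password_file(password_file):
    """Read the secret from a plain-text file kept outside the repo and
    return the same SHA1 hex digest that Sha1.hash(secret) produces."""
    secret = Path(password_file).expanduser().read_text().strip()
    return hashlib.sha1(secret.encode()).hexdigest()
```

The resulting digest would then name the hidden copy of each protected page.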
@gaow This seems like an interesting implementation of password protection. However, how secure is it? I wonder if this approach is widely used; it may require some investigation.
To explore, I would recommend creating a new github repo (e.g., `ipynb-website-protected`), adding the necessary JavaScript and HTML code, and testing it.
> However, how secure is it?
I am not sure -- under the hood, instead of showing the page directly it asks for a password, which gets converted to a hash that points to a "random" folder on the website; the wrapper page then computes that hash and points to the contents. So I assume it is still google-searchable? However, I believe adding some configuration to that folder to prevent indexing is also possible. In other words, we can think of this as a starting point and add other levels of security to it. But for now, as I said to @aabiddanda in my first post, it is not really a secure solution as is.
> To explore, I would recommend creating a new github repo (e.g., ipynb-website-protected), adding the necessary JavaScript and HTML code, and testing it.
Actually the demo from that repo kind of tests the interface already. But there is perhaps no way to test how google-able the webpages are when they live under directories named by some random hash string?

I guess if we do it, the first thing to resolve is still computing `var hash = Sha1.hash(secret)` in batch mode. After two minutes of research I realized that running javascript with the `node` executable on a desktop is not trivial, because it is an external program that has to be installed even on my Debian! So maybe the first thing to do is to cook up an equivalent Python one-liner, considering `sha1` is a standard thing.
@aabiddanda I'll take a look when I'm less busy, but discussions / suggestions are welcome.
Ha, actually it is super easy:

```python
>>> import hashlib
>>> hashlib.sha1(b'password').hexdigest()
'5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8'
```
which matches what that demo uses here.
So @aabiddanda, it should be relatively simple to implement. But as @pcarbo asked, do you believe it is secure enough? If you want extra security, maybe you should search for "how to make a folder or page not google searchable" and share tips here that we can possibly adopt.
Thanks for taking such a detailed look into this! The solution above would definitely be secure enough for what I need (and I suspect would be enough for most people). The way I was envisioning it, just the `index.html` file would need to be password protected, and anyone who has the password would be able to view the site. @gaow I think your quick solution above is exactly what I am looking for.
@aabiddanda Okay, in that case, as I said, I'd like to go for a slightly more general implementation, i.e., allow protecting any page. Implementing a neat pipeline interface / tests should take a couple of hours. I'll get to it when I can spare that chunk of time, hopefully by the end of this week.
@pcarbo as far as security is concerned, isn't there a `meta` tag for HTML that tells Google or other search engines not to index a page? If so, we can easily slide that line in. At that point we'd have some medium level of security, I guess.
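There is such a tag: `<meta name="robots" content="noindex,nofollow">` is the directive search engines document for this. A hypothetical helper for sliding it into each generated page could look like this (the function name is mine, not part of any existing code):

```python
# The standard robots directive that tells crawlers to skip a page.
ROBOTS_TAG = '<meta name="robots" content="noindex,nofollow">'

def block_search_engines(html):
    """Insert the robots meta tag right after <head> in a generated page;
    pages without a <head> element are returned unchanged."""
    return html.replace("<head>", "<head>\n" + ROBOTS_TAG, 1)
```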
@gaow @aabiddanda Update: see here for a suggested solution that I believe could work reasonably well without having to make any changes to the existing `release.sos` code.
I don't think the solution given above is any safer than generating a random SHA1 and hiding the webpages within a subdirectory of that name.
Thanks @pcarbo, I agree with your assessment! The password stuff is just eye candy; without it there is indeed no need for any additional coding -- except that there would be only one SHA1 protecting the entire site, whereas the proposed eye candy protects each page with a unique SHA1.

However, my real concern is this statement:

> it would be extremely difficult (not impossible) for anyone who did not have this address to discover the protected webpage

True for humans, but how true is it for Google? How about adding a meta tag?

https://css-tricks.com/snippets/html/meta-tag-to-prevent-search-engine-bots/
https://support.google.com/webmasters/answer/79812?hl=en
https://support.google.com/webmasters/answer/93710?hl=en

Is that safer?

Since this eye candy can be added with perhaps slightly over 30 minutes of coding, I might just do it later this weekend to close this ticket. Adding the meta tag will be additional work, but we can do it if it truly can be trusted! I am not 100% comfortable in this case because there is no way I can think of to test it, and we have to take their word for it.
@gaow I agree it would be safer to add tags to prevent search engine (e.g., Google) indexing.
> Since this eye candy can be added with perhaps slightly over 30 minutes of coding, I might just do it later this weekend to close this ticket.
@gaow Sounds good!
Okay, just got a chance to play with it ... so far I have realized two limitations:

1. It does not work with the `file:` protocol.
2. Relative paths to CSS/JS and to other files break on the protected pages.

I cannot fix 1, so I added a more informative error message to the javascript when the `file:` protocol is seen. For 2, we can potentially use absolute paths for CSS/JS but not for linked files -- we cannot force whoever writes the notebook to keep that in mind ...
@pcarbo I ended up using `<hash>.html` instead of `<hash>/index.html`. The hash is based on both the password and the relative file path. That solves the path issue, but I am not sure if there is any compromise on security? Notice that I did add the meta tag, as previously suggested, to block search engines.
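If I understand the scheme, the per-page name could be computed along these lines -- though the exact way the password and path are combined is my assumption for illustration, not necessarily what the actual code does:

```python
import hashlib

def protected_name(password, relpath):
    """One SHA1 per protected page, derived from the password plus the
    page's relative path, served as <hash>.html rather than
    <hash>/index.html. NOTE: simple concatenation is an assumption."""
    digest = hashlib.sha1((password + relpath).encode()).hexdigest()
    return digest + ".html"
```

Because the path is folded in, two pages protected by the same password still land at different hashes.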
https://stephenslab.github.io/ipynb-website/protected.html (password is `protect_folder`)

https://stephenslab.github.io/ipynb-website/protected/protected_page.html (password is `protect_page`)
To use it, you add this line to your configuration, which points to this file listing the folders or pages you'd like to protect. For protected folders, either putting in the folder's index page or just the folder name should work.
@aabiddanda see above. But I suggest we wait for Peter's comments and a new SoS release before upgrading.
A new version of SoS has been released that fixes a bug I noticed when implementing this feature. I've bumped the SoS version requirement. To upgrade and use this password protection feature:

```
./release.sos upgrade-jnbinder
./release.sos upgrade-jnbinder # yes, twice!
./release.sos upgrade-sos
rm -rf .sos; rm -rf ~/.sos/.runtime
```

Then you can use the new features.
@pcarbo I just updated `jnbinder` to reflect changes we made on the dsc2 website:

1) Add a search bar to the sidebar table of contents
2) Use the title or abbreviated title in the sidebar table of contents
3) Support Rmd files
Along with the password protection from this ticket, I think we need to make another release. Do you agree? I am closing this ticket now since things seem to work.
I recently ran into `staticrypt`, which looks neater and is certainly safer than my existing solution in jnbinder. I wrote a separate script that converts ipynb to html using dockerized staticrypt:

https://github.com/gaow/ipynb-html-encrypted

I'm quite happy with how it works. It would not be difficult to adapt to our application here, but it is one more software dependency (software written in Javascript); also it might take a couple of hours that I do not want to spare now. Just making a note here.
There may be some instances where one may want password protection (e.g., a work in progress that one would only want to share with a couple of folks). Is there a way to set that up as the end product of this? For instance, one could borrow code from here