namibmap / IPPR


Restrict public access to http://www.namibiaopenoil.org:3000/ #69

Closed nsetru closed 8 years ago

nsetru commented 8 years ago

At the moment http://www.namibiaopenoil.org:3000/ is open to the public and anyone can view it. We need to restrict this access. Options available -

lbewlay commented 8 years ago

@nsetru : perhaps adding password protection would be more feasible since we may be accessing from different networks.

lbewlay commented 8 years ago

@nsetru : perhaps we don't need to restrict access for now since we are using an anonymized dataset.

missfunmi commented 8 years ago

Since the website is currently still in the development stage and any website crawler can access the site, I think it's worth restricting access anyway, for a few reasons:

Unless there's a major need to have the site accessible to the public, my vote would be to restrict access in some form until the site is ready to go live. I'm indifferent to the exact mechanics of how (.htaccess or password protection or something else), as long as it's actually restricted.

Thoughts, @lbewlay, @nsetru, @raquel-ucl?

rapidexpert commented 8 years ago

Why don't we use the robots.txt and disallow everything from web crawlers for now?
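For reference, a disallow-all robots.txt is only two lines. A rough sketch of serving it straight from the app (assuming server.js exposes the Express app used in the snippets further down; this is illustrative only, not existing code) would be:

// Illustrative sketch: serve a disallow-all robots.txt from the Express app
app.get('/robots.txt', function (req, res) {
  res.type('text/plain');
  res.send('User-agent: *\nDisallow: /');
});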

missfunmi commented 8 years ago

Robots (and humans) can still ignore a site's robots.txt, so it actually doesn't provide any security.

lbewlay commented 8 years ago

Thanks for your analysis on the situation. We can go ahead and restrict access. I suggest adding a password protected page to access the site.

raquelalegre commented 8 years ago

We could do both password protection and IP restriction, if that's possible. One is never too safe!
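For illustration, IP restriction inside the app could look roughly like the sketch below (the addresses are placeholders, and this assumes the Express app in server.js; it's not something we have implemented):

// Rough sketch of IP restriction as Express middleware; the allowlist entries are placeholders
var allowedIps = ['203.0.113.10', '198.51.100.7'];
app.use(function (req, res, next) {
  if (allowedIps.indexOf(req.ip) !== -1) {
    return next(); // known address, let the request through
  }
  res.status(403).send('Forbidden');
});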

raquelalegre commented 8 years ago

Unless someone is changing IPs all the time for whatever reason, which would make it impossible for them to access the site. That's not my case, though.

missfunmi commented 8 years ago

Personally, I use a VPN sometimes at home, so I would run into issues with the IP restriction. But also when we meet at different locations (Southbank Centre, WHFNP Meetup, etc.), wouldn't your IP address change?

So that brings up an interesting point: do we as developers need access to the external site? I would say we do, since it would be good to occasionally check in on the site and how it looks, make sure it hasn't been hacked, etc.

So perhaps IP restriction may be too restrictive. What do you think @raquel-ucl?

raquelalegre commented 8 years ago

Yep, you're right. I don't check the live site often, but it would be nice to be able to do so from our face-to-face meetings if we want to. And if IP restriction is a problem for you, that's all the more reason not to do it.

nsetru commented 8 years ago

@raquel-ucl @missfunmi - yes, IP restriction would be very restrictive since IP addresses can change. So we will not go for IP restriction.

I will put password protection on the directory and probably put a redirect page at root level - that should be enough for now, I suppose.

I don't think we need to worry much about DDoS attacks, as they mostly target big websites to bring the network down. At the moment that's not an issue for us.

rapidexpert commented 8 years ago

I think we should also do the robots.txt restriction and add an X-Robots-Tag / robots meta tag, in addition to the password protection. Here is what Google has to say on it.
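If we wanted the header, a rough sketch (again assuming the Express app in server.js; illustrative only, not code in the repo) would be a small middleware that marks every response as not indexable:

// Illustrative sketch: ask well-behaved crawlers not to index or follow anything
app.use(function (req, res, next) {
  res.setHeader('X-Robots-Tag', 'noindex, nofollow');
  next();
});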

missfunmi commented 8 years ago

I'm not convinced robots.txt is needed at this point if we're password-protecting the site. Check out this guide, specifically item no 5.

The key vulnerability we're protecting against at this stage is unfriendly crawlers and bots. As the above link highlights, robots.txt is a recommendation, not a protocol that all bots or crawlers are required to adhere to. So there is no guarantee that it will offer any protection for the particular issues highlighted a few comments above.

But I'll let @nsetru opine since she'll be working on this change.

missfunmi commented 8 years ago

I suppose a robots.txt file will prevent Google from indexing the site, but I don't think that is needed if the site is password-protected (Google's bot or any crawler really cannot get past the password-protected page).

Have I missed another use case?

I'll shut up now and let others chime in.

nsetru commented 8 years ago

It's not very important to have robots.txt. It just defines a few rules telling web crawlers or robots not to look into certain directories. So generally, when we create web directories, we add a robots.txt as a norm. But it's not very relevant to the actual scenario here, which is that the public shouldn't have access to this development area until it goes live.

So, adding or not adding robots.txt will make no difference as far as public access is concerned.

lbewlay commented 8 years ago

So it looks like we will just do a password protected page...

missfunmi commented 8 years ago

Fyi: changes on #80 may impact this.

nsetru commented 8 years ago

Not necessarily, I suppose. I'm doing basic HTTP authentication, so I'm not referring to the domain name anywhere for any checks. But I will keep an eye on it.

Thanks for the heads up, @missfunmi

nsetru commented 8 years ago

1) Install htpasswd

$ npm install -g htpasswd
$ cd /data/

# this should create a new file called users.htpasswd within the /data/ dir
# (-c creates the new file, -b takes the password on the command line;
#  users.htpasswd is the file, foo is the username, bar is the password)
$ htpasswd -bc users.htpasswd foo bar

2) Install http-auth

$ npm install http-auth

3) Finally, add the following code within server.js to load the http-auth module and perform the authentication

var auth = require('http-auth');

// Define vars for basic authentication. Issue #69
var basic = auth.basic({
  authRealm: "Private area",
  file: __dirname + "/data/users.htpasswd" // this is where the passwords are stored
});

// Basic HTTP authentication
app.use(auth.connect(basic));

lbewlay commented 8 years ago

@nsetru: thanks for the update on this. Shall I go ahead and install this on the server?

nsetru commented 8 years ago

@lbewlay Those are the notes explaining the process I followed.

The commands you need to run on the server:


# Sync the code on the server with the develop branch via git pull or any other method you have been using

# install dependencies from package.json; this should install http-auth
$ npm install

# check for the users.htpasswd file in the /data/ dir
$ cd /data/

# If it exists, add a new username and password for authentication
$ htpasswd -b users.htpasswd <username> <password>

# check that an entry was created in users.htpasswd
$ vim users.htpasswd

# Finally, start the process
$ npm start

When you navigate to the website, there should now be a prompt for a username and password.
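As a quick sanity check from the command line (assuming the app is listening on port 3000 and using one of the accounts created with htpasswd), something like this should only return the page when credentials are supplied:

# without credentials this should come back as 401 Unauthorized
$ curl -i http://www.namibiaopenoil.org:3000/

# with a valid username and password it should return the page
$ curl -i -u <username>:<password> http://www.namibiaopenoil.org:3000/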

lbewlay commented 8 years ago

@nsetru : I updated the code on the server with the latest branch and the password protection is working now.

lbewlay commented 8 years ago

We can add the live data back to the website now since it's password protected. I can update the CartoDB tables.

lbewlay commented 8 years ago

Live data added back to CartoDB. I left the old tables so I wouldn't break the current setup. The new tables are companies_live, concessions_live, license_holdings_live and licenses_live.

missfunmi commented 8 years ago

Closing since this was implemented and merged in pull request #83. Have opened a separate issue #88 to track the code review feedback on the pull request.

missfunmi commented 8 years ago

Hi @lbewlay, a couple of questions:

  1. I noticed that the live dataset does not contain any geo coordinates. Will these be added soon? Otherwise, using the live tables, nothing shows up on the map.
  2. Also, the data in the live tables is much smaller than I would have expected. For example, with the query we're using to load data from CartoDB for the License Holders page, only 5 companies matching on 1 PEL are returned when using the live tables. Will the full data set be loaded in at some point?

lbewlay commented 8 years ago

@missfunmi Thanks for pointing this out.

1. I used the original geo-coordinates and added them to the 'concessions_live' table which the SQL query is pulling from. Let me know if it works now. These coordinates were pulled from the Namibia cadastre data by Raquel and added to CartoDB some time back.

2. In the companies table we managed to collect data for about 40 companies. This is the dataset we will be working with for this phase; the original goal was to collect full license data for 10 companies. The partner in Namibia will continue to populate the data once the platform is live. So yes, the full data set will be loaded eventually.