theodi / shared

Repo that we use for non-repo-specific stories and other shared stuff.

Add basic address interactive capture and bulk upload functionality #413

Closed giacecco closed 9 years ago

giacecco commented 9 years ago

The Board suggested that we collect addresses from day 1, interactively and in bulk.

Even if we are not ready to process them, the most natural thing our website should do is ask people to submit addresses. They imagine a big heading at the top of the landing page saying something like "I don't care who you are, but tell me where you are:" [some input boxes], so that in one shot we make clear that we're not poking our nose into the visitor's identity but we do need valid addresses. They also suggested we should allow people to submit entire datasets.

What is the easiest way to make this happen without changing the current gh-pages hosted solution and keeping the target date for the "static" website?

There is no need to automatically process the addresses being submitted, nor to publish them back, not even to say how many we got. At this stage, we can do this manually if we think it is suitable.

Floppy commented 9 years ago

Google form

giacecco commented 9 years ago

.. and for the bulk uploads?

JeniT commented 9 years ago

Let's skip the bulk uploads for now (as discussed in the meeting, we really want these only when we have automatic ingest).

A Google form works, and would let us publish the associated spreadsheet, but are we able to style it or embed it in our pages?

giacecco commented 9 years ago

Fine for postponing bulk upload.

I don't think we can publish the spreadsheet back in real time after users' submissions. Without some degree of moderation, which is not a planned feature for Alpha, people could write anything, from profanities to fictional addresses, etc.

I will share this matter with @peterkwells when we catch-up on the phone this morning.

Floppy commented 9 years ago

We have a solution here. This is a small app that allows form submissions to be stored in a github-hosted CSV. This is totally what we should do.

Floppy commented 9 years ago

The default setup for that app requires users to have a github account. Are we OK with that at this stage, or should we hardcode the account internally to one of our own so it can be anonymous?

Floppy commented 9 years ago

To keep the data private, you'd need a Bronze organization account, at $25 a month.

Floppy commented 9 years ago

It actually looks like you can easily style Google forms. I think this might be simpler.

Floppy commented 9 years ago

OK, I've done a test form submission at https://openaddressesuk.github.io/submit/. Give it a go, see what you think. I've shared the results sheet with @giacecco and @peterkwells. If this is OK, we can create a proper form in the openaddresses organisation and move over to that.

Obviously it needs copy around it to explain what's going on.

JeniT commented 9 years ago

Does/can this protect against automated submissions? We don't want anyone to write a script to submit addresses from another source as that could pollute the whole thing.

Floppy commented 9 years ago

We have a honeypot field to protect against standard spambots, but it won't protect against an attack specifically designed to pollute the data. Unfortunately Google Forms doesn't support captchas. You'd think it would.
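
For readers unfamiliar with the technique, here is a minimal sketch of a honeypot check, not the code used on the test form. It assumes the form carries a field hidden from human visitors, here given the hypothetical name `website`; a generic spambot that fills in every field gives itself away, while a purpose-built script can simply leave it blank.

```python
def looks_like_spambot(form: dict, honeypot_field: str = "website") -> bool:
    """Return True when the hidden honeypot field has been filled in.

    `honeypot_field` is a hypothetical name; real visitors never see the
    field, so any non-blank value suggests an automated submission.
    """
    value = form.get(honeypot_field, "")
    return bool(value.strip())


# A human leaves the hidden field empty; a naive bot fills everything in.
human = {"address": "1 High Street", "website": ""}
bot = {"address": "1 High Street", "website": "http://spam.example"}
print(looks_like_spambot(human), looks_like_spambot(bot))
```

Flagged rows could be dropped before they reach the results sheet, or kept and marked so they can be filtered out later.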

pezholio commented 9 years ago

Could we not design the form in such a way that only a browser could submit the info? Or is there a way to log the browser (in a hidden field), so we can easily filter automated submissions?

JeniT commented 9 years ago

We could ask people not to do it and explain why. Or see if it's actually a problem and only try to do something about it if/when we find it is.

Floppy commented 9 years ago

I'd be kind of inclined to do that one... bit laissez faire I know, but it depends on whether we expect malicious submissions.

peterkwells commented 9 years ago

I would suggest we start too strong, if anything. The tone/impression that we set when we launch can/will stick. A single fake address that matches one of the ones planted in Addressbase will cause us problems, let alone a mass input.

Given that, even though I'd love to do this, can we pause this one until we've had time to check the legal advice on the publishing-platform approach to publishing addresses that we receive, write some guidelines on what we want/don't want, and provide a process for removing infringing material?

There are many examples out there on the Internet to build on, and as a well-behaved company we should look at those. As a simple example, check out YouTube and their T&Cs, community guidelines and notes on copyright:

https://www.youtube.com/upload

Floppy commented 9 years ago

OK, I'll leave it as-is for now, until you let me know how you want to proceed.

peterkwells commented 9 years ago

Jeni and I have agreed to:

  1. Take the risk with automated loads, i.e. no need for captcha
  2. Write copyright guidelines for submitters (short version: "we don't want IP-ridden addresses!")
  3. Write/build a quick "objections" process where copyright owners can register an issue
  4. Add some FAQ text around how we will handle bad addresses
  5. Delay website launch until we do the above. This needs to be in the first release.

I'll write up some more detailed reqs and link them to this issue.

Floppy commented 9 years ago

@peterkwells In that case, I'll create a proper form in the OA google drive and change over to using that one. Can you detail exactly what fields you want to have, and what the options should be if there are any?

Floppy commented 9 years ago

As this isn't going live, I'm going to say that this ticket is done for this sprint. We can open another for the followup work next time.

JeniT commented 9 years ago

To be clear, we are not going live until this is done. We are delaying going live to make sure it can be done prior to going live.

Floppy commented 9 years ago

:+1:

Floppy commented 9 years ago

This decision could still come through this week, so I'm moving this back to ready.

giacecco commented 9 years ago

Had a quick catch-up with @peterkwells on this. I believe the thread thus far captures the current status of this feature, in particular Peter's comment at https://github.com/theodi/shared/issues/413#issuecomment-60054844 .

One thing to clarify is that the decision on the wording and 'shape' of the form through which we collect the data needs some thinking and possibly validation with Legal. I've suggested that @Floppy not develop the form further until we have revised and detailed specs from @peterkwells on what that form looks like and how it works.

peterkwells commented 9 years ago

Requirements for review by @giacecco @JeniT @Floppy https://docs.google.com/a/openaddress.es/document/d/1k1HBJ_dCMpfhKPGGPMq4yrq2MbjaW9hwbG4NFGIiq_M/edit

pezholio commented 9 years ago

Looks good. The only issue here is that a simple Google docs form won't be able to capture the user agent and IP. We could maybe implement a simple server-side solution on Heroku though?

peterkwells commented 9 years ago

If it makes it easier: I have discussed this with Jeni and we agreed to take out the UA. The doc is updated.

pezholio commented 9 years ago

OK, getting the IP is still going to be difficult. We can capture it via JavaScript, but again, this would be easy to fake, and if a user has JS turned off, it won't log. Will this be an issue?

JeniT commented 9 years ago

In some ways having something that can only be set through Javascript is helpful as it will enable us to identify scripted submissions.

JeniT commented 9 years ago

Or will it... I guess it could be easily faked.

pezholio commented 9 years ago

Hmmmmm... Actually, looking at it, I don't think it's possible without using some kind of server side solution. I think the best option would be a simple proxy (probably a Sinatra app) that takes the form contents and submits it to the Google form with the IP included. Probably harder to fake too.
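
To make the proxy idea concrete, here is a rough Python sketch (the comment above suggests Sinatra, so this is an illustration of the approach, not the real app). It assumes the Google form accepts a POST of `entry.*` fields at a `formResponse` URL; `FORM_URL` and `IP_ENTRY` are placeholders, not values from this thread. It also assumes the hosting platform appends the real client address to `X-Forwarded-For`, which should be checked for the platform in use.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder values: neither the form URL nor the entry ID comes from this thread.
FORM_URL = "https://docs.google.com/forms/d/FORM_ID/formResponse"
IP_ENTRY = "entry.0000000"


def client_ip(environ: dict) -> str:
    """Pick the client address from a WSGI-style environ.

    Assumes the front-end router appends the real address as the last
    X-Forwarded-For entry; falls back to REMOTE_ADDR otherwise.
    """
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    if forwarded:
        return forwarded.split(",")[-1].strip()
    return environ.get("REMOTE_ADDR", "")


def build_forward_request(form: dict, environ: dict,
                          form_url: str = FORM_URL,
                          ip_entry: str = IP_ENTRY) -> Request:
    """Build (but do not send) the POST that relays a submission to the form."""
    payload = dict(form)
    # Set the IP server-side, overwriting anything the client supplied.
    payload[ip_entry] = client_ip(environ)
    body = urlencode(payload).encode("utf-8")
    return Request(form_url, data=body, method="POST")
```

Because the IP field is filled in on the server, a value sent by the browser is discarded, which is what makes this harder to fake than a JavaScript-populated field. Sending the request (for example with `urllib.request.urlopen`) is left out of the sketch.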

pezholio commented 9 years ago

I've built a proxy here https://github.com/OpenAddressesUK/adress-capture. We can then host it on Heroku and add the relevant env variables to the app and change the form on the frontend. Do we have a Gdocs form set up yet? Or shall I do it?
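
As an illustration of the env-variable setup mentioned above, here is a small Python sketch of reading the proxy's settings from the environment and failing early when one is missing. The variable names `GOOGLE_FORM_URL` and `IP_FIELD_NAME` are hypothetical; the names the linked app actually expects are not given in this thread.

```python
import os


def load_config(env=os.environ) -> dict:
    """Read the settings the proxy needs from environment variables.

    The variable names are hypothetical placeholders for this sketch.
    """
    required = ("GOOGLE_FORM_URL", "IP_FIELD_NAME")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail at startup rather than on the first form submission.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

Keeping these values out of the repository means the form can be swapped (for example from the test form to a proper one) by changing the app's configuration rather than its code.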

pezholio commented 9 years ago

Right, there's stuff here https://github.com/OpenAddressesUK/openaddressesuk.github.io/pull/12