Adapt Graphryder to the ethical consent funnel results

albertocottica commented 6 years ago

The opencare platform acquires user consent through something called the consent funnel. This malfunctioned, and in summer 2017 we set out to try and fix the problem. Recaps and results are here.

What needs to happen now:

We provide you with a list of consentless users. A discussion is ongoing, but it should be fast.
This list is used in the harvesting script as a filter to drop those users. I am not sure how best to do it: the cleaner option is to check in the database that consent has been given. In that case, everything needed is contained in the database, with no outside information.
This also means dropping the content authored by these authors, and the annotations on that content.

Based on this new process, we then proceed to regenerate the dashboard and produce the export.
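
The cascade described above (drop the users, then their content, then the annotations on that content) could look roughly like this. This is only a sketch on in-memory records: the field names `id`, `author_id` and `post_id` are assumptions for illustration, not the actual harvesting script or database schema.

```python
def drop_consentless(users, posts, annotations, consentless_ids):
    """Drop consentless users, the content they authored, and annotations on it.

    Field names (id, author_id, post_id) are hypothetical, chosen for
    illustration; the real schema may differ.
    """
    blocked = set(consentless_ids)
    # 1. Drop the users themselves.
    kept_users = [u for u in users if u["id"] not in blocked]
    # 2. Drop the content they authored.
    kept_posts = [p for p in posts if p["author_id"] not in blocked]
    # 3. Drop annotations attached to content that no longer exists.
    kept_post_ids = {p["id"] for p in kept_posts}
    kept_annotations = [a for a in annotations if a["post_id"] in kept_post_ids]
    return kept_users, kept_posts, kept_annotations
```

Note that an annotation is dropped because of the content it sits on, not because of who wrote the annotation.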

guywiz commented 6 years ago

Getting our hands on the data is feasible through the GraphRyder API, so we can already produce a set of JSON files (following the same pattern as last year: users.json, posts.json, comments.json, etc.). Once we have the list of people who oppose publication of their content, we can amend these files. Just to make sure I get things right: we need to discard those users and the content they authored. Also, we should include a short readme file explaining what data we publish (and what we don't).

albertocottica commented 6 years ago

By end of today we should have an API endpoint for the consent funnel.

albertocottica commented 6 years ago

API endpoint ready.

As per what Marco says here, both the exported data and the actual Graphryder should be based only on those users who have actively given consent. To find them, call up this endpoint (needs API authentication): https://edgeryders.eu/administration/annotator/users.json

and select users for which "edgeryders_consent" = "1"
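
A minimal sketch of that selection step, assuming the endpoint's JSON has already been fetched and parsed into a list of user records, each with an `id` key. The response layout and the `id` key are assumptions (the thread only names the `edgeryders_consent` field), and the authenticated request itself is left out because the authentication scheme is not described here.

```python
def consenting_user_ids(users, consent_field="edgeryders_consent"):
    """Return the ids of users whose consent field equals "1".

    `users` is assumed to be a list of dicts with an "id" key. The value is
    compared as a string so that both "1" and the integer 1 are accepted.
    """
    return {u["id"] for u in users if str(u.get(consent_field)) == "1"}
```

Users with a missing consent field are treated as not having consented, which matches the requirement that only users who have actively given consent are kept.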

guywiz commented 6 years ago

(Echoing a comment from Report part B) Coming back to you on the data export we still need to do.

On user identity.

Anonymization is hard. A soft approach is to simply display users as ids (numbers). (At least those who do not wish to be identified as having participated in the conversations.)

On authored content.

Do we simply discard any content authored by these users as well? This can have a dramatic impact on the SSNA analysis, since it will break conversations (A replying to B replying to C: if B is taken out, we lose the indirect link from A to C).
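
To make the A, B, C example concrete, here is a small sketch that builds reply edges between authors and shows what happens when one author's content is discarded. The record layout (`id`, `author`, `reply_to`) is hypothetical.

```python
def reply_edges(comments, removed_authors=frozenset()):
    """Build (replier, replied_to) author pairs, skipping removed authors' content.

    Each comment is a dict with hypothetical keys: id, author, reply_to
    (the id of the parent comment, or None for a top-level post).
    """
    kept = {c["id"]: c for c in comments if c["author"] not in removed_authors}
    edges = set()
    for c in kept.values():
        parent = kept.get(c["reply_to"])
        if parent is not None:  # parent may have been discarded
            edges.add((c["author"], parent["author"]))
    return edges


thread = [
    {"id": 1, "author": "C", "reply_to": None},
    {"id": 2, "author": "B", "reply_to": 1},
    {"id": 3, "author": "A", "reply_to": 2},
]
```

With nothing removed, `reply_edges(thread)` contains the edges A to B and B to C. With `removed_authors={"B"}` it is empty: A and C remain in the data but are no longer connected.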

P.S. Do we have a list of user ids we need to "discard"?

albertocottica commented 6 years ago

@guywiz :

On anonymization: that's a negative. See here. There should be Python libraries which do SHA-256 lying around.
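
SHA-256 is available in Python's standard library through `hashlib`, so no extra package is needed. A minimal sketch of hashing usernames into pseudonyms follows; the `salt` parameter is an assumption that is not mentioned in this thread. It is included because hashing public usernames without a secret value lets anyone recompute the same hashes from the list of usernames on the site.

```python
import hashlib


def pseudonymize(username, salt):
    """Return a hex SHA-256 digest of salt + username as a stable pseudonym.

    `salt` is a secret string kept out of the published data (an assumption
    for this sketch). The same username always maps to the same pseudonym.
    """
    return hashlib.sha256((salt + username).encode("utf-8")).hexdigest()
```

Because the mapping is deterministic, the same pseudonym can be used across users.json, posts.json and comments.json, so the links between the files survive.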

On authored content. Correct, but nothing we can do. Unless you want to keep the social network but discard the semantics, which would be more work.

On "a list of users". We do it in a cleaner way, with calls to the API. See here. The database needs updating (from our end) as Noemi and I have managed to get a few more people to give consent. She has been out of the loop, but I think she is back at the house today. You can of course start writing the export code; I will let you know when the new information is incorporated into the dataset. At that point you run your script and it is done.

guywiz commented 6 years ago

Followed the instructions on the Edgeryders' API page (https://edgeryders.eu/t/using-the-edgeryders-eu-apis/7904#chap-5) but didn't see any "Generate key" button under Permissions ... (I do not have Admin privilege, but do I need it?)

Also, reading the page, it seems I will be able to access all content, even content published by users who didn't give consent, so it's up to me to filter out those users and their content. That means GraphRyder will also need to filter content, which for now it does not ... Asking @jason-vallet

albertocottica commented 6 years ago

Only we site admins can generate API keys. @jason-vallet has one already.

The script can access everything, but it can then copy onto Neo4j only the content that (1) relates to the ethno-opencare Discourse tag and (2) was authored by users for which "edgeryders-consent" = "1" .

(Sent as an email reply on Mon, Jan 29, 2018 at 3:09 PM, quoting guywiz's comment above.)

guywiz commented 6 years ago

Yep, precisely. My question to @jason-vallet was whether this is what GraphRyder does already, or whether we need to adapt the underlying script.

albertocottica commented 6 years ago

It does (1) but not (2).

(Sent as an email reply on Mon, Jan 29, 2018 at 6:29 PM, quoting guywiz's comment above.)

jason-vallet commented 6 years ago

Hey guys, I had actually pushed all the necessary modifications last week but forgot to update the database (silly me!).

Basically, I had three possible solutions to address this whole issue:

1. Remove from GR the content for which we do not have explicit consent. Plain and simple, sure, but a disaster for the whole network: entire posts (those whose authors did not give consent) and comments are removed, along with their corresponding annotations and codes. We ended up with: 230 users, 577 posts, 2571 comments, 4621 annotations and 1165 codes.
2. The second, slightly less extreme solution is to keep all the content written by people who did not give their consent, but obfuscate it. Basically, all these authors are replaced by a single anonymous user who is treated as the creator of the content. The titles and bodies of the posts and comments are also obfuscated, as are the specific pieces of text annotated. The codes are still attached to the posts/comments, but we do not know exactly how. This has several advantages: the code-to-code relations are preserved as we know them, and the obfuscated content, while not readable on the GR platform, can still be accessed through the hyperlinks pointing to EdgeRyders (access to the content is authorised on the ER website). On the downside, this distorts the social network, since all the authors of obfuscated content are treated as a single anonymous user. Result: 245 users, 659 posts, 3248 comments, 5625 annotations, and 1282 codes.
3. The last solution is, in retrospect, the most straightforward, and is the one currently being deployed. Authors who did not give their consent are anonymous (the username is not displayed), but we keep track of which obfuscated content each of them has authored. The pros of the previous solution still hold, and the issue with the social network goes away. Result: 337 users, 659 posts, 3248 comments, 5625 annotations, and 1282 codes.
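
The deployed approach (hide the text and the username, but keep each author as a distinct node) could be sketched as follows. This is not the actual Graphryder code: the field names `author_id`, `author_name`, `title` and `body`, and the placeholder string, are assumptions for illustration.

```python
def obfuscate(items, consenting_ids, placeholder="[obfuscated]"):
    """Hide text and username for content whose author has not given consent.

    `items` are posts or comments as dicts with hypothetical keys author_id,
    author_name, title and body. The author_id is kept untouched, so each
    non-consenting author remains a separate node in the social network.
    """
    result = []
    for item in items:
        copy = dict(item)  # never modify the harvested record in place
        if item["author_id"] not in consenting_ids:
            copy["title"] = placeholder
            copy["body"] = placeholder
            copy["author_name"] = None  # username is not displayed
        result.append(copy)
    return result
```

Keeping `author_id` is what separates this from the single-anonymous-user variant: replies between two non-consenting authors still link two different nodes.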

So a user will see content which has not been cleared as follows:

and the social network still presents a logical structure:

Ultimately, it is still possible to find what anybody has written on a subject, but this knowledge is only available by going through ER, which complies with the TOS, thus freeing us from the consent issue.

guywiz commented 6 years ago

Please @albertocottica validate solution 3. Also, this means the data we need to publish on Zenodo can be downloaded from GR (no need to write an extraction script). Please @jason-vallet confirm (and also indicate whether I need special privileges to download the data).

albertocottica commented 6 years ago

It makes sense to me, but we should clear it with Marco, who is in charge of ethics. Also, we should still pseudonymize everyone, not just the consentless users.

Matthias tells me he will update the database of the consent funnel today. We have acquired consent from 6 more users.

albertocottica commented 6 years ago

Marco has asked for time. Meanwhile, Matt has updated the database. The data are now complete, at least on the Edgeryders database.

opencarecc / graph-ryder-dashboard

Adapt Graphryder to the ethical consent funnel results #35