unicef / kindly

GNU Affero General Public License v3.0

FEATURE: Contributions Counter on Kindly website #108

Closed: nathanfletcher closed this issue 2 years ago

nathanfletcher commented 2 years ago

Please Describe The Problem To Be Solved

The current Kindly website accepts contributions of offensive and non-offensive phrases and sentences from users on the contributions page. However, there is no way to see on the website how many contributions have come in, or how many are in the current training dataset file in the Kindly repository.

Suggest A Solution

The data collected from the website is compiled into a Google Sheet to be anonymised. From the Google Sheet, it is to be added to the current training dataset file in the Kindly repository. The tasks to be completed to achieve this include:

lacabra commented 2 years ago

@nathanfletcher: I would suggest keeping issues separate for easier tracking. In the list above, items 4 and 5 are not related to this task, and a different issue should be created for these.

Related to the issue at hand, items 2 and 3 don't need to be part of GitHub Actions. Yes, they can and should be scripted, and they may be run as part of a build that happens in GitHub Actions, but I find the mention of GitHub Actions either not relevant or somewhat misleading.

sabinevidal commented 2 years ago

@lacabra @nathanfletcher: Looking through how the data is populated into the Intake form (sheet_1) Google Sheet, I realised it might be worth renaming the 3rd column, 'intent', to something that better describes the 'yes'/'no' options: they are not the intent of the text, but rather a check on whether the intent detected by the API is correct (as on the website). Could we then have a 4th column with the intent label from the API, which would be mapped to the output sheet when anonymised? This is out of scope for this issue, as it relates to the kindly-website repo, and I'm not sure whether it's necessary.

lacabra commented 2 years ago

@sabinevidal: the 3rd column is intentionally designed the way it is, because what it records is whether the user believes there is a cyberbullying intent in that sentence, regardless of what the API marked it as. It is designed this way because that is the structure of the training data: a sentence and a 0 / 1 for whether there is cyberbullying intent, used to train the algorithm.

What is not apparent is how that column is populated, as the front-end interface does some conversion, see table below (with links to where the logic is encoded):

API detects cyberbullying -> user agrees -> populates Yes on 3rd column
API detects cyberbullying -> user disagrees -> populates No on 3rd column
API does not detect cyberbullying -> user agrees -> populates No on 3rd column
API does not detect cyberbullying -> user disagrees -> populates Yes on 3rd column
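The four cases above reduce to one rule: the 3rd column holds Yes exactly when the user's answer implies they see cyberbullying intent. A minimal Python sketch of that conversion follows. The function and parameter names are hypothetical and are not taken from the kindly-website code, where the real logic lives in the front end.

```python
def third_column_value(api_detected: bool, user_agrees: bool) -> str:
    """Return the value stored in the 3rd column for one contribution.

    api_detected: whether the API flagged the text as cyberbullying.
    user_agrees:  whether the user agreed with the API's verdict.
    Both names are illustrative assumptions, not the website's identifiers.
    """
    # The user believes there is cyberbullying intent when they agree with a
    # positive detection or disagree with a negative one.
    user_sees_cyberbullying = api_detected == user_agrees
    return "Yes" if user_sees_cyberbullying else "No"
```

The column therefore reflects the user's own judgement of the sentence, not whether the API was right.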

sabinevidal commented 2 years ago

@lacabra That makes more sense. I will add a comment to make sure that distinction is clear, and could then map the yes/no to the 0/1 label so it's consistent across the backend. Many of the current inputs haven't been assigned an intent yet, because the yes/no isn't 'required'. Should this then be added when the data is anonymised if it's not there?
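A rough sketch of the yes/no to 0/1 mapping mentioned here, in Python. It assumes that 1 means cyberbullying intent and 0 means none, and that an unanswered entry arrives as an empty cell or a missing value. The function name is hypothetical. Unanswered entries are returned as None so that a reviewer can fill them in later instead of the script guessing a label.

```python
from typing import Optional


def label_from_answer(answer: Optional[str]) -> Optional[int]:
    """Map a 3rd-column answer to a training label.

    Assumed convention: "yes" -> 1 (cyberbullying intent), "no" -> 0.
    Missing or blank answers map to None and are left for manual review.
    """
    if answer is None:
        return None
    normalized = answer.strip().lower()
    if normalized == "":
        return None
    if normalized == "yes":
        return 1
    if normalized == "no":
        return 0
    # Anything else is unexpected; fail loudly instead of mislabelling data.
    raise ValueError(f"Unexpected answer in 3rd column: {answer!r}")
```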

lacabra commented 2 years ago

@sabinevidal

should this then be added when the data is anonymised if it's not there?

Yes, we will add that manually when UNICEF staff reviews and anonymizes 👍

sabinevidal commented 2 years ago

The Contribution Counter is now visible on the contribution page (https://kindly.unicef.io/contribute). It is available as a component to add in other areas of the site.

A Google Apps Script has been set up to transfer contributions from the input sheet into the output sheet on Google Sheets. A workflow has been created to bring data from the output sheet into the training data file on GitHub.

Reviewers are notified to anonymise and review the sheet after 20 new contributions.
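As an illustration only, the notification rule could be expressed as a simple threshold check. The sketch below is Python, not the Google Apps Script actually used. It assumes that "new" means contributions not yet reviewed and that the notification fires once the count reaches 20. All names are hypothetical.

```python
REVIEW_THRESHOLD = 20  # taken from the comment above; the exact trigger is an assumption


def reviewers_should_be_notified(
    total_contributions: int,
    reviewed_contributions: int,
    threshold: int = REVIEW_THRESHOLD,
) -> bool:
    """Return True once enough unreviewed contributions have accumulated."""
    new_contributions = total_contributions - reviewed_contributions
    if new_contributions < 0:
        # More reviewed than received means the two counts are out of sync.
        raise ValueError("reviewed_contributions exceeds total_contributions")
    return new_contributions >= threshold
```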