unicef / publicgoods-roadmap

Technical Roadmap for UNICEF's work on Digital Public Goods
Creative Commons Attribution 4.0 International

Exploring ML for building a more robust and scalable version of Kindly #57

Open nathanfletcher opened 3 years ago

nathanfletcher commented 3 years ago

Looking into ways to achieve what Kindly #41 does, and to improve on it. This may also lead to solutions that are not vendor-locked. Maybe @nathanbaleeta has a few ideas.

lacabra commented 3 years ago

@nathanfletcher: can you document here some of the findings? Thanks! 🙏

nathanfletcher commented 3 years ago

I will start here with the basics from discussions with @nathanbaleeta

A number of things I'll be looking into:

nathanbaleeta commented 3 years ago

PROBLEM DEFINITION

The use of Twitter and other social networking sites (SNS) such as Facebook to communicate with one another and the world has led to increased instances of cyberbullying, especially among teenagers. (Reference)

Twitter is an American microblogging and social networking service on which users post and interact with messages known as "tweets". Registered users can post, like, and retweet tweets, but unregistered users can only read them. (Wikipedia)

Cyberbullying is the use of information and communication technology to harass and harm in a deliberate, repetitive, and hostile manner.

Types of cyberbullying include bullying someone through social media, harassment, sexting, cyberstalking, deception, impersonation, and sending nasty messages via chat rooms and instant messenger. Here are more examples of cyberbullying.

According to Twitter demographics published by www.statista.com, as of April 2021 users under 24 years old made up almost 24 percent of Twitter's audience worldwide.

[Figure: Twitter, distribution of global audiences 2021 by age group (Statista, statistic ID 283119)]

SOLUTION

To solve this problem, we will follow the typical machine learning pipeline. We will first import the required libraries and the dataset. We will then do exploratory data analysis to see if we can find any trends in the dataset. Next, we will perform text preprocessing to convert textual data to numeric data that can be used by a machine learning algorithm. Finally, we will use machine learning algorithms to train and test our sentiment analysis models.
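The pipeline steps above can be sketched roughly as follows. This is only an illustrative toy, not the approach used in Kindly or in the linked repository: the thread does not name an algorithm, so a small multinomial Naive Bayes classifier is assumed here, written with the Python standard library only. The messages, the labels `bullying` / `neutral`, and the helper names `tokenize`, `train`, and `predict` are all made up for the example; a real project would more likely use a labelled dataset and a library such as scikit-learn.

```python
import math
import re
from collections import Counter

# Hypothetical toy data, invented for illustration only: (text, label) pairs.
TRAIN = [
    ("you are so stupid and ugly", "bullying"),
    ("nobody likes you loser", "bullying"),
    ("shut up you idiot", "bullying"),
    ("you are a worthless loser", "bullying"),
    ("have a great day friend", "neutral"),
    ("thanks for the help today", "neutral"),
    ("see you at the game tonight", "neutral"),
    ("great job on the project", "neutral"),
]
TEST = [
    ("you stupid idiot", "bullying"),
    ("thanks for a great game", "neutral"),
]


def tokenize(text):
    """Preprocessing: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(samples):
    """Turn text into per-label word counts (the numeric representation)."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training messages
    vocab = set()
    for text, label in samples:
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Exploratory step: check how balanced the labels are.
label_distribution = Counter(label for _, label in TRAIN)
print("label distribution:", dict(label_distribution))

# Train on the training split, then evaluate on the held-out messages.
model = train(TRAIN)
correct = sum(predict(model, text) == label for text, label in TEST)
accuracy = correct / len(TEST)
print("test accuracy:", accuracy)
```

With a dataset this small the accuracy figure says nothing about real-world performance; the point is only to show where each pipeline stage (loading, exploration, preprocessing, training, testing) sits in code.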

nathanfletcher commented 3 years ago

@lacabra This repository is where my files and practical learnings are https://github.com/nathanfletcher/ml_text_classification

amreenp7 commented 2 years ago

@nathanfletcher to include this in documentation before closing it.