pwnslinger / Cyberbullying-NLP

conference paper;

random sampling #4

Closed pwnslinger closed 4 years ago

pwnslinger commented 4 years ago

@sogol-golafshan
http://arno.uvt.nl/show.cgi?fid=148294

Please read this one and make sure you understand what's going on.

pwnslinger commented 4 years ago

https://jakevdp.github.io/PythonDataScienceHandbook/05.07-support-vector-machines.html

sogol-golafshan commented 4 years ago

Random text for the human annotator: https://drive.google.com/drive/u/0/folders/1qN8i1pZpZ1RVGoy5QhNmjE2YxoYnDyjD

sogol-golafshan commented 4 years ago

Readability: “the ease of understanding or comprehension due to the style of writing.”

Here's a Python package that assesses the readability of a given text: py-readability-metrics (https://github.com/cdimascio/py-readability-metrics#flesch-reading-ease). It implements several of today's most popular readability metrics, including FRES (Flesch Reading Ease Score), SMOG, and Gunning Fog.

FRES uses two variables to measure readability:
- the average length of your sentences, measured by the number of words. (Too many long sentences make your text difficult to read.)
- the average number of syllables per word. (Words with four or more syllables are considered difficult to read.)
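To make the two variables concrete, here is a minimal sketch of the standard Flesch Reading Ease formula, 206.835 - 1.015 * (words per sentence) - 84.6 * (syllables per word). It is not the py-readability-metrics implementation: the function names are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed only for illustration, so scores will differ somewhat from the package's.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Return the Flesch Reading Ease score; higher means easier to read."""
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    avg_sentence_length = len(words) / len(sentences)   # words per sentence
    avg_syllables_per_word = syllables / len(words)     # syllables per word

    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word


easy = flesch_reading_ease("The cat sat on the mat.")
hard = flesch_reading_ease(
    "Comprehensive institutional considerations necessitate extraordinary deliberation."
)
print(round(easy, 3), round(hard, 3))
```

Both terms are subtracted, so longer sentences and longer words each pull the score down. Short tweets can produce scores outside the usual 0 to 100 range.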

sogol-golafshan commented 4 years ago

Extracting hashtags out of tweets:
- DS1: https://drive.google.com/file/d/19mFN3dYdd4D5qDbOmuYzlKyEQiUYYlKF/view?usp=sharing
- DS2: https://docs.google.com/spreadsheets/d/1Zkv-u0xd0JPuShAQomAHWEdmB1-xEo0xJySGw9PTsls/edit#gid=574239134
- DS3: https://drive.google.com/file/d/1oOVKZ77ocm85J6quJFhMo9-pdUGtTSof/view?usp=sharing
- DS4: https://drive.google.com/file/d/1b0K1VJ-zzaRxiogXXa4ogsPGLtJ9Hc4W/view?usp=sharing
- DS5: https://drive.google.com/file/d/1BpqMCqzRjXUF9nTGhMrwZxUuPtrer-_n/view?usp=sharing
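For reference, one way the hashtag extraction could be done is with a regular expression. This is a sketch under assumptions, not necessarily how the files above were produced: the function name and the choice to lowercase tags are hypothetical.

```python
import re
from typing import List

# A '#' that is not preceded by a word character, followed by one or more
# word characters. The lookbehind avoids matching URL fragments like "edit#gid".
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def extract_hashtags(tweet: str) -> List[str]:
    """Return the hashtags in a tweet, lowercased and without the leading '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(tweet)]


print(extract_hashtags("Stop #Bullying now! #BeKind"))
```

Lowercasing makes `#BeKind` and `#bekind` count as the same tag when computing frequencies. Drop that step if case matters for the analysis.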