hina86 / DM2017_UT


Write a page about how text mining is done? #1

Open salman- opened 7 years ago

salman- commented 7 years ago

@hina86 please complete this page. I did a few things for you.

hina86 commented 7 years ago

First of all, I would like to explain why we need to perform text mining. For that, let's look at the dataset. [image] The image above shows commit messages that are labeled 0 (not energy related). Similarly, the snapshot of the data below shows some commit messages that are labeled 1 (energy related). [image]

In both images we can see that words such as power, voltage, power consumption, and energy are used. So just comparing the text with a fixed set of energy-related terms is not enough. We have to identify the frequently occurring terms in energy-related commits ourselves and rank them according to their IDF (inverse document frequency), a measure of how important a term is across the set of records. Once we have the TF-IDF (term frequency-inverse document frequency) values, we can combine them with the target/label column and use this information to train a classifier that distinguishes energy-related from non-energy-related commits, so that we can label new data.
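
To make the idea concrete, here is a minimal sketch in plain Python. It is not the project's actual pipeline: the commit messages are made up, the smoothed IDF formula is one common variant, and the nearest-centroid classifier with cosine similarity is only a simple stand-in for whichever classifier ends up being used. All function names (`tokenize`, `build_idf`, `tfidf`, `centroid`, `predict`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical commit messages; 1 = energy related, 0 = not energy related.
commits = [
    ("reduce power consumption in idle mode", 1),
    ("fix battery drain caused by wakelock", 1),
    ("lower energy usage of background sync", 1),
    ("add power button icon to toolbar", 0),
    ("update readme and fix typo", 0),
    ("refactor voltage unit conversion helper", 0),
]

def tokenize(text):
    # Assumption: lowercase and split on whitespace, no stemming or stop words.
    return text.lower().split()

def build_idf(docs):
    # Document frequency: in how many messages does each term occur?
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: terms that occur in fewer messages get a higher weight.
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    # Term frequency times IDF; terms unseen during training are ignored.
    tf = Counter(doc)
    return {t: (c / len(doc)) * idf[t] for t, c in tf.items() if t in idf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    # Average TF-IDF vector of all messages that share a label.
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}

docs = [tokenize(message) for message, _ in commits]
labels = [label for _, label in commits]
idf = build_idf(docs)
vectors = [tfidf(doc, idf) for doc in docs]
centroids = {
    label: centroid([v for v, l in zip(vectors, labels) if l == label])
    for label in set(labels)
}

def predict(message):
    # Assign the label whose centroid is most similar to the new message.
    vec = tfidf(tokenize(message), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

print(predict("reduce battery drain"))
print(predict("update readme typo"))
```

In this toy data "power" occurs in both a 0 and a 1 commit, so its IDF is lower than that of "battery", which occurs only once. That is the same effect described above: a fixed keyword list would treat both words alike, while the TF-IDF weights let the classifier rely more on the terms that actually separate the two classes.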

salman- commented 7 years ago

Thanks @hina86, but can you please insert it into the respective page? There are also some questions for you to answer. Maybe you have already answered some of them.

hina86 commented 7 years ago

Okay, sure. No problem, I didn't see the page before.