Open salman- opened 7 years ago
First of all, I would like to explain why we need to perform text mining. For that, let's look at the dataset. The above image shows commit messages that are labeled 0 (meaning not energy related). Similarly, look at the snapshot of data below. It shows some commit messages that are labeled 1 (meaning energy related).
In both images, we can see that terms such as power, voltage, power consumption, and energy are used. So just comparing the text against a fixed set of energy-related terms is not enough. We have to identify the frequently occurring terms in energy-related commits ourselves and rank them according to their IDF (inverse document frequency), that is, how important a term is across a set of records: a term that appears in fewer records gets a higher weight. Once we have TF-IDF (term frequency-inverse document frequency), we can combine it with the target/label column and use this information to train a classifier that distinguishes between energy-related and non-energy-related commits, so that we can label new data.
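To make the idea concrete, here is a minimal sketch of the TF-IDF ranking step in plain Python. The four commit messages, their labels, and the helper names (`tokenize`, `idf_table`, `tf_idf`) are made up for illustration and are not taken from our dataset. It uses the plain `log(N / df)` form of IDF; libraries such as scikit-learn use a smoothed variant, so the exact numbers would differ.

```python
import math
from collections import Counter

# Hypothetical toy data: (commit message, label); 1 = energy related, 0 = not.
commits = [
    ("reduce power consumption in idle loop", 1),
    ("lower cpu voltage to save energy", 1),
    ("fix power button label in settings ui", 0),
    ("update readme and fix typo", 0),
]

def tokenize(text):
    # Very naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def idf_table(docs):
    # IDF(term) = log(N / number of documents containing the term).
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {term: math.log(n / count) for term, count in df.items()}

def tf_idf(tokens, idf):
    # TF is the relative frequency of the term within one document.
    counts = Counter(tokens)
    total = len(tokens)
    return {t: (c / total) * idf[t] for t, c in counts.items()}

docs = [tokenize(message) for message, _ in commits]
idf = idf_table(docs)

# Sum the TF-IDF weight of each term over the energy-related commits only.
scores = Counter()
for tokens, (_, label) in zip(docs, commits):
    if label == 1:
        for term, weight in tf_idf(tokens, idf).items():
            scores[term] += weight

# Terms of energy-related commits, highest summed weight first.
ranked = [term for term, _ in scores.most_common()]
print(ranked)
```

In this toy data, "power" occurs in both a label-1 and a label-0 commit, so it gets a lower IDF than "voltage" or "energy", which occur only in energy-related commits. The per-commit TF-IDF vectors, together with the label column, would then be the input to whatever classifier we choose; that step is not shown here.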
Thanks @hina86, but can you please insert it into the respective page? There are also some questions for you to answer. Maybe you have already answered some of them.
Okay, sure. No problem, I didn't see the page before.
@hina86 please complete this page. I did a few things for you.