hritikksingh / Twitter-video-emotion-and-sentiment-analysis

Tweet preprocessing/cleaning before modelling #GSSOC'22 #21

Open hemhemoh opened 2 years ago

hemhemoh commented 2 years ago

What are you suggesting?

I am suggesting deep and thorough cleaning and preprocessing of the text before modeling it. I'd appreciate it if this were assigned to me. I am also a GSSOC'22 participant.

Any screenshots?

hritikksingh commented 2 years ago

@hemhemoh Do you have a dataset to clean and train on? Currently, I have used a Kaggle dataset of tweets.

hemhemoh commented 2 years ago

I plan on using this same data. I'm not sure whether there is punctuation in it, but at least the text does not have a uniform case, i.e. it is not all in lower case. Also, stopwords were not removed and the words were not lemmatized.
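
A minimal sketch of the cleaning steps named above (lowercasing, punctuation removal, stopword removal), using only the Python standard library. This is not the repository's code: the function name `clean_tweet` and the tiny `STOPWORDS` set are hypothetical and exist only for illustration.

```python
import string

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real pipeline would use a much fuller list.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "it", "this"}

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text: str) -> str:
    """Lowercase a tweet, strip ASCII punctuation, and drop stopwords."""
    text = text.lower()                 # give the text a uniform case
    text = text.translate(PUNCT_TABLE)  # remove punctuation such as , ' ? ! # @
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(clean_tweet("This is GREAT, isn't it?!"))
```

Lemmatization is left out of the sketch because it needs an external lexicon; one option would be NLTK's `WordNetLemmatizer`, applied to each token after the stopword filter.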

hritikksingh commented 2 years ago

@hemhemoh I agree. You can start working on the issue.

hemhemoh commented 2 years ago

Okay, thank you. If you want another dataset, I can create another issue for that and you can assign it to me too. You also haven't added a level to this issue.

hritikksingh commented 2 years ago

Thanks for pointing that out. I have added the labels to the issue.

hritikksingh commented 2 years ago

Do you have another good and valid source of data? Having the data as a separate issue would not be a good idea, though. Instead, try to use that data and train on it to get better accuracy. I have made this issue level2, so it will justify your work.

hemhemoh commented 2 years ago

Okay, thank youuuuu