nikwilms / ESG-Score-Prediction-from-Sustainability-Reports

This repository contains code and data for a machine learning model that predicts ESG (Environmental, Social, and Governance) scores based on sustainability reports and company data. It's a valuable resource for researchers, investors, and sustainability professionals interested in ESG score prediction using machine learning techniques.
MIT License
15 stars, 2 forks

Tokenization and Normalization #9

Closed: mariusbosch closed this issue 10 months ago

mariusbosch commented 10 months ago

Tokenize the text into words or phrases. Convert all text to lowercase. Remove stop words (common words such as 'and', 'the', and 'is' that add little meaning to the analysis). Apply stemming or lemmatization to reduce each remaining word to its base or root form.
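
The steps above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the repository's actual code: the names `STOP_WORDS`, `SUFFIXES`, `tokenize`, `naive_stem`, and `preprocess` are hypothetical, the stop-word list is deliberately tiny, and the suffix stripper is a crude stand-in for a real stemmer or lemmatizer (a library such as NLTK or spaCy would be the more likely choice in practice).

```python
import re

# Illustrative stop-word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Hypothetical suffix rules, checked in order; a crude stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize and lowercase, drop stop words, then stem what remains."""
    tokens = tokenize(text)
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in kept]


print(preprocess("The company is reducing emissions and reporting targets."))
# → ['company', 'reduc', 'emission', 'report', 'target']
```

Note that the naive stemmer produces non-words such as 'reduc'; that is typical of stemming in general, whereas lemmatization would return dictionary forms at the cost of needing a vocabulary.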