Train the ML models - Githubissues

Before the ML model can predict the company's projected carbon emissions, it has to be tested with sample data first.

However, the current sample data is not the actual carbon footprint of the project's products. Instead, it uses publicly available carbon emission data for Europe from the years 1750 to 2021 (271 rows). One potential drawback is that a dataset of this size may not be sufficient.

The rationale is to ensure that the model can predict a reasonable output. If the sample data is nonsensical, the predictions will not be useful at all.

Source: https://ourworldindata.org/co2-dataset-sources

Screenshot of test dataset. Column A: Year. Column B: Actual CO2 emissions from fossil fuels and industry (in tons).

After training the ML model on the test dataset (R-squared: 0.9972, FastTreeRegression algorithm), this is what the model predicts:

The test years run from 1990 to 2020 at 5-year intervals. These years were chosen to verify whether the model can predict the decreasing trend resulting from global efforts to reduce emissions. 2020 is also a good year to test the model because of the pandemic. However, unlike the other years tested, where the actual and predicted values differed by an insubstantial amount, the model did not manage to predict the drastic drop in 2020.
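The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the project: the function names, the 5% threshold, and every number in the demo are made-up placeholders, not the results from this issue. It shows how the test years can be generated, how the gap between actual and predicted values can be expressed as a percentage, and how R-squared is computed from its standard definition.

```python
# Hypothetical helpers for comparing actual vs predicted emissions.
# All names, the threshold, and the demo numbers are assumptions for illustration.

def test_years(start=1990, end=2020, step=5):
    """Years used to check the model, inclusive of both ends."""
    return list(range(start, end + 1, step))


def percent_error(actual, predicted):
    """Absolute difference as a percentage of the actual value."""
    return abs(actual - predicted) / abs(actual) * 100.0


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def flag_large_misses(years, actual, predicted, threshold_pct=5.0):
    """Return the years whose percentage error exceeds the threshold."""
    return [
        year
        for year, a, p in zip(years, actual, predicted)
        if percent_error(a, p) > threshold_pct
    ]


if __name__ == "__main__":
    years = test_years()
    # Placeholder values in arbitrary units, NOT the real dataset or model output.
    actual = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 8.0]
    predicted = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.0]
    print(flag_large_misses(years, actual, predicted))  # prints [2020]
```

With the placeholder numbers, only the last year is flagged, which mirrors the situation described above where a sudden drop is missed while the other years are close. A threshold like this makes "insubstantial" explicit, so a reviewer can agree on what counts as an acceptable miss.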

Based on the testing results, the model can be relied on to predict the carbon emissions of Clean and Bright Company when they set a target carbon emission.

jibai-kia / ICT2106_P1-6

Train the ML models #18