headrun / SWIFT


Implement Forecasting in CC #131

Status: Open. jaffrinkirthiga96 opened this issue 4 years ago.

jaffrinkirthiga96 commented 4 years ago

Implement forecasting in CC.

jaffrinkirthiga96 commented 4 years ago

Checked the memory usage of the Druid HealthKart datasource when loaded as Pandas DataFrames: the HealthKart DataFrames occupy 228,764,608 bytes (about 228 MB) of memory. I have also checked the forecasting details with Pramod. He needs to convert the forecasting code (currently a Django app) into a standalone Python script and test it, and will probably hand that script over today or tomorrow so I can continue on this.
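For reference, the byte count above converts as follows. The "228 MB" figure uses decimal megabytes; in the binary units that RAM sizes are usually quoted in, the same amount is roughly 218 MiB.

$$
\frac{228{,}764{,}608\ \text{B}}{10^{6}} \approx 228.8\ \text{MB},
\qquad
\frac{228{,}764{,}608\ \text{B}}{2^{20}} \approx 218.2\ \text{MiB}
$$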

jaffrinkirthiga96 commented 4 years ago

When reading all the HealthKart records (40L+, i.e. over 4 million) from Druid, the forecasting script throws a memory error. Note: the CC machine has 4 cores and 16 GB of RAM. 16 GB of RAM is not enough to read the data, so we need to move this to a machine with a higher configuration.
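If the forecasting model only needs aggregated series rather than every raw row, one way to cap peak memory without a bigger machine is to stream the records in fixed-size batches and keep only running totals. The sketch below assumes a hypothetical `(date, sku, quantity)` record shape and leaves out how records are fetched (for example, paged queries against Druid); it is an illustration of batched aggregation, not the team's script.

```python
from collections import defaultdict
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (date, sku, quantity). Adjust to the real schema.
Record = Tuple[str, str, float]


def iter_batches(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield successive lists of at most `size` records from any iterable."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def aggregate_in_batches(
    records: Iterable[Record], batch_size: int = 100_000
) -> Dict[Tuple[str, str], float]:
    """Sum quantity per (date, sku), holding at most one batch of raw rows in memory."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for batch in iter_batches(records, batch_size):
        for date, sku, qty in batch:
            totals[(date, sku)] += qty
    return dict(totals)
```

Because `records` can be a generator, the full row set never has to exist in memory at once; only the per-key totals grow with the number of distinct series.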

jaffrinkirthiga96 commented 4 years ago

Loaded the RFR forecasted CSV into Druid and am verifying the forecasted data. While verifying, I am seeing some discrepancies between the original and the forecasted data. I have also started creating the dashboards in Superset:
http://ccd.mie.one/superset/dashboard/9/
http://ccd.mie.one/superset/dashboard/10/
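A first, cheap check when original and forecasted data disagree is whether both sides even cover the same series. The sketch below compares series keys (a SKU or product ID is assumed here; the actual key column is not stated in the thread) and reports which ones the forecast dropped or introduced. It is a hypothetical helper, not part of the existing pipeline.

```python
from typing import Dict, Iterable, Set


def key_discrepancies(
    original_keys: Iterable[str], forecast_keys: Iterable[str]
) -> Dict[str, Set[str]]:
    """Report series keys present in only one of the two datasets."""
    orig, fc = set(original_keys), set(forecast_keys)
    return {
        "missing_in_forecast": orig - fc,     # series the forecast dropped
        "unexpected_in_forecast": fc - orig,  # series with no original history
    }
```

Any non-empty set here points to a loading or mapping problem rather than a modelling error, which narrows down where to look next.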

jaffrinkirthiga96 commented 4 years ago

Created a new datasource, HK_rollup, in Druid by removing the Gateway Id column (using Druid's rollup feature). The dataset no longer contains duplicates and has 14L+ (over 1.4 million) records; the previous dataset had 40L+ (over 4 million). I am currently running the forecast script on the new dataset.
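To sanity-check what HK_rollup should contain, the rollup step can be emulated outside Druid: drop the Gateway Id column, then merge rows whose remaining columns are identical by summing their metrics. This sketch assumes a sum aggregator for every metric and treats the timestamp as an ordinary column (Druid also truncates timestamps to the configured query granularity before rolling up, which is not modelled here). The column names in the usage are hypothetical.

```python
from typing import Dict, List, Tuple


def rollup(rows: List[dict], drop: str, metrics: List[str]) -> List[dict]:
    """Drop column `drop`, then merge rows with identical remaining dimensions,
    summing each column listed in `metrics`."""
    groups: Dict[Tuple, dict] = {}
    for row in rows:
        # Every column except the dropped one and the metrics acts as a dimension.
        dims = tuple(
            sorted((k, v) for k, v in row.items() if k != drop and k not in metrics)
        )
        if dims not in groups:
            groups[dims] = {**dict(dims), **{m: 0 for m in metrics}}
        for m in metrics:
            groups[dims][m] += row[m]
    return list(groups.values())
```

Comparing `len(rollup(...))` on a sample against the row count Druid reports for the same sample is a quick way to confirm the drop from 40L+ to 14L+ is entirely due to duplicate rows.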

jaffrinkirthiga96 commented 4 years ago

Checked and verified the RFR forecasted data against the reported use cases. Loaded the corrected RFR data into Druid and created sample dashboards on top of it:
http://ccd.mie.one/superset/dashboard/9/
http://ccd.mie.one/superset/dashboard/10/
Now checking how to write join queries on the Druid datasource. Currently getting the following error while joining the input and forecasted tables:
druid error: Resource limit exceeded (org.apache.druid.query.ResourceLimitExceededException): Subquery generated results beyond maximum[100000]
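One possible workaround while the Druid-side join is investigated is to query the input and forecasted datasources separately (each query aggregated so its result stays manageable) and join them in the application. The sketch below is a plain full outer join on a hypothetical `(date, sku)` key; it does not use any Druid join feature and assumes both result sets fit in memory after aggregation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical join key: (date, sku).
Key = Tuple[str, str]


def outer_join(
    actuals: Dict[Key, float], forecasts: Dict[Key, float]
) -> List[Tuple[Key, Optional[float], Optional[float]]]:
    """Full outer join of two keyed series, sorted by key.
    A side with no value for a key contributes None."""
    keys = sorted(actuals.keys() | forecasts.keys())
    return [(k, actuals.get(k), forecasts.get(k)) for k in keys]
```

Keeping the join outer rather than inner makes it visible where the forecast extends past the actuals, and where actuals exist with no forecast at all.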

jaffrinkirthiga96 commented 4 years ago

Completed the FbProphet run on the HealthKart dataset; it took around 21 hours to finish. The forecasted output has been loaded into Druid and I am currently verifying its reliability:
http://druid.mie.one/unified-console.html#datasources
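To put a number on "data reliability", the Prophet and RFR outputs could both be scored against actuals over a period where both exist. The sketch below computes mean absolute error (MAE) and mean absolute percentage error (MAPE) over keys present in both series. The choice of these two metrics is an assumption, not something the thread specifies. MAPE is undefined when the actual value is zero, so those points are skipped for MAPE only.

```python
from typing import Dict, Hashable, Tuple


def forecast_errors(
    actuals: Dict[Hashable, float], forecasts: Dict[Hashable, float]
) -> Tuple[float, float]:
    """Return (MAE, MAPE in percent) over keys present in both series.
    Zero actuals are excluded from MAPE; MAPE is NaN if every actual is zero."""
    common = actuals.keys() & forecasts.keys()
    if not common:
        raise ValueError("no overlapping keys to compare")
    abs_errs = [abs(actuals[k] - forecasts[k]) for k in common]
    mae = sum(abs_errs) / len(abs_errs)
    pct = [
        abs(actuals[k] - forecasts[k]) / abs(actuals[k])
        for k in common
        if actuals[k] != 0
    ]
    mape = 100 * sum(pct) / len(pct) if pct else float("nan")
    return mae, mape
```

Running the same function on both the RFR and FbProphet outputs gives a like-for-like comparison, which helps judge whether the 21-hour Prophet run is worth its cost.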