p-society / gc-server

Stay updated in real-time and engage with the thrill of the game like never before.[WIP]
Apache License 2.0

[FEAT]: Creation of ML model regarding win prediction of our GC cricket tournament. #31

Open geekofycoder opened 7 months ago

geekofycoder commented 7 months ago

The idea: we have all seen the win probability that Google shows during IPL and international matches. How about we build a similar system for our own college and GC?

zakhaev26 commented 7 months ago

Nice idea. Have you thought about the implementation? If not, we can work it out here. CC: @majorbruteforce @punitkr03

geekofycoder commented 7 months ago

Yes, absolutely. The implementation would use ML algorithms, but I asked GPT and it suggested Kafka for streaming. Can you briefly explain how Kafka helps with real-time analysis?

geekofycoder commented 7 months ago

For the ML algorithm, Random Forest is what comes to mind; GPT and some videos suggest it too. Any suggestions for optimisation?

zakhaev26 commented 7 months ago

Suggestions for optimisation?

Currently, nobody working on this project has any idea about ML :p

I would recommend bringing Anshuman (Sophomore, IT), or anyone from PSoc/ML, into this thread to discuss this, as they know more about ML.

punitkr03 commented 7 months ago

Look for a way to implement the random forest algorithm. It fits our use case well. It is also less prone to overfitting than a single decision tree, which should help accuracy when we have little training data. @geekofycoder
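
A minimal sketch of what that could look like with scikit-learn's `RandomForestClassifier`, assuming a second-innings chase. The feature set (runs needed, balls left, wickets in hand), the synthetic labels and every name below are made up for illustration; none of it comes from real GC data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Dummy chase states (hypothetical features): runs needed, balls left, wickets in hand.
n = 2000
runs_needed = rng.integers(1, 200, size=n)
balls_left = rng.integers(1, 121, size=n)
wickets_in_hand = rng.integers(1, 11, size=n)
X = np.column_stack([runs_needed, balls_left, wickets_in_hand])

# Synthetic labels: the chasing side wins more often when the required rate is low
# and wickets are in hand. This rule is invented only to give the model a signal.
required_rate = runs_needed / balls_left
score = 1.5 - required_rate + 0.1 * wickets_in_hand + rng.normal(0, 0.5, size=n)
y = (score > 0).astype(int)  # 1 = chasing side wins

model = RandomForestClassifier(n_estimators=100, max_depth=6, random_state=0)
model.fit(X, y)

def win_probability(runs_needed, balls_left, wickets_in_hand):
    """Return the model's estimated probability that the chasing side wins."""
    proba = model.predict_proba([[runs_needed, balls_left, wickets_in_hand]])[0]
    return float(proba[list(model.classes_).index(1)])

easy = win_probability(10, 60, 9)    # comfortable chase
hard = win_probability(150, 30, 2)   # near-impossible chase
```

Limiting `max_depth` and averaging many trees is what keeps the forest from memorising a small dataset; both values here are guesses that would need tuning once real match data exists.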

zakhaev26 commented 7 months ago

Yes, absolutely. The implementation would use ML algorithms, but I asked GPT and it suggested Kafka for streaming. Can you briefly explain how Kafka helps with real-time analysis?

Kafka is a complex thing at large scale. For example, if you have a huge amount of data to be processed by your backend servers, ML models, or any resource-intensive worker, handling millions of events per second is hard; your DB, for instance, would be overwhelmed by that many inserts in such a short time. Kafka helps by putting a durable, partitioned log between producers and consumers in a publish-subscribe model: producers append events to topics, and consumers read them at their own pace. That buffers bursts and distributes the workload, so things keep working instead of going down when a lot of data is ingested at once. Messages that repeatedly fail processing can be routed to a dead-letter queue (DLQ) built on top of this.

This is only a very brief idea of Kafka, but I would suggest watching this video, which has a real-world example of Kafka use, plus tutorials for understanding Kafka better.
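
To make the buffering idea concrete, here is a toy sketch using only Python's standard library. `queue.Queue` stands in for a Kafka topic; this is not the Kafka API, and the event shape and function names are invented for illustration. The point is only that a fast producer and a slower consumer are decoupled by the buffer in between.

```python
import queue
import threading

topic = queue.Queue()   # stand-in for a Kafka topic (not the real Kafka API)
processed = []          # stand-in for whatever the consumer does, e.g. DB inserts
SENTINEL = None         # marks the end of the stream in this toy example

def producer(n_events):
    """Publish a burst of ball-by-ball events as fast as possible."""
    for ball in range(1, n_events + 1):
        topic.put({"ball": ball, "runs": ball % 7})
    topic.put(SENTINEL)

def consumer():
    """Read events one at a time, at the consumer's own pace."""
    while True:
        event = topic.get()
        if event is SENTINEL:
            break
        processed.append(event["ball"])

worker = threading.Thread(target=consumer)
worker.start()
producer(1000)
worker.join()
```

A real setup would use a Kafka client library talking to a broker, with the topic persisted on disk and split into partitions so that several consumers can share the load.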

zakhaev26 commented 7 months ago

@punitkr03 brother, when did you get into ML?

punitkr03 commented 7 months ago

@zakhaev26 Suffering from skill issue.

majorbruteforce commented 7 months ago

The idea: we have all seen the win probability that Google shows during IPL and international matches. How about we build a similar system for our own college and GC?

This would be great in my opinion. Let's get the MVP rolling!

majorbruteforce commented 7 months ago

@zakhaev26 are we going to open a new repo for ML development or will it be under gc-server?

geekofycoder commented 7 months ago

@punitkr03 Suffering from Success syndrome on its way 😁

zakhaev26 commented 7 months ago

@zakhaev26 are we going to open a new repo for ML development or will it be under gc-server?

It would be a good separation of concerns if we create a new repository for it, but it shouldn't become just another dead repo... I want the ML folks to contribute there. Either way works; we can also work on a separate branch here. Up to you guys.

geekofycoder commented 7 months ago

@zakhaev26 let's discuss here. Let me understand the contribution process from you, and then I will decide with the other members what to do.

geekofycoder commented 7 months ago

[REF] https://docs.aws.amazon.com/msk/latest/developerguide/mkc-create-topic.html (creating an EC2 instance and installing Kafka on the EC2 machine)

uraharaSky commented 7 months ago

What about the datasets? What kind of datasets are we looking at?

geekofycoder commented 7 months ago

As I mentioned, the dataset will initially be a dummy one, and we will create a custom dataset by deriving it from the dummy data. That is going to take a lot of time, so it is kept for the last phase. The models currently available on the internet are manual, i.e. we have to feed in the remaining balls, required run rate, etc. ourselves, but we want a real-time, dynamic application.
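
One way to move from manual input to real time would be to derive those inputs from the live ball-by-ball feed. A rough sketch, assuming a T20-style chase; the `ChaseState` class, its fields and the 120-ball innings length are assumptions for illustration, not something that exists in gc-server, and extras such as wides and no-balls are ignored.

```python
from dataclasses import dataclass

@dataclass
class ChaseState:
    """Running state of a chase, updated once per legal delivery."""
    target: int                 # runs the chasing side needs to win
    total_balls: int = 120      # assumption: 20 overs of 6 balls
    runs: int = 0
    balls: int = 0
    wickets: int = 0

    def update(self, runs_scored: int, wicket: bool = False) -> None:
        self.runs += runs_scored
        self.balls += 1
        if wicket:
            self.wickets += 1

    def features(self) -> dict:
        """Derive the inputs a win-prediction model would be fed."""
        runs_needed = max(self.target - self.runs, 0)
        balls_left = self.total_balls - self.balls
        # Required run rate in runs per over; undefined once no balls remain.
        rrr = runs_needed * 6 / balls_left if balls_left > 0 else None
        return {
            "runs_needed": runs_needed,
            "balls_left": balls_left,
            "wickets_in_hand": 10 - self.wickets,
            "required_run_rate": rrr,
        }

state = ChaseState(target=160)
for runs_scored, wicket in [(4, False), (0, True), (1, False), (6, False)]:
    state.update(runs_scored, wicket)
latest = state.features()
```

Each incoming ball event would call `update`, and the dictionary from `features` could then be passed to whatever model is trained, so nobody has to enter the match situation by hand.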