liamreilly01 / Sweng-Group-20


Explore Django framework #10

Open MylanaBulat opened 1 year ago

MylanaBulat commented 1 year ago

Django is a Python framework for building websites.

Break this into the following tasks:

Message @WilliamWalshDowd anytime about anything and he will help out.

Chat bot app = front end + back end

Back end:

https://www.djangoproject.com/start/

https://www.geeksforgeeks.org/django-tutorial/

Django ships with its own templates and forms, which should be good enough as a front end for the back end: they can take user input from a text box and feed it to your Python scripts.

In your back end you are going to run some code on the user input. That may be scikit-learn text preprocessing, with the output then fed to a trained ML model that predicts what the person is looking for. Here is a scenario based on the project description:

User: I was made redundant

Django Back-end:

Preprocess user input -> keywords = [made, redundant] -> vecwords = [word2vec(keywords)]

Check out https://towardsdatascience.com/word2vec-explained-49c52b4ccb71 for word2vec. You can use the resulting vectors in https://orangedatamining.com/ and https://scikit-learn.org/stable/ (scikit-learn has no word2vec of its own; gensim is one library that does).

Make prediction -> predict(vecwords) using a trained scikit-learn object -> returns the URL(s) of the laws/statutes that the system thinks are most pertinent
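The preprocessing step can be sketched in plain Python. This is only an illustration: `extract_keywords` and the tiny `STOP_WORDS` set are assumptions, and a real system would more likely use a proper stop-word list from a library.

```python
import re

# Assumed, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"i", "was", "a", "an", "the", "my", "me", "is", "am", "to", "of", "and"}


def extract_keywords(text):
    """Lowercase the text, split it into words, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


print(extract_keywords("I was made redundant"))  # prints ['made', 'redundant']
```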

To do this, you could scrape loads of law/statute content, pre-process it to create a ‘bag-of-words’, convert it using word2vec(), and save each scraped URL alongside its vector in a CSV or JSON file, or in a database, as:

[{"url": "http://example.com/", "word2vec": [238462936, …]},
 {"url": "http://example2.com/", "word2vec": [3894, …]},
 {"url": "http://example3.com/", "word2vec": [39503, …]},
 …]

So what you are trying to do is ‘predict’ the URL, based on the word2vec of the bag of words. Maybe word2vec isn’t necessary; you could use plain bag-of-words vectors instead: https://towardsdatascience.com/text-vectorization-bag-of-words-bow-441d1bfce897
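A rough sketch of the plain bag-of-words alternative, using only the standard library. The scraped pages, the `bow` key (used here in place of `word2vec`) and the helper `bag_of_words` are all made up for illustration; scikit-learn's `CountVectorizer` does the same job with more options.

```python
import json
from collections import Counter

# Hypothetical scraped pages: URL -> keywords left after preprocessing.
scraped = {
    "http://example.com/": ["redundancy", "payment", "employer", "redundant"],
    "http://example2.com/": ["tenant", "landlord", "rent", "deposit"],
}

# Fixed, sorted vocabulary so every vector has the same layout.
vocabulary = sorted({word for words in scraped.values() for word in words})


def bag_of_words(words, vocabulary):
    """Count how often each vocabulary word occurs in `words`."""
    counts = Counter(words)
    return [counts[word] for word in vocabulary]


records = [
    {"url": url, "bow": bag_of_words(words, vocabulary)}
    for url, words in scraped.items()
]

# Store the vocabulary with the records: the same vocabulary is needed
# later to turn user input into a comparable vector.
index_json = json.dumps({"vocabulary": vocabulary, "records": records}, indent=2)
```

The `index_json` string could then be written to a file or a database field, whichever storage option the team settles on.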

So your user input has been turned into a word2vec vector, and you don’t know what the URL is supposed to be. The simplest algorithm I can think of to make a prediction here is kNN: https://www.datacamp.com/tutorial/k-nearest-neighbor-classification-scikit-learn

All it does is compare the vector of the user input to the stored vectors in the database of word2vecs and URLs. The closest match is the prediction, and you just give its URL back to the user.
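The closest-match idea can be sketched without any ML library. This is a hand-rolled 1-nearest-neighbour lookup using cosine similarity over word-count vectors, not scikit-learn's kNN implementation; `VOCABULARY`, `INDEX` and the function names are hypothetical.

```python
import math

# Hypothetical index: each scraped URL with its word-count vector.
VOCABULARY = ["deposit", "employer", "landlord", "payment", "redundant", "rent"]
INDEX = [
    {"url": "http://example.com/", "vector": [0, 1, 0, 1, 2, 0]},
    {"url": "http://example2.com/", "vector": [1, 0, 2, 0, 0, 2]},
]


def vectorise(keywords):
    """Count occurrences of each vocabulary word in the user's keywords."""
    return [keywords.count(word) for word in VOCABULARY]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_url(keywords):
    """Return the URL whose stored vector is most similar to the query."""
    query = vectorise(keywords)
    # If nothing matches at all, max() simply returns the first record.
    best = max(INDEX, key=lambda record: cosine_similarity(query, record["vector"]))
    return best["url"]


print(nearest_url(["made", "redundant"]))  # prints http://example.com/
```

With a larger index, scikit-learn's `KNeighborsClassifier` or `NearestNeighbors` would be the more practical choice than this loop.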

Once you have something like this working, it is easy enough to plug in different, more advanced ML models, such as random forest.

will commented 1 year ago

message @will anytime about anything and he will help out.

If you want to email me, you can, I guess. But you're going to be disappointed, I think. I've never used Django and I've only barely used Python.

MylanaBulat commented 1 year ago

@will I am so sorry for tagging you here; one of our group mates is also called Will, and I used the wrong tag. My apologies.

mccabed7 commented 1 year ago

https://djangoforbeginners.com/
As of 27/2/23, I have been added to the Django backend team, as most of the work on the web-scraping team has been finished. I'm reading through the guide above to familiarize myself with the framework, and this week I'll look at the code that people have already written.