sindat / Furcifer

Download YouTube videos as .mp3.
MIT License

Integrate Watson AI #6

Open sindat opened 5 years ago

sindat commented 5 years ago

Watson will learn from the collected user input and will later start suggesting relevant search data (autofill).

That will make it easier for users to download the music they want, in a more organised way.

sindat commented 5 years ago

Use Watson Studio in IBM cloud

sindat commented 5 years ago

Steps for implementing Watson Discovery cognitive insights:

I am using the Watson Discovery API for this, and these are the steps for implementing it in my web app. It also seems that the documents being processed (the training data) will be stored in IBM Cloud as well.

sindat commented 5 years ago

IBM Cloud Object Storage - this is how I'm planning to feed data from my application (hosted on IBM cloud) into Watson Discovery.

sindat commented 5 years ago

Use the Watson Discovery API to add a data collection and documents to my Discovery instance, which runs as a service in IBM Cloud. This will be triggered when a user enters data on the website (the YouTube-to-MP3 downloader).


sindat commented 5 years ago

Update - Discovery will not be used. I am using Watson Studio instead and building my own model.

I will need a proper machine learning algorithm.

I have also already made a design scheme for the website to work well with Watson (it is an overall improvement).

I am now in the process of setting up my Watson environment. This is the stack I will be using:

  - Jupyter Notebooks - to clean, visualize and understand the collected data
  - Apache Spark - cluster computing platform for analyzing massive amounts of data
  - Apache Spark ML - Spark ML library for building ML pipelines and algorithms for supervised and unsupervised learning; in my case it's supervised, training the model with collected user input
  - IBM Watson ML - deploy ML models and make predictions at runtime; will be accessed through its API

This might be a deprecated setup - it seems most of these tools are now integrated into Watson Studio. Check on it.

sindat commented 5 years ago

UPDATE

So apparently, Watson Studio has included Jupyter notebooks since April.

It probably also includes the algorithms I'd use Apache Spark for, so it's now all in one.

First, learn from the older Medium tutorial, then use the tools already integrated in Watson Studio. That way I will understand the process and be able to apply it in the studio.

sindat commented 5 years ago

ANOTHER UPDATE

So apparently, Spark is included in Watson Studio as well, under "Spark Environments".

Use it only AFTER working through the Medium tutorial (as already mentioned) - https://medium.com/codait/building-your-first-machine-learning-system-b3d9401927b7

sindat commented 5 years ago

UPDATE ON DATA FORMAT

For now, upload .csv files.

SquareFeet,Bedrooms,Color,Price
2100,3,White,100000
2300,4,White,125000
2500,4,Brown,150000

This shall be the format.

Upload the data to my GitHub repo.
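A minimal sketch of reading data in this format with Python's standard `csv` module. The sample rows are the ones shown above; the `load_rows` helper is a made-up name, and fetching the text from the repository's raw file URL is left out here.

```python
import csv
import io

# Sample rows in the format described above. In practice the text would be
# downloaded from the raw .csv URL of the repository (not shown here).
RAW_CSV = """SquareFeet,Bedrooms,Color,Price
2100,3,White,100000
2300,4,White,125000
2500,4,Brown,150000
"""


def load_rows(text):
    """Parse CSV text with a header row into a list of typed dicts."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        # Numeric columns are converted to int; Color stays a string.
        rows.append({
            "SquareFeet": int(row["SquareFeet"]),
            "Bedrooms": int(row["Bedrooms"]),
            "Color": row["Color"],
            "Price": int(row["Price"]),
        })
    return rows


rows = load_rows(RAW_CSV)
print(len(rows), rows[0]["Color"])
```

Keeping the header row in the file means the column names travel with the data, so the notebook does not need to hard-code the column order.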

sindat commented 5 years ago

I will need the URL of the raw .csv data.

sindat commented 5 years ago

For now I can use my repository as data storage. Soon I will use IBM Cloud to store the data related to my web application.

sindat commented 5 years ago

UPDATE

The flow in Watson Studio is:

  1. Create a notebook - it is executable Python code using Spark, because I run it in an environment that utilizes Spark.
  2. I have copied a notebook from this guide: https://medium.com/codait/building-your-first-machine-learning-system-b3d9401927b7 The notebook I'm copying is here: https://dataplatform.cloud.ibm.com/analytics/notebooks/3e83ffa1-f52a-4b76-bbb5-498b6b7f9505/view?access_token=a7dfdd01dbc24c53a5ac9688fbdd32da1b59156117d721fe10d12660f18dd591
  3. Replicate this notebook into my own one.
  4. The notebook specifies where it takes the raw data from; in my case that is the file I will be uploading to my GitHub repo.
  5. It also includes instructions for building the machine learning model.
  6. It includes instructions for deploying the ML model via Watson.

sindat commented 5 years ago

UPDATE ON HOW TO RUN THE CONFIGURED NOTEBOOK

The notebook can be set up to run as a scheduled job (for example, every hour) to analyze the input data. The input data, however, will be updated every time a user submits input.

sindat commented 5 years ago

UPDATE ON FIRST DATA IMPORT INTO NOTEBOOK

Data import went successfully. The data is displayed with PixieDust as a Spark DataFrame. This DataFrame will be used to train the deployed Watson ML model.

sindat commented 5 years ago

UPDATE ON WHICH ALGORITHM I SHOULD BE USING

This is a case of supervised learning. Labeled data is provided, and I want to get suggestions based on patterns in the data entered by the user (combinations, frequency, etc.).

This is probably more of a classification problem than a regression problem, since I'm not estimating a numerical value.

I'm estimating user-entered string data based on previous input by the user.

sindat commented 5 years ago

UPDATE ON CHOSEN SUPERVISED ALGORITHM

I will be using a classification algorithm. Regression in supervised learning means predicting a numerical value. Here I am classifying data and predicting one of a set of chosen values, influenced by other variables.

sindat commented 5 years ago

UPDATE - NEW POSSIBLE OPTIMIZED PROCESS

sindat commented 5 years ago

UPDATE ON USED ALGORITHM

I am using the Naive Bayes machine learning algorithm for Furcifer.

It is used for making recommendations based on counts and patterns found in the Spark DataFrame.
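To make the "counts and patterns" idea concrete, here is a small pure-Python sketch of a categorical Naive Bayes classifier with Laplace smoothing. It is an illustration only, not the Spark ML implementation used in the notebook, and the feature names (`genre`, `purpose`), the `suggestion` label and the toy rows are all invented for the example.

```python
import math
from collections import Counter, defaultdict


def train(rows, label_key):
    """Count how often each label and each (label, feature, value) occurs."""
    label_counts = Counter()
    value_counts = defaultdict(Counter)  # (label, feature) -> value counts
    vocab = defaultdict(set)             # feature -> distinct values seen
    for row in rows:
        label = row[label_key]
        label_counts[label] += 1
        for feature, value in row.items():
            if feature == label_key:
                continue
            value_counts[(label, feature)][value] += 1
            vocab[feature].add(value)
    return label_counts, value_counts, vocab


def predict(model, features):
    """Return the label with the highest smoothed log-probability."""
    label_counts, value_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # prior from label frequency
        for feature, value in features.items():
            seen = value_counts[(label, feature)][value]
            # Laplace (add-one) smoothing avoids log(0) for unseen values.
            score += math.log((seen + 1) / (count + len(vocab[feature])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, not taken from the real dataset.
toy_rows = [
    {"genre": "Rock", "purpose": "Workout", "suggestion": "rock playlist"},
    {"genre": "Rock", "purpose": "Studying", "suggestion": "rock playlist"},
    {"genre": "Pop", "purpose": "Relaxing", "suggestion": "pop playlist"},
    {"genre": "Pop", "purpose": "Workout", "suggestion": "pop playlist"},
]
model = train(toy_rows, "suggestion")
print(predict(model, {"genre": "Rock", "purpose": "Workout"}))
```

The classifier only needs frequency counts per label, which is why it fits a dataset built from repeated user input.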

sindat commented 5 years ago

UPDATE ON ALGORITHM

I am using the multi-class classification method, because I'm predicting the artist, the release date and the album in which the song is included.

The goal: combine those three into a search query for the YouTube API and list the suggested videos.

sindat commented 5 years ago

UPDATE ON ALGORITHM

In the end, I'm going to use the random forest classification algorithm.

sindat commented 5 years ago

UPDATE - CHANGING THE WHOLE SCHEME

Deployment and modelling stay the same.

However, there will be a change in the dataset and the label being searched for.

The user will be required to provide information about themselves. Then, they will be required to provide the genre of what they're looking for, and from which period of time it is. All form fields will be required.

Without this data, the search bar will not appear. The bar for entering the URL appears only after the last form is posted.

After the URL is entered, suggestions appear.

The label is in this format: "90s rock, 00s pop, 80s disco". This label, which is returned as a prediction, is used as a query for the YouTube API. The YouTube API then returns a list of videos for that query.

Videos with thumbnails are shown to the user during and after the download, under the bar where the user enters the URL.
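A rough sketch of turning a predicted label into a YouTube Data API v3 search request URL, assuming a single label such as "90s rock" is used as the query text. The `build_search_url` helper and the `YOUR_API_KEY` placeholder are made up for illustration; the request itself is not sent here.

```python
from urllib.parse import urlencode

# Search endpoint of the YouTube Data API v3.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(label, api_key, max_results=5):
    """Build a search URL that uses the predicted label as the query."""
    params = {
        "part": "snippet",       # snippet carries titles and thumbnails
        "q": label,              # the predicted label, e.g. "90s rock"
        "type": "video",         # only return videos
        "maxResults": max_results,
        "key": api_key,          # placeholder, must be a real API key
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


url = build_search_url("90s rock", "YOUR_API_KEY")
print(url)
```

Requesting the `snippet` part should be enough to show a title and thumbnail for each suggested video.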

The required form fields are listed below. Do not leave them open-ended; provide radio button options:

sindat commented 5 years ago

Gender: Male / Female
Age: Let user enter
Occupation: Computers and Technology / Health Care and Allied Health / Education and Social Services / Arts and Communications / Trades and Transportation / Management, Business, and Finance / Architecture and Civil Engineering / Science / Hospitality, Tourism, and the Service Industry / Law and Law Enforcement / Government
Purpose for download: Workout / Studying / Romance / Eating / Cooking / Sleeping / Relaxing / Travelling / Working
Relationship status: Married / Widowed / Divorced / Single
Genre of downloaded song: Dance / Rock / Jazz / Dubstep / Blues / Techno / Country / Electro / Indie / Pop
From which time is the downloaded song: ancient / 20s / 30s / 40s / 50s / 60s / 70s / 80s / 90s / 00s
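Since every field is required and most are restricted to fixed options, the input could be checked server-side before it is added to the dataset. The sketch below is one possible way to do that; the key names (`gender`, `era`, and so on) and the `validate_form` helper are hypothetical, while the option lists are the ones given above.

```python
# Allowed options per field, copied from the list above.
# The dictionary keys are made-up names for the form fields.
CHOICES = {
    "gender": {"Male", "Female"},
    "occupation": {
        "Computers and Technology", "Health Care and Allied Health",
        "Education and Social Services", "Arts and Communications",
        "Trades and Transportation", "Management, Business, and Finance",
        "Architecture and Civil Engineering", "Science",
        "Hospitality, Tourism, and the Service Industry",
        "Law and Law Enforcement", "Government",
    },
    "purpose": {"Workout", "Studying", "Romance", "Eating", "Cooking",
                "Sleeping", "Relaxing", "Travelling", "Working"},
    "relationship": {"Married", "Widowed", "Divorced", "Single"},
    "genre": {"Dance", "Rock", "Jazz", "Dubstep", "Blues", "Techno",
              "Country", "Electro", "Indie", "Pop"},
    "era": {"ancient", "20s", "30s", "40s", "50s", "60s", "70s", "80s",
            "90s", "00s"},
}


def validate_form(form):
    """Return the names of fields that are missing or not an allowed option."""
    errors = []
    for field, options in CHOICES.items():
        if form.get(field) not in options:
            errors.append(field)
    # Age is free text entered by the user; accept positive whole numbers only.
    age = str(form.get("age", "")).strip()
    if not age.isdecimal() or int(age) == 0:
        errors.append("age")
    return errors


example = {
    "gender": "Female", "age": "27", "occupation": "Science",
    "purpose": "Studying", "relationship": "Single",
    "genre": "Jazz", "era": "60s",
}
print(validate_form(example))
```

An empty list means the search bar can be shown; any returned field name points at the form that still needs a valid answer.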

sindat commented 5 years ago

UPDATE

Successfully deployed: the model is trained and in the cloud, and the kernel is running as an active environment with Python 3.6 and Spark.

REST API ENDPOINT FOR CALLING FOR EVALUATION https://eu-de.ml.cloud.ibm.com/v3/wml_instances/3820e5b1-2209-4ad9-9102-78dc616fa58e/deployments/1f7230e2-0d72-44fd-acef-37c908d482b5/online

I need to call this API with a jQuery AJAX POST request, providing the fields (features / factors) and the values (the actual data being posted for prediction).
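For reference, the same kind of POST request can be sketched in Python with the standard library. This is not the jQuery call itself, and several things are assumptions: the field names, the example values, the nesting of `values` as a list of rows, and the bearer-token header. The request object is only built, not sent.

```python
import json
import urllib.request

# Scoring endpoint as given above.
SCORING_URL = (
    "https://eu-de.ml.cloud.ibm.com/v3/wml_instances/"
    "3820e5b1-2209-4ad9-9102-78dc616fa58e/deployments/"
    "1f7230e2-0d72-44fd-acef-37c908d482b5/online"
)


def build_scoring_request(url, token, fields, row):
    """Build (but do not send) a JSON POST request with fields and values."""
    payload = {
        "fields": fields,   # feature names, in the order the model expects
        "values": [row],    # assumed shape: one inner list per record
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the service expects a bearer token.
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


# Hypothetical field names and values for illustration only.
fields = ["GENDER", "AGE", "OCCUPATION", "PURPOSE", "RELATIONSHIP"]
row = ["Female", 27, "Science", "Studying", "Single"]
request = build_scoring_request(SCORING_URL, "YOUR_TOKEN", fields, row)
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` and decoding the JSON response, once a valid token is available.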

sindat commented 5 years ago

UPDATE

The API call works and returns the prediction.