sirin05137 / CSE364_Project


[FINAL] Update README.md #27

Closed yuujinleee closed 3 years ago

yuujinleee commented 3 years ago

I'm editing this locally on my branch right now, so in case you're curious, I'll paste the draft here from time to time.

ㅇㅅㅇ 💻

CSE364 Group11

Table of Contents

About the Project

This project is the capstone project for SW Engineering (CSE364) at UNIST, developed by Yeongjun Kwak, Sanghun Lee and Yujin Lee. It will be further implemented as a movie recommendation system and is currently under development.

Built with

Repository Structure

CSE364_Project/
├── data/                             # Source data
├── scripts/                          # Documentation files
├── src/                              # Source files
│   ├── main/
│   │   └── java/
│   │       └── group11/
│   │           └── project.java
│   └── test/
│       ├── java/
│       │   └── group11/
│       │       └── projectTest.java
│       └── resources/   
├── target/                           # Generated by mvn install
│   └── jacoco-report/
│       └── index.html                # JaCoCo Code Coverage Report
├── pom.xml                           # Maven configuration
└── README.md

Getting Started

Instructions for setting up the project locally.

Prerequisites

Installation

  1. Download dockerfile and run.sh into the same directory.
  2. (In the terminal) Build the Docker image and start a container by running the following commands.
    $ docker build -t new_image_name /path/to/dockerfile
    $ docker run -it new_image_name
  3. (In the Docker Container) Run . run.sh
    root@containerID:~/project# . run.sh

Milestone 1

Goal : Environment setup, data loading, data preprocessing

Explanation of the Algorithm

On Milestone 1, the code calculates and returns the average rating from ratings data for specified occupation and genre.
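
As a rough illustration of the calculation described above, the sketch below averages the ratings whose user has the requested occupation and whose movie carries the requested genre. The `Rating` class, the two lookup maps and the method name are hypothetical stand-ins, since the project's actual classes and its parsing of the data files are not shown here. Only a single genre is handled; genre combinations are left out.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;
import java.util.Set;

final class AverageRatingSketch {

    /** One rating row: who rated which movie, and the score. Hypothetical type. */
    static final class Rating {
        final int userId;
        final int movieId;
        final int score;

        Rating(int userId, int movieId, int score) {
            this.userId = userId;
            this.movieId = movieId;
            this.score = score;
        }
    }

    /**
     * Averages the ratings given by users with the requested occupation
     * to movies tagged with the requested genre (case-insensitive).
     * Returns an empty result when no rating matches.
     */
    static OptionalDouble averageRating(List<Rating> ratings,
                                        Map<Integer, Set<String>> genresByMovie,
                                        Map<Integer, String> occupationByUser,
                                        String genre,
                                        String occupation) {
        return ratings.stream()
                // keep ratings made by users with the requested occupation
                .filter(r -> occupation.equalsIgnoreCase(occupationByUser.get(r.userId)))
                // keep ratings of movies that carry the requested genre
                .filter(r -> genresByMovie
                        .getOrDefault(r.movieId, Collections.<String>emptySet())
                        .stream().anyMatch(genre::equalsIgnoreCase))
                .mapToInt(r -> r.score)
                .average();
    }
}
```

An empty `OptionalDouble` corresponds to the situation in which the program reports that no rating data matches the input pair.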

Running the Test

Continued from Installation steps.

  1. (In the Docker Container) Run the Java command with 2 arguments (InputStr1 InputStr2), which are the Genre and the Occupation, respectively.
    root@containerID:~/project# java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.project InputStr1 InputStr2

Examples

When valid inputs are passed, the output message will look like this :

// Input
java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.project Adventure Educator
// Output
3.42
// Input
java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.project Animation Doctor
// Output
3.68

Supported Inputs

Genre (total: 18)

| | | | |
|---|---|---|---|
| Action | Crime | Horror | Thriller |
| Adventure | Documentary | Musical | War |
| Animation | Drama | Mystery | Western |
| Children's | Fantasy | Romance | |
| Comedy | Film-Noir | Sci-Fi | |

Occupation (total: 21)

| | | | |
|---|---|---|---|
| Academic (Educator) | Executive (Managerial) | Programmer | Technician (Engineer) |
| Artist | Farmer | Retired | Tradesman (Craftsman) |
| Clerical (Admin) | Homemaker | Sales (Marketing) | Unemployed |
| Collegestudent (College, Grad, Gradstudent) | K-12student | Scientist | Writer |
| Customerservice | Lawyer | Self-employed | Other |
| Doctor (Healthcare) | | | |

Rules for the Inputs

Combination of Multiple Genres as an Input

Error Codes

Possible errors thrown by invalid user input.

Table 1 Invalid input errors

| Error | Code | Message | Description |
|---|---|---|---|
| InputEmptyError | 1 | No argument has passed. 2 arguments are required. (InputStr1 InputStr2) | Thrown when no input is entered. |
| InputNumError | 2 | Only 1 input has passed. 2 arguments are required. | Thrown when only 1 input is entered. |
| InputNumError | 3 | More than 2 arguments have passed. 2 arguments are required. | Thrown when more than 2 inputs are entered. |
| InputInvalidError | 4 | Entered genre input is invalid. | Thrown when the entered genre (combination) is invalid. |
| InputInvalidError | 5 | Entered genre (inputString) doesn't exist. ( Invalid word : input_string ) | Thrown when a word in the entered genre (or genre combination) is invalid. |

Table 2 Invalid input warning

| Warning | Code | Message | Description |
|---|---|---|---|
| InputInvalidWarning | 6 | Entered occupation doesn't exist. Rating by 'other' is shown instead. | Thrown when the 2nd input (occupation) is invalid. |

Table 3 No data exist error

| Error | Code | Message | Description |
|---|---|---|---|
| NoDBError | 7 | Rating data matching the input pair doesn't exist. | Thrown when there is no rating data for the genre-occupation pair, or when no movie matches the entered genre (combination). |

Examples for the Error Codes

Error code : 1~3
% java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.project 

InputEmptyError : No argument has passed. 2 arguments are required. (InputStr1 InputStr2)
Error code: 1

% java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.project Drama

InputNumError : Only 1 input has passed. 2 arguments are required.
Error code: 2

% java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.project Drama Scientist Scientist

InputNumError : More than 2 arguments have passed. 2 arguments are required.
Error code: 3

Error code : 4
% java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.project "|Adventure|Action" Scientist          

InputInvalidError : Entered genre input is invalid.
Error code: 4

% java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.project "Adventure|Action|" Scientist 

InputInvalidError : Entered genre input is invalid.
Error code: 4

Error code : 5
% java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.project "Adventure|Action|Horrrror" Scientist

InputInvalidError : Entered genre (horrrror) doesn't exist.
Error code: 5

Error code : 6
% java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.project Adventure Librarian                  

InputInvalidWarning : Entered occupation doesn't exist. Rating by 'other' is shown instead.

The rating of adventure rated by other : 3.43

Error code : 7
% java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.project "War|Crime" Academic  

NoDBError : Rating data matching the input pair doesn't exist.
Error code: 7

% java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.project "Action|Animation|Children's|Sci-Fi|Thriller|War" retired

NoDBError : Rating data matching the input pair doesn't exist.
Error code: 7
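
The genre-related examples above (codes 4 and 5) suggest a two-step check: first reject a malformed combination such as a leading or trailing `|`, then reject any word that is not a known genre. The sketch below is one possible way to express that, not the project's implementation. The class and method names are hypothetical, and so is returning the code as an `int` (the real program prints a message instead). Treating an entirely empty argument as code 4 is also an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class GenreInputSketch {

    /** The 18 genres listed under Supported Inputs, lower-cased for comparison. */
    static final Set<String> GENRES = new HashSet<>(Arrays.asList(
            "action", "adventure", "animation", "children's", "comedy", "crime",
            "documentary", "drama", "fantasy", "film-noir", "horror", "musical",
            "mystery", "romance", "sci-fi", "thriller", "war", "western"));

    /**
     * Returns 0 when the genre argument is well formed, 4 when the combination
     * is malformed (an empty part, e.g. a leading or trailing '|'), and 5 when
     * a part is not a known genre.
     */
    static int checkGenreArgument(String arg) {
        // A limit of -1 keeps trailing empty strings, so "Adventure|Action|"
        // yields an empty last part instead of silently dropping it.
        String[] parts = arg.split("\\|", -1);
        for (String part : parts) {
            if (part.trim().isEmpty()) {
                return 4;
            }
        }
        for (String part : parts) {
            if (!GENRES.contains(part.trim().toLowerCase(Locale.ROOT))) {
                return 5;
            }
        }
        return 0;
    }
}
```

The structural check runs over all parts before any word lookup, so a malformed combination is reported as code 4 even if it also contains an unknown word.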

About Junit Test

The JUnit tests (and the related CSV test resources) for Milestone 1 have been moved to scripts/. For more information, please refer to issue #20.

Contribution by Area

| Area | Contribution |
|---|---|
| Setting up a Git Repository | Yeongjun Kwak |
| Environment Setup | dockerfile, pom.xml: Yujin Lee |
| Java Implementation | Models and Data Structures: 👑 Yeongjun Kwak |
| | Exception Handling: 👑 Sanghun Lee, Yeongjun Kwak |
| | Unit Test Building: Yujin Lee |
| | Final Reviewer: Sanghun Lee, Yeongjun Kwak |
| Documentation | README.md: 👑 Yujin Lee, Sanghun Lee |



Milestone 2

Goal : Implement Top 10 Movies Recommendation Features

Explanation of the Algorithm

In Milestone 2, the code recommends the Top 10 movies for a specified gender, age, occupation and, optionally, genre(s). To recommend 'relevant' movies, the code ranks movies by 1) a Bayesian Estimate, the same kind of weighted rating that IMDb uses for its Top 250 list. To cover the case where there aren't enough ratings matching the given gender, age and occupation, it also applies 2) a Priority rule for including 'similar' users.

1) Bayesian Estimate

A Bayesian Estimate (Bayes estimator) is an estimator that minimizes the posterior expected value of a loss function. Using it, the algorithm calculates a Weighted Rating (W) for each movie and ranks the movies by W. In this way, movies with very few ratings or below-average ratings carry comparatively little weight in the ranking.

In detail, the Weighted Rating (W) is calculated as

W = (vR + mC) / (v + m)

The original reference for the Bayesian Estimate can be found here. However, in this project the estimation method and variables have been set differently to fit the project's requirements.

| Variable | Explanation |
|---|---|
| W | Weighted rating |
| v | Total number of ratings (votes) for the movie |
| m | Minimum number of ratings required to be listed in the Top 10 |
| R | Average (mean) rating for the movie, as a number from 0 to 5 |
| C | Average rating across all the movies |
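
The formula above translates almost directly into code. The following is a minimal sketch with a hypothetical class and method name. The values of m and C used in the note after the code are made up for illustration and are not the project's settings.

```java
final class WeightedRatingSketch {

    /**
     * Computes W = (v*R + m*C) / (v + m).
     *
     * @param v number of ratings (votes) for the movie
     * @param r mean rating of the movie (R)
     * @param m minimum number of ratings required to be listed
     * @param c mean rating across all movies (C)
     */
    static double weightedRating(long v, double r, long m, double c) {
        if (v < 0 || m < 0 || v + m == 0) {
            throw new IllegalArgumentException("v and m must be non-negative and not both zero");
        }
        return (v * r + m * c) / (double) (v + m);
    }
}
```

For example, with m = 10 and C = 3.5, a movie with only 2 ratings averaging 5.0 gets W = (2·5.0 + 10·3.5) / 12 = 3.75, while a movie with 200 ratings averaging 4.5 gets W = 935 / 210 ≈ 4.45. The sparsely rated movie is pulled toward the overall mean and ranks below the well-rated one, even though its raw average is higher.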

2) Priority rule for including similar users

The algorithm first builds an ArrayList (valid_user_list) of the users in users.dat that match the inputs. This list is then used to extract the rating information and movie data from ratings.dat and movies.dat.

However, when there are not enough movie candidates to rank for the specified user data (the threshold is set to 100 movies here), 'similar' users are also added to valid_user_list in order of precedence (priority) by the function make_intersection_list_macro, until the number of movie candidates exceeds 100.

The similar users with priority are the users with :

  1. Same Gender, Age range and Occupation
  2. Same Gender and Occupation
  3. Same Age range and Occupation
  4. Same Occupation
  5. Same Gender and Age range
  6. Same Gender
  7. Same Age range
  8. All the users

The priority is set as above to give more weight to Occupation and less to Gender and Age range.
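
The widening procedure described above can be sketched as a loop over matching rules, from most to least specific. This is an illustrative outline only and not the project's make_intersection_list_macro. The `User` class, the `countCandidateMovies` callback and all method names are hypothetical. Rule 1 is read here as "same gender, age range and occupation", and the target profile is assumed to have every field filled in.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;
import java.util.function.ToIntFunction;

final class SimilarUsersSketch {

    /** Minimal user profile; the field names are hypothetical. */
    static final class User {
        final int id;
        final String gender;
        final int ageRange;
        final String occupation;

        User(int id, String gender, int ageRange, String occupation) {
            this.id = id;
            this.gender = gender;
            this.ageRange = ageRange;
            this.occupation = occupation;
        }
    }

    /** Matching rules (target, candidate), most specific first, in the listed order. */
    static List<BiPredicate<User, User>> priorityRules() {
        BiPredicate<User, User> gender = (t, u) -> t.gender.equals(u.gender);
        BiPredicate<User, User> age = (t, u) -> t.ageRange == u.ageRange;
        BiPredicate<User, User> occupation = (t, u) -> t.occupation.equals(u.occupation);

        List<BiPredicate<User, User>> rules = new ArrayList<>();
        rules.add(gender.and(age).and(occupation)); // 1
        rules.add(gender.and(occupation));          // 2
        rules.add(age.and(occupation));             // 3
        rules.add(occupation);                      // 4
        rules.add(gender.and(age));                 // 5
        rules.add(gender);                          // 6
        rules.add(age);                             // 7
        rules.add((t, u) -> true);                  // 8: all users
        return rules;
    }

    /**
     * Adds users rule by rule until the selected users yield more than
     * minCandidates movie candidates, or until every rule has been applied.
     */
    static Set<Integer> selectUsers(User target, List<User> allUsers,
                                    ToIntFunction<Set<Integer>> countCandidateMovies,
                                    int minCandidates) {
        Set<Integer> selected = new LinkedHashSet<>();
        for (BiPredicate<User, User> rule : priorityRules()) {
            for (User u : allUsers) {
                if (rule.test(target, u)) {
                    selected.add(u.id); // a set, so earlier matches are not duplicated
                }
            }
            if (countCandidateMovies.applyAsInt(selected) > minCandidates) {
                break;
            }
        }
        return selected;
    }
}
```

Because each later rule is looser than the exact match, users picked up by an earlier rule stay in the set and the selection only grows as the loop moves down the list.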


Running the Test

Continued from Installation steps.

  1. (In the Docker Container) Run the Java command with 3 or 4 arguments. The arguments (InputStr1 InputStr2 InputStr3 InputStr4) are Gender, Age, Occupation and (optionally) Genre, respectively.
    root@containerID:~/project# java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.milestone2 InputStr1 InputStr2 InputStr3 InputStr4

Examples

When valid inputs are passed, the output message will look like this :

Testing with 3 inputs
// Input
java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.milestone2 "F" "25" "Gradstudent"

// Output
Sixth Sense, The (1999) (http://www.imdb.com/title/tt0167404)
Matrix, The (1999) (http://www.imdb.com/title/tt0133093)
Shawshank Redemption, The (1994) (http://www.imdb.com/title/tt0111161)
Usual Suspects, The (1995) (http://www.imdb.com/title/tt0114814)
Silence of the Lambs, The (1991) (http://www.imdb.com/title/tt0102926)
Close Shave, A (1995) (http://www.imdb.com/title/tt0112691)
Wrong Trousers, The (1993) (http://www.imdb.com/title/tt0108598)
Cinema Paradiso (1988) (http://www.imdb.com/title/tt0095765)
American Beauty (1999) (http://www.imdb.com/title/tt0169547)
Raiders of the Lost Ark (1981) (http://www.imdb.com/title/tt0082971)

Testing with 4 inputs
// Input
java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.milestone2 "F" "25" "Gradstudent" "Action|Comedy"

// Output
Matrix, The (1999) (http://www.imdb.com/title/tt0133093)
Close Shave, A (1995) (http://www.imdb.com/title/tt0112691)
Wrong Trousers, The (1993) (http://www.imdb.com/title/tt0108598)
American Beauty (1999) (http://www.imdb.com/title/tt0169547)
Shakespeare in Love (1998) (http://www.imdb.com/title/tt0138097)
Raiders of the Lost Ark (1981) (http://www.imdb.com/title/tt0082971)
Cinema Paradiso (1988) (http://www.imdb.com/title/tt0095765)
Eat Drink Man Woman (1994) (http://www.imdb.com/title/tt0111797)
Raising Arizona (1987) (http://www.imdb.com/title/tt0093822)
Breakfast Club, The (1985) (http://www.imdb.com/title/tt0088847)

Supported Inputs

| InputStr1 | InputStr2 | InputStr3 | (InputStr4) |
|---|---|---|---|
| Gender | Age | Occupation | (Genre) |

Testing with 3 inputs
// Examples of Supported Inputs
"F" "25" "Gradstudent"
"M" "30" "Athletes"
"F" "" "retired"
"" "15" ""
"" "" ""

// Examples of Wrong Inputs
"F" "25"           // An omitted field must be passed explicitly as ""
"25" "Gradstudent" // An omitted field must be passed explicitly as ""

Testing with 4 inputs
// Examples of Supported Inputs
"F" "25" "Grad student" "Action|Comedy"
"M" "30" "Athletes" "Children's"
"F" "" "retired" "Animation|Drama|Fantasy"
"" "15" "" "Sci-Fi"
"" "" "" "Romance|Comedy"

// Examples of Wrong Inputs
"F" "25" "Grad student" ""  // The genre must not be left empty.
"F" "25" "Grad student" "Action|Comedy" "Drama" // Extra arguments are not allowed.


Error Codes

Possible errors thrown by invalid input.

Table 1 Invalid input errors

| Error | Message | Description |
|---|---|---|
| InputNumError | The input must be in this format : "gender" "age" "occupation" "genre" (genre is optional). | Thrown when the number of arguments is not 3 or 4. |
| InputInvalidError | Entered gender input is invalid. | Thrown when the entered gender is invalid. |
| InputInvalidError | Entered age input is invalid. Age must be a natural number. | Thrown when the entered age is invalid. |
| InputInvalidError | Entered occupation (inputString) doesn't exist. | Thrown when the entered occupation is invalid. |
| InputInvalidError | Entered genre input is invalid. | Thrown when the entered genre (combination) is invalid. |
| InputInvalidError | Entered genre (inputString) doesn't exist. | Thrown when a word in the entered genre (or genre combination) is invalid. |
| InputEmptyError | Genre input hasn't passed. Genre must not be empty | Thrown when "" is passed for the genre input. |

Table 2 No data exist error

| Error | Message | Description |
|---|---|---|
| NoDBError | There are no movie available for more than m votes | Thrown when there is no available movie list. This error is unlikely to occur but is handled as a safeguard. |
| NoDBError | There are no movie available for specified inputs. | Thrown when there is no available movie list, or when the only matching movies have no votes. This error is unlikely to occur but is handled as a safeguard. |
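
To make the argument rules in Table 1 concrete, here is a hedged sketch of how the first few checks could be ordered. It is not the project's code. The class and method names are hypothetical, returning the error name as a string is a simplification, accepting only F or M (case-insensitive) for gender is an assumption, and the occupation and genre lookups are omitted.

```java
final class Milestone2ArgsSketch {

    /**
     * Returns the name of the first error found, or "OK" when the arguments
     * pass these checks. A blank value ("" or " ") means "not specified".
     */
    static String validate(String[] args) {
        if (args.length != 3 && args.length != 4) {
            return "InputNumError";
        }
        String gender = args[0].trim();
        String age = args[1].trim();

        // Assumption: only F or M (any case) counts as a valid gender.
        if (!gender.isEmpty()
                && !gender.equalsIgnoreCase("F")
                && !gender.equalsIgnoreCase("M")) {
            return "InputInvalidError";
        }
        // "Age must be a natural number": digits only, not starting with 0.
        if (!age.isEmpty() && !age.matches("[1-9][0-9]*")) {
            return "InputInvalidError";
        }
        // Unlike the first three fields, a genre that is passed must not be blank.
        if (args.length == 4 && args[3].trim().isEmpty()) {
            return "InputEmptyError";
        }
        return "OK";
    }
}
```

Checking the argument count first means the later checks can index into `args` safely.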

Examples for the Error Codes

TODO: this part needs to be revised!

Error code : 1~3
% java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.project 

InputEmptyError : No argument has passed. 2 arguments are required. (InputStr1 InputStr2)
Error code: 1

% java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.project Drama

InputNumError : Only 1 input has passed. 2 arguments are required.
Error code: 2

% java -cp target/cse364-project-1.0-SNAPSHOT-jar-with-dependencies.jar group11.project Drama Scientist Scientist

InputNumError : More than 2 arguments have passed. 2 arguments are required.
Error code: 3


TODO: add the JUnit section!

Contribution by Area

| Area | Contribution |
|---|---|
| Java Implementation | Models and Data Structures: 👑 Yeongjun Kwak |
| | Exception Handling: 👑 Sanghun Lee, Yeongjun Kwak |
| | Unit Test Building: Yujin Lee |
| Environment Setup | Maven Dependency: Yujin Lee |
| Documentation | README.md: 👑 Yujin Lee |




Milestone 3 (Upcoming)

Milestone 4 (Upcoming)

Team Members

License & Acknowledgements

yuujinleee commented 3 years ago

"F" must be enclosed in double quotation marks. For a blank value, both " " and "" are allowed.

yuujinleee commented 3 years ago

For README, please include the following in this milestone:

  1. A short description of how your recommendation algorithm works. In case you are using an existing algorithm, describe what algorithm you used and how it works.

  2. A description of how to run your program. An example of java command line will be good.

  3. A description of supported inputs and expected outputs when inputs are not supported.

  4. Roles of each member (i.e. who did what?)

yuujinleee commented 3 years ago

Cite the external source used for the JUnit System.exit handling:

https://github.com/tginsberg/junit5-system-exit

yuujinleee commented 3 years ago

Error code numbering is dropped from now on.

yuujinleee commented 3 years ago

Final Review on yujin branch by @sirin05137 and @sanghun17

Final result

image

image

image