swsnu / bdcsfall2014


Programming assignment #2

Open taegeonum opened 10 years ago

taegeonum commented 10 years ago

Hi,

I have some questions about the programming assignment.

Q1) What kind of dataset do we have to use?

I may have missed it, but the programming assignment specification does not seem to say anything about the data.

What kind of dataset do we have to use? Can we select a random dataset?
How much data do we need?

Q2) Grading

Thanks.

bgchun commented 10 years ago

@taegeonum I'm copying the email I sent out for the programming assignment announcement below. We assigned each student to a particular learning method and a learning goal. As shown in the table, you have to work on SGD and linear SVM.

I briefly mentioned the dataset in my previous email. If you are working on SGD or LBFGS, please take a look at the Langford paper (A Reliable Effective Terascale Linear Learning System). For ALS, please take a look at the Yahoo! music dataset. These are suggestions. If you have other datasets you want to play with, you're welcome to use them.

Grading will take multiple aspects into account.

I hope this answers your question.

==== Email I sent for the programming assignment announcement

I have posted it on the following site.

https://sites.google.com/site/snubdcsfall2014/programming-assignment

The table assigning an algorithm and a goal to each person is below. Refer to it to find what you need to do!

The programming assignment is due on October 18 at 6 o'clock.

Discussion will take place through GitHub issue tracking, so each project team should quickly send me its repo name (also used as the team name) and the GitHub IDs of the students on the team.

The project proposal is due on Tuesday at 6 o'clock.

Good luck!

Programming assignment table

| Algorithm | Goal | Student |
| --- | --- | --- |
| SGD | LinearRegression | 김현준 |
| LBFGS | LinearLogisticRegression | 유강민 |
| ALS | LinearSVM | 한만휘 |
| SGD | LinearLogisticRegression | 나안수 |
| LBFGS | LinearSVM | 박진우 |
| ALS | LinearRegression | 안현기 |
| ALS | LinearLogisticRegression | 이기석 |
| SGD | LinearSVM | 조창연 |
| LBFGS | LinearRegression | 한겨레 |
| SGD | LinearRegression | 박지웅 |
| ALS | LinearSVM | 오명원 |
| LBFGS | LinearLogisticRegression | 한재현 |
| LBFGS | LinearSVM | 김덕주 |
| SGD | LinearLogisticRegression | 양영석 |
| ALS | LinearRegression | 이윤성 |
| SGD | LinearSVM | 엄태건 |
| LBFGS | LinearRegression | 이계원 |
| ALS | LinearLogisticRegression | 이우연 |

bgchun commented 10 years ago

I would also like to encourage students to share information about the datasets they use and the experiments they run on them.

taegeonum commented 10 years ago

@bgchun Thank you. I have one more question.

As far as I know, we need to choose parameter values for a machine learning algorithm. For example, we need to choose a learning rate for stochastic gradient descent. The accuracy and performance of the algorithm depend on these values. Do we need to justify how we choose them, or can we just pick some values arbitrarily?

I also found a site with machine learning datasets: https://archive.ics.uci.edu/ml/datasets.html?format=&task=cla&att=&area=&numAtt=&numIns=&type=&sort=nameUp&view=table Can I use a dataset from this site?

bgchun commented 10 years ago

@taegeonum You can refer to the reference papers to see how the parameter values are chosen. Alternatively, you can try out a few different values, see what happens, and report the results for each one.

Yes. You can use the datasets from the site. In fact, Jason has used them.
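
For anyone who wants a starting point for the "try out a few different values" approach, here is a minimal sketch of a learning-rate sweep for SGD on a linear SVM. It is only an illustration, not the required method for the assignment. The synthetic 2-D data, the regularization constant, the candidate learning rates, and all function names (`make_data`, `sgd_linear_svm`, `accuracy`, `sweep`) are assumptions made up for this example.

```python
import random


def make_data(n, seed):
    """Hypothetical synthetic data: 2-D points labelled -1/+1 by a fixed line."""
    rng = random.Random(seed)
    data = []
    while len(data) < n:
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        side = x[0] + 2 * x[1]
        if abs(side) > 0.1:  # leave a small gap around the separating line
            data.append((x, 1 if side > 0 else -1))
    return data


def sgd_linear_svm(data, learning_rate, epochs=20, reg=0.01, seed=0):
    """Plain SGD on the L2-regularized hinge loss; returns (weights, bias)."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    b = 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            score = w[0] * x[0] + w[1] * x[1] + b
            violated = y * score < 1  # hinge loss is active for this example
            for j in range(2):
                # Subgradient of reg/2 * ||w||^2 + max(0, 1 - y * score)
                grad = reg * w[j] - (y * x[j] if violated else 0.0)
                w[j] -= learning_rate * grad
            if violated:
                b += learning_rate * y
    return w, b


def accuracy(data, w, b):
    """Fraction of examples whose predicted sign matches the label."""
    correct = sum(1 for x, y in data if y * (w[0] * x[0] + w[1] * x[1] + b) > 0)
    return correct / len(data)


def sweep(learning_rates, train, held_out):
    """Train once per learning rate and record held-out accuracy."""
    results = {}
    for lr in learning_rates:
        w, b = sgd_linear_svm(train, lr)
        results[lr] = accuracy(held_out, w, b)
    return results


if __name__ == "__main__":
    train = make_data(200, seed=0)
    held_out = make_data(200, seed=1)
    for lr, acc in sweep([0.001, 0.01, 0.1, 1.0], train, held_out).items():
        print(f"learning_rate={lr}: held-out accuracy={acc:.3f}")
```

Reporting one row per learning rate, measured on data that was not used for training, makes it easy to show in the write-up how sensitive the method is to this parameter. The same loop would work for other parameters, such as the regularization constant.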