recsyschallenge / 2017

Baseline has some problems #6

Closed abdollahpouri closed 7 years ago

abdollahpouri commented 7 years ago

Hello,

The baseline code has some problems:

1) The main one: in recommendation_worker.py, line 36, the variable num_evaluated is not defined. I assume you meant num_evaluation.
2) targetUsers.csv has a header, "user_id", so an error occurs in xgb.py at line 69, since int("user_id") fails ("user_id" is not a decimal number).
3) Also in recommendation_worker.py, line 36, len(average_score) fails because average_score is a float and has no length.

Could you please check the code and run it to see if it generates results correctly?

Thank you very much, Himan

wdroz commented 7 years ago

Sorry, my bad for 1 and 3. I am sending a pull request to fix these two points.

dkohlsdorf commented 7 years ago

Part 2) is my bad... I just deleted the header.

abdollahpouri commented 7 years ago

Thanks, folks. I actually fixed 1 and 2 myself last night, as I wanted to run the code to see how it works, but I wasn't sure about 3.

A couple more things:

4) In model.py, lines 44 and 50, why did you check for self.item.indus != 23 and self.item.disc != 23? What exactly is 23?

5) In model.py line 51, shouldn't it be "return 1" instead of "return 2"? Maybe I am wrong, though.

Thank you very much

dkohlsdorf commented 7 years ago

The check for 23 is deleted now (see the other issue). And no, since I regard the discipline match as more important. If you use XGBoost it won't matter, but for other algorithms it will.
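A rough illustration of that point, not the code from model.py: the sketch below assumes an industry match is encoded as 1 and compares encoding a discipline match as 1 versus 2. The helpers `threshold_partitions` and `additive_score` are hypothetical names made up for this example. A tree learner such as XGBoost splits on thresholds, so only the ordering of feature values matters, while a model that adds up raw feature values is sensitive to their magnitude.

```python
def threshold_partitions(values):
    """Collect every non-trivial left/right partition that a 'value < t' split can produce."""
    partitions = set()
    for threshold in sorted(set(values)):
        left = frozenset(i for i, v in enumerate(values) if v < threshold)
        if left:  # skip the trivial split that sends every row to the right
            partitions.add(left)
    return partitions


def additive_score(industry_feature, discipline_feature):
    """Hypothetical hand-weighted model: just sums the raw feature values."""
    return industry_feature + discipline_feature


# The same five items, with a discipline match encoded as 1 or as 2 (0 = no match).
discipline_as_1 = [0, 1, 0, 1, 1]
discipline_as_2 = [0, 2, 0, 2, 2]

# A threshold split only sees the ordering, so both encodings give the same partitions.
same_splits = threshold_partitions(discipline_as_1) == threshold_partitions(discipline_as_2)

# An additive score sees the magnitude: with an industry match assumed to be 1,
# returning 2 ranks a discipline-only match above an industry-only match.
industry_only = additive_score(industry_feature=1, discipline_feature=0)
discipline_only_as_1 = additive_score(industry_feature=0, discipline_feature=1)
discipline_only_as_2 = additive_score(industry_feature=0, discipline_feature=2)

print(same_splits)
print(industry_only, discipline_only_as_1, discipline_only_as_2)
```

With the value 1 the two kinds of match tie in the additive score; with 2 the discipline match wins, which is the preference described above.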

Daniel

abdollahpouri commented 7 years ago

Thanks Daniel. I have another question which might seem a little silly :)

The code is still running and might take a while, so in the meantime, could you please tell me why you create 5 different solution CSV files? Which one is going to be the final solution?

Thanks

dkohlsdorf commented 7 years ago

The work is split across 5 workers, and each worker produces its own result file. If you change it to 10 workers, the tool will run even faster and you will have 10 files. At the end you have to concatenate all the results. On Unix you can do: cat solution_*.csv > solution.csv
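For anyone without a Unix shell, here is a minimal Python sketch of the same concatenation step. It assumes the worker files sit in the current directory and match solution_*.csv, as in the command above; the function name `concatenate_solutions` is made up for this example and is not part of the baseline.

```python
import glob
import os
import shutil


def concatenate_solutions(pattern="solution_*.csv", output_path="solution.csv"):
    """Append every file matching `pattern` to `output_path`, byte for byte."""
    output_abs = os.path.abspath(output_path)
    # Sort for a reproducible order, and never read the output file as an input.
    parts = [p for p in sorted(glob.glob(pattern))
             if os.path.abspath(p) != output_abs]
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts


if __name__ == "__main__":
    merged = concatenate_solutions()
    print("merged %d files into solution.csv" % len(merged))
```

Like cat, this copies the files as they are, so it relies on each worker file ending with a newline.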

abdollahpouri commented 7 years ago

Thanks again.

ayoubmtd commented 7 years ago

Hello Everyone,

I am still getting the invalid literal error for user_id:

Traceback (most recent call last):
  File "xgb.py", line 71, in <module>
    target_users += [int(line.strip())]
ValueError: invalid literal for int() with base 10: 'user_id'

abdollahpouri commented 7 years ago

@ayoubmtd Check your targetUsers.csv file to see if it still has the header. If it does, you should delete the header.
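If you would rather not edit the file by hand, a loader along these lines tolerates the header. This is only a sketch and not the code in xgb.py: it assumes one user id per line, and `read_target_users` is a made-up helper name.

```python
def read_target_users(path):
    """Read one integer user id per line, skipping blank lines and a text header."""
    target_users = []
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            value = line.strip()
            if not value:
                continue  # ignore empty lines
            if line_number == 0 and not value.isdigit():
                continue  # first line is a header such as "user_id"
            target_users.append(int(value))
    return target_users
```

Only the first line is allowed to be non-numeric; a bad value further down still raises ValueError, so real data problems are not silently hidden.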

kimquy06 commented 7 years ago

@dkohlsdorf How many points does the baseline solution achieve on the leaderboard? P.S. I want to know if anything is wrong with my submission.

afcarvalho1991 commented 7 years ago

@kimquy06 It achieves 10004 points, I think...?

André.