Closed Heermosi closed 3 years ago
@Heermosi have you found a specific bug to report?
Oh, yeah, our team worked on this branch for weeks to make it a working demo. And we discovered a lot of bugs, especially this one from [evaluation_runner_test.py]:
# figure out what to do with result ;)
self.assertLess(result['mae'], decimal.Decimal(1.7))
self.assertLess(result['pak'], decimal.Decimal(0.7))
self.assertLess(result['rak'], decimal.Decimal(0.7))
I'm almost certain you injected such "bugs" manually... That's alright, but do you have a count of how many such bugs exist?
As you requested, I'll list part of them below:
1. templates/moviegeek/index.html & templates/moviegeek/detail.html
a.setAttribute('onclick', "add_impression({{user_id}}, 'more_details', " +
movie_id + ", '{{ session_id }}','{{ csrf_token }}')")
This collects an incorrect movie id (the leading zeros are stripped). It would not be a bug if you turned the movie_id column into an int type, but then that would be yet another bug.
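The stripping happens because the id gets treated as a number somewhere along the way. A minimal sketch of one way to repair such ids on the server side, assuming they are fixed-width zero-padded strings (the width of 7 below matches IMDb-style ids but is an assumption; use whatever width the dataset's movie_id column actually requires):

```python
def normalize_movie_id(movie_id, width=7):
    """Re-pad a movie id that lost its leading zeros.

    The fixed width of 7 is an assumption (IMDb-style ids);
    adjust it to the dataset's actual id width.
    """
    return str(movie_id).zfill(width)

print(normalize_movie_id(114709))    # -> 0114709
print(normalize_movie_id("0114709"))  # already padded, unchanged
```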
2. moviegeeks/views.py
def genre(request, genre_id):
if genre_id:
selected = Genre.objects.filter(name=genre_id)[0]
movies = selected.movies.all().order_by('-year')
else:
movies = Movie.objects.all().order_by('-year')
This causes duplicated items when browsing a specific genre, because the data set has some nulls in the year column. Yes, the populated dataset really contains nulls (and some 0-star explicit ratings......????)
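If I read the duplication right, it comes from ordering by a column that contains nulls: the database may order the null-year rows arbitrarily on each query, so paginated browsing can repeat or skip items. In Django the fix would be a unique tiebreaker, e.g. order_by('-year', 'movie_id'). The effect can be sketched in plain Python (the sample rows are illustrative assumptions):

```python
movies = [
    {"movie_id": "003", "year": None},
    {"movie_id": "001", "year": 1999},
    {"movie_id": "002", "year": None},
]

# Sort by year descending with nulls last, then by the unique id as
# a tiebreaker, so the order is fully deterministic across queries --
# the same idea as order_by('-year', 'movie_id') in Django.
ordered = sorted(
    movies,
    key=lambda m: (m["year"] is None, -(m["year"] or 0), m["movie_id"]),
)
print([m["movie_id"] for m in ordered])  # -> ['001', '002', '003']
```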
3. The populated data set has a lot of incorrect movie ids.
A team member is working on a fix...
I'm sure these can only have been injected manually.
4. moviegeeks/views.py line 119
context_dict = {
'genres': genres,
'movies': mov.values(),
'api_key': api_key,
}
The search page cannot display the recommended items correctly because the context is missing some data (the user id).
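A hedged sketch of the fix: the template presumably needs the user id in the context for the recommendation calls, so the dict would gain one more key. The names below mirror the snippet; 'user_id' is an assumed key name:

```python
def build_search_context(genres, movies, api_key, user_id):
    # Same keys as the context_dict in the snippet above, plus the
    # user id the recommendation widget needs ('user_id' is an
    # assumed name, not confirmed against the template).
    return {
        'genres': genres,
        'movies': movies,
        'api_key': api_key,
        'user_id': user_id,
    }

ctx = build_search_context([], [], 'dummy-key', 42)
print(sorted(ctx))  # -> ['api_key', 'genres', 'movies', 'user_id']
```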
5. builder/matrix_factorization_calculator.py
def predict(self, user, item):
pq = np.dot(self.item_factors[item], self.user_factors[user].T)
b_ui = self.all_movies_mean + self.user_bias[user] + self.item_bias[item]
prediction = b_ui + pq
if prediction > 10:
prediction = 10
elif prediction < 1:
prediction = 10
return prediction
Hmmm, what kind of bug is this..... the lower clamp sets the prediction to 10 instead of 1???
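For reference, the lower branch should pin the prediction to the minimum rating, not the maximum. A minimal standalone sketch of the corrected clamp, using the 1-10 range from the snippet:

```python
def clamp_prediction(prediction, low=1.0, high=10.0):
    # The quoted code set low predictions to 10; a clamp should
    # return the lower bound on that branch instead.
    return max(low, min(high, prediction))

print(clamp_prediction(0.3))   # -> 1.0
print(clamp_prediction(11.2))  # -> 10.0
print(clamp_prediction(6.4))   # -> 6.4
```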
This is only part of the bugs detected so far. I don't want to make this post too long.
Thank you for looking at the code. I'm sure the code is not perfect, so it is great that you are using it and helping to find bugs. I would recommend making a branch of the repo and committing your fixes there, which can then be merged with a PR.
Having your fixes in a branch would also make it easier to understand where you believe the bugs are. I'm not sure I understand all of your message above.
Well, if you really want this, I'll make a full list of the known bugs we fixed... I originally thought this was a challenge for newcomers.
As agreed I'll put the rest of the list of bugs...
cur.execute('truncate table similarity')
This causes the sequence number to go out of range after several rounds of model building; use "restart identity" instead.
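In PostgreSQL (which the suggested "restart identity" syntax implies) the fix would be TRUNCATE TABLE similarity RESTART IDENTITY, so the identity sequence resets along with the rows. The same effect can be demonstrated with the stdlib sqlite3, where clearing rows alone also leaves the id counter running (table and values below are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE similarity (id INTEGER PRIMARY KEY AUTOINCREMENT, sim REAL)")
cur.executemany("INSERT INTO similarity (sim) VALUES (?)", [(0.1,), (0.2,)])

cur.execute("DELETE FROM similarity")  # rows gone, but the sequence keeps counting
cur.execute("INSERT INTO similarity (sim) VALUES (0.3)")
id_without_reset = cur.execute("SELECT id FROM similarity").fetchone()[0]

cur.execute("DELETE FROM similarity")
cur.execute("DELETE FROM sqlite_sequence WHERE name = 'similarity'")  # reset the counter
cur.execute("INSERT INTO similarity (sim) VALUES (0.4)")
id_with_reset = cur.execute("SELECT id FROM similarity").fetchone()[0]

print(id_without_reset, id_with_reset)  # -> 3 1
```

Without the reset, every rebuild pushes the counter higher until it overflows the column's range, which is exactly the failure described above.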
import tqdm
which is then used in line 94:
for d in tqdm(data):
In fact the code needs tqdm.tqdm to do the work; after `import tqdm` alone, `tqdm(data)` fails because the module itself is not callable.
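The fix is to import the callable rather than (or in addition to) the module:

```python
from tqdm import tqdm  # the callable, not just the module

# tqdm wraps any iterable and draws a progress bar while iterating;
# here it wraps a trivial loop just to show the corrected import works.
total = 0
for d in tqdm(range(100)):
    total += d
print(total)  # -> 4950
```

Alternatively, keep `import tqdm` and call `tqdm.tqdm(data)` at line 94; either way resolves the "'module' object is not callable" error.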
index = similarities.MatrixSimilarity(corpus)
self.save_lda_model(lda_model, corpus, dictionary, index)
self.save_similarities(index, docs)
This means the LDA model is not really in use.... I don't know whether that is a bug or not... is calculating with the word vectors intended? At least the coo_matrix conversion is not recommended, because it can use up to 64 GB of RAM; you could always fetch one row at a time. As for the result generation: the result table is much too large. It can reach up to 130 GB when the LDA calculation is used... cut it down, or wait 6-12 hours.
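On the memory point: the blow-up comes from materializing the full document-by-document similarity matrix at once and storing every entry. A sketch of the row-at-a-time alternative with a top-k cutoff, using plain numpy to stand in for the gensim index (the factor values and the k/min_sim thresholds are illustrative assumptions):

```python
import numpy as np

def similarity_rows(factors, k=2, min_sim=0.1):
    """Yield (doc_id, [(other_id, sim), ...]) one row at a time.

    Streams the N x N cosine-similarity matrix row by row and keeps
    only the top-k neighbours above min_sim, so neither RAM nor the
    result table ever has to hold all N**2 entries.
    """
    norms = np.linalg.norm(factors, axis=1, keepdims=True)
    unit = factors / np.where(norms == 0, 1, norms)
    for i, row_vec in enumerate(unit):
        sims = unit @ row_vec          # one row of the full matrix
        sims[i] = -1.0                 # exclude self-similarity
        top = np.argsort(sims)[::-1][:k]
        yield i, [(int(j), float(sims[j])) for j in top if sims[j] >= min_sim]

# Toy 4-document factor matrix (illustrative numbers only):
docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
for doc_id, neighbours in similarity_rows(docs):
    print(doc_id, neighbours)
```

Pruning to the top-k neighbours per document is also what keeps the similarity table from ballooning to the 130 GB mentioned above.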
You do realize its code used for as demo code for a book right?
Sorry sir, but I'm very worried: is it true that I need 130 GB to save the LDA results?
No, you should not.
Oh, yeah, our team worked on this branch for weeks to make it a working demo. And we discovered a lot of bugs, especially this one from [evaluation_runner_test.py]:
# figure out what to do with result ;)
self.assertLess(result['mae'], decimal.Decimal(1.7))
self.assertLess(result['pak'], decimal.Decimal(0.7))
self.assertLess(result['rak'], decimal.Decimal(0.7))
I'm almost certain you manually injected such "bugs",....
It's a test case to check that everything is working. I create a small dataset, run it through the system, and verify that the results look good.
It is not unusual to have bugs, but of these types... ?? BTW, thanks for the code.
I'm not sure this issue is productive. Should it be closed?
No need for the exact locations; I want counts for each directory.