Open silviasanasi opened 5 years ago
I think I located the problem. The script keeps appending -or{value} to the URL. This may have worked before, but now, if the link is invalid, TripAdvisor redirects to the main page again.
The solution is to break out of the while loop when the offset value in the URL exceeds the number of reviews. The offset increment should also be changed to 10, as that seems to be the new value.
offset += 10
if offset > num_reviews:
    break
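For anyone who wants to see the stopping rule in isolation, here is a minimal sketch. The helper name page_offsets is made up for illustration and is not part of the original script. It also uses >= rather than >, on the assumption that offsets start at 0, so that an offset equal to the review count (an empty page) is never requested.

```python
def page_offsets(num_reviews, step=10):
    """Yield the offsets 0, step, 2*step, ... that still point at real reviews.

    Hypothetical helper: num_reviews is assumed to be the total review count
    read from the page, and step the number of reviews shown per page.
    """
    offset = 0
    while True:
        yield offset
        offset += step
        # Stop before requesting an offset past the last review, since an
        # invalid offset reportedly redirects to the main page.
        if offset >= num_reviews:
            break


if __name__ == "__main__":
    for offset in page_offsets(25):
        print("would request the page with -or{}".format(offset))
```

The scraping loop can then iterate over these offsets instead of managing the counter and the break by hand.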
I had the issue with the program looping endlessly because of this line:
lda_model = gensim.models.LdaMulticore(bow_corpus, num_topics=10, id2word=dictionary, passes=2, workers=3)
I found that the non-multicore version worked fine:
lda_model = gensim.models.ldamodel.LdaModel(bow_corpus, num_topics=10, id2word=dictionary, passes=2)
Hello everyone, after solving some issues with the code I managed to extract a few CSV files of reviews. Sometimes, however, the extraction starts looping on the 5 latest reviews and never stops, so the CSV file is never written. Do you have any idea how to make it stop as soon as the review ids are duplicated? Or even just how to produce a .csv file without having to wait for it to finish? I'm attaching my version of Susanli's code.
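One possible approach, sketched under assumptions: keep a set of the review ids already seen, stop as soon as a page contributes no new id, and write each row to the CSV as it arrives so that a partial file exists even if the run is interrupted. The names scrape_until_repeat and fetch_page are hypothetical. fetch_page(offset) stands in for whatever code downloads and parses one page, and it is assumed to return a list of dicts with at least an "id" key.

```python
import csv


def scrape_until_repeat(fetch_page, csv_path, step=10, fieldnames=("id", "text")):
    """Write reviews to csv_path until a page yields only already-seen ids.

    fetch_page is a hypothetical callable: fetch_page(offset) -> list of dicts.
    Returns the number of distinct reviews written.
    """
    seen_ids = set()
    offset = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" drops any keys not listed in fieldnames.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        while True:
            wrote_any = False
            for row in fetch_page(offset):
                if row["id"] in seen_ids:
                    continue  # duplicate review, skip it
                seen_ids.add(row["id"])
                writer.writerow(row)
                wrote_any = True
            if not wrote_any:
                break  # the page held nothing new, so we are looping
            handle.flush()  # keep the partial file usable on disk
            offset += step
    return len(seen_ids)
```

Because rows are flushed after every page, stopping the script with Ctrl+C should still leave a CSV containing everything collected up to that point.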