Open ngthanhtrung23 opened 3 years ago
Hey @ngthanhtrung23! Thanks for bringing this up. I was partly aware of the possibility of this situation arising though thought that cases like these would be fairly uncommon. Well, turns out I was wrong.
I would agree that this is an inefficiency in Harwest, though it is something that can be addressed manually by starting Harwest from the next page using the --start-page configuration. This approach sure won't scale well if it happens rather often over a submission space of 150+ pages.
Fixing it would require a bit of effort since the entire flow of the tool would have to be modified. For the moment, maybe we can take up the approach recommended by @Mohammad-Yasser on https://codeforces.com/blog/entry/85788?#comment-735930 as a temporary solution?
Yeah, I was able to make it work for me by commenting out some code in workflow.py :)
    if not len(response) or not any(response):
        break
I created this issue just to bring it to your attention as some other users may face this.
Way to go @ngthanhtrung23! You sure amaze me with how quick and easy it is for you to hack on any code. I'll indeed keep this issue open and keep an eye on it. If a lot of people complain about this, then I will for sure fix it at once. I have to admit I'm a bit lazy :D
@nileshsah I would suggest increasing the page size to 1000 or some other huge number.
> I was partly aware of the possibility of this situation arising though thought that cases like these would be fairly uncommon.
I think with such a huge page size it would be very unlikely to occur, unless someone made 1000+ gym submissions. It would also reduce the number of API calls.
Great thinking there @s-i-d-d-i-s! It does seem like a possible idea that we can use. I remember the reason I first went with a page size of 50 was to keep it in parity with the submissions page on Codeforces for easy tracking, though it might not be completely necessary. Let's take up your approach as a first iteration for dealing with this problem if more people request this feature. Hopefully it should not hurt the user experience much.
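To make the page-size idea concrete, here is a rough sketch (not Harwest code) that chunks a made-up submission history into pages and reports the first page with no accepted submission, plus how many page requests each page size needs. The history, the verdict strings, and the helper names `first_empty_page` and `api_calls` are all assumptions for illustration.

```python
import math
from typing import List, Optional


def first_empty_page(verdicts: List[str], page_size: int) -> Optional[int]:
    """Return the 1-based number of the first page with no accepted
    submission, or None if every page has at least one."""
    for start in range(0, len(verdicts), page_size):
        page = verdicts[start:start + page_size]
        if not any(v == "OK" for v in page):
            return start // page_size + 1
    return None


def api_calls(total_submissions: int, page_size: int) -> int:
    """Number of page requests needed to walk the whole history."""
    return math.ceil(total_submissions / page_size)


# Hypothetical history of 7,500 submissions (150 pages of 50) that
# contains a run of 120 consecutive non-accepted attempts.
history = ["OK"] * 200 + ["WRONG_ANSWER"] * 120 + ["OK"] * 7180

print(first_empty_page(history, 50))    # page 5 has no accepted submission
print(first_empty_page(history, 1000))  # no such page at this size
print(api_calls(len(history), 50), api_calls(len(history), 1000))
```

With this made-up data, a page size of 50 hits an all-rejected page at page 5, while a page size of 1000 never does and needs far fewer requests. A larger page only makes the problem less likely; a long enough run of non-AC or gym submissions would still produce an empty page.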
How to reproduce:
harwest codeforces -p 5
What happens: the crawler stops without crawling anything, even though I have 150+ pages of submissions.
I think the reason is that page 5 contains only my non-AC or gym submissions, so self.client.get_user_submissions returns an empty array, which stops the crawler.
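The failure mode can be reproduced in a small simulation. This is only a sketch of the suspected logic, not the actual Harwest implementation: the `PAGES` data and the functions `fetch_raw_page`, `accepted_only`, `crawl_stop_on_empty_filtered` and `crawl_stop_on_empty_raw` are hypothetical names introduced here. It contrasts a loop that stops when the filtered page is empty with one that stops only when the raw page is empty.

```python
from typing import Dict, List

# Hypothetical submission history, two submissions per page.
# Page 2 contains no accepted submission at all.
PAGES: Dict[int, List[dict]] = {
    1: [{"id": 1, "verdict": "OK"}, {"id": 2, "verdict": "WRONG_ANSWER"}],
    2: [{"id": 3, "verdict": "WRONG_ANSWER"},
        {"id": 4, "verdict": "TIME_LIMIT_EXCEEDED"}],
    3: [{"id": 5, "verdict": "OK"}, {"id": 6, "verdict": "OK"}],
}


def fetch_raw_page(page: int) -> List[dict]:
    """Return every submission on the page, or [] past the last page."""
    return PAGES.get(page, [])


def accepted_only(submissions: List[dict]) -> List[dict]:
    """Keep only accepted submissions."""
    return [s for s in submissions if s["verdict"] == "OK"]


def crawl_stop_on_empty_filtered(start_page: int = 1) -> List[int]:
    """Stops as soon as a page has no accepted submissions (the bug)."""
    collected: List[int] = []
    page = start_page
    while True:
        accepted = accepted_only(fetch_raw_page(page))
        if not accepted:
            break
        collected.extend(s["id"] for s in accepted)
        page += 1
    return collected


def crawl_stop_on_empty_raw(start_page: int = 1) -> List[int]:
    """Stops only when the page itself is empty, i.e. past the last page."""
    collected: List[int] = []
    page = start_page
    while True:
        raw = fetch_raw_page(page)
        if not raw:
            break
        collected.extend(s["id"] for s in accepted_only(raw))
        page += 1
    return collected


print(crawl_stop_on_empty_filtered(start_page=2))  # nothing is crawled
print(crawl_stop_on_empty_raw(start_page=2))       # page 3 is still reached
```

Starting from page 2 in this toy data, the first loop returns nothing while the second still collects the accepted submissions from page 3. Whether the real crawler can tell an empty filtered page apart from the end of the history depends on what the client returns, which this sketch does not claim to know.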