anttiope closed this issue 1 month ago
The random sleep was probably to lessen the bot-likeness of the crawler, with the intent that over time, the mean sleep would end up being 1 second (or whatever is specified in the arguments).
Not sure the anti-bot-likeness is even needed, but anyway it is clear that at least some modification of the code is warranted. Pull requests welcome.
Thanks for the fix!
Title: Issue with `randrange` in delay configuration
Description: The README states the intention to be a good netizen by defaulting to a one-second delay between web requests to media websites, to avoid undue load on their servers. This delay is configurable using command line parameters.
By default, the delay is set to 1.0 seconds (a float). However, `randrange` from `random` takes integers as arguments, so passing the float raises an error. This occurs, for instance, in the file `query_yle.py` on line 59: `sleep(random.randrange(args.delay*2))`.
The code can be made to run by changing the call to `sleep(random.randrange(int(args.delay*2)))`, but with the default of 1.0 this generates a random integer that is either 0 or 1, so the delay is 1 second only 50% of the time and 0 seconds otherwise.
Steps to Reproduce:
1. Run `query_yle.py`.
2. Observe the error from `randrange`.
Expected Behavior: The script should introduce a delay without errors.
Actual Behavior: The script throws an error because `randrange` requires integer arguments.
Proposed Solution: If randomness is desired, perhaps something like `sleep(random.uniform(0, args.delay*2))` could be used, which would give a floating-point delay between 0.0 and 2.0 seconds (if the delay is set to 1.0, the default).
Affected Files:
fetch_hs.py
fetch_open.py
query_hs.py
query_il.py
query_is.py
query_yle.py
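To make the difference between the `int()` workaround and the proposed `random.uniform` call concrete, here is a minimal sketch that samples both. The helper names `truncated_delay` and `uniform_delay` are hypothetical and do not exist in the repository; the sketch only computes delay values and never calls `sleep`.

```python
import random
import statistics


def truncated_delay(delay: float) -> int:
    """Hypothetical helper: the int() workaround, which yields whole seconds only."""
    return random.randrange(int(delay * 2))


def uniform_delay(delay: float) -> float:
    """Hypothetical helper: the proposed fix, a float drawn from [0, 2 * delay]."""
    return random.uniform(0, delay * 2)


random.seed(0)  # fixed seed so the comparison is repeatable
delay = 1.0     # the default delay described in the issue

truncated = [truncated_delay(delay) for _ in range(10_000)]
uniform = [uniform_delay(delay) for _ in range(10_000)]

# Distinct values the workaround can produce for the default delay
print(sorted(set(truncated)))
# Sample means: the workaround averages about half the configured delay,
# while the uniform draw averages close to the configured delay itself
print(statistics.mean(truncated))
print(statistics.mean(uniform))
```

With the default of 1.0 the workaround can only return 0 or 1, whereas the uniform draw keeps the average close to the configured delay. Note also that for delays below 0.5 the workaround would call `random.randrange(0)`, which raises `ValueError` because the range is empty.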
Environment: