howie6879 / ruia

Async Python 3.6+ web scraping micro-framework based on asyncio
https://www.howie6879.com/ruia/
Apache License 2.0
1.75k stars, 181 forks

`DELAY` attribute specifically for retries #80

Closed. abmyii closed this issue 4 years ago.

abmyii commented 4 years ago

I assumed the `DELAY` attribute would set the delay for retries, but it actually applies to all requests. I would appreciate a separate attribute that applies only to retries (`RETRY_DELAY`). I'd be happy to implement it if given the go-ahead.

Thank you for this great library!

howie6879 commented 4 years ago

Hi @abmyii, do you mean a specific delay setting that applies during the retry process?

abmyii commented 4 years ago

Yes - but one that applies only for retries - not for all requests.

howie6879 commented 4 years ago

I don't quite understand. Isn't it possible to set the retry configuration every time you make a request?

(In reply to abmyii's comment of Wednesday, January 1, 2020, 5:14 PM.)

abmyii commented 4 years ago

How would I go about doing that?

howie6879 commented 4 years ago

https://github.com/howie6879/ruia/blob/master/examples/simple_spider/douban_spider.py

This spider demo defines a parameter named `request_config`.

This setting takes effect for every request. Is this what you need?

(In reply to abmyii's comment of Wednesday, January 1, 2020, 5:39 PM.)

abmyii commented 4 years ago

Here is a rough explanation:

request 1 (fails) -> wait for <x> seconds then retry -> request 2 (fails) -> repeat until request passes

But only for that request. The rest of the concurrent requests should continue as normal, with no delay.
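A minimal asyncio sketch of the flow described above, assuming a hypothetical `fetch_with_retry_delay` helper and a fake `attempt` coroutine in place of a real HTTP call (neither is part of ruia). Only the request that fails sleeps before retrying; the others finish without waiting. Unlike "repeat until request passes", the sketch caps the number of retries so it cannot loop forever.

```python
import asyncio

async def fetch_with_retry_delay(url, attempt, retry_delay, retries, sleeps):
    """Hypothetical model of the proposal: only a failed attempt waits before retrying."""
    for n in range(retries + 1):
        if await attempt(url, n):
            return url                        # success: no waiting at all
        if n < retries:
            sleeps.append(url)                # record that this request paused
            await asyncio.sleep(retry_delay)  # give the server time before the retry
    return None                               # gave up after `retries` retries

async def main():
    sleeps = []

    async def attempt(url, n):
        # Fake request: "/slow" fails on its first try, everything else succeeds at once.
        return not (url == "/slow" and n == 0)

    urls = ["/a", "/b", "/slow"]
    results = await asyncio.gather(
        *(fetch_with_retry_delay(u, attempt, 0.01, 3, sleeps) for u in urls)
    )
    return results, sleeps

results, sleeps = asyncio.run(main())
print(results)  # ['/a', '/b', '/slow']
print(sleeps)   # ['/slow']
```

Because each request runs as its own task, the `asyncio.sleep` inside one task does not block the others.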

howie6879 commented 4 years ago

Currently, every request waits for the delay first. If it fails, it is tried again, and concurrent requests are also delayed first.

(In reply to abmyii's comment of Wednesday, January 1, 2020, 5:51 PM.)
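For contrast, a minimal sketch of the current behavior described in the reply above, using a hypothetical `fetch_with_delay` helper and a fake `attempt` coroutine (not ruia code). It assumes the delay also runs before each retry; the key point is that requests which succeed on the first try still pay the delay.

```python
import asyncio

async def fetch_with_delay(url, attempt, delay, retries, sleeps):
    """Hypothetical model of the current behavior: sleep before every attempt."""
    for n in range(retries + 1):
        sleeps.append(url)          # every attempt pays the delay, including the first
        await asyncio.sleep(delay)
        if await attempt(url, n):
            return url
    return None

async def main():
    sleeps = []

    async def attempt(url, n):
        # Fake request: "/slow" fails on its first try, everything else succeeds.
        return not (url == "/slow" and n == 0)

    urls = ["/a", "/b", "/slow"]
    results = await asyncio.gather(
        *(fetch_with_delay(u, attempt, 0.01, 3, sleeps) for u in urls)
    )
    return results, sleeps

results, sleeps = asyncio.run(main())
print(results)         # ['/a', '/b', '/slow']
print(sorted(sleeps))  # ['/a', '/b', '/slow', '/slow']
```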

abmyii commented 4 years ago

Yes. Can this be changed so that there is a delay only for retries (to give the server a chance to serve the requested page, e.g. if it isn't cached already), so that requests that get an immediate response aren't slowed down?

I might make a quick implementation to show what I mean. Will that help?

howie6879 commented 4 years ago

Of course, it would be nice to have code to go along with the description. Thank you!

(In reply to abmyii's comment of Wednesday, January 1, 2020, 6:17 PM.)

abmyii commented 4 years ago

Done: https://github.com/howie6879/ruia/compare/master...abmyii:retry_delay

howie6879 commented 4 years ago

So you want a dedicated parameter that makes every retried request sleep before it is sent again.

If you think there is strong demand for this, I think it can be added. Could you submit a pull request?

(In reply to abmyii's comment of Wednesday, January 1, 2020, 6:23 PM.)

abmyii commented 4 years ago

Sure. I think the delay should go inside the `if self.retry_times > 0:` block, correct? Also, thank you very much for taking the time to discuss this!
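A sketch of where that sleep would sit, assuming a hypothetical `RetryingRequest` class shaped loosely like a ruia-style request (this is not ruia's actual code). The config keys `RETRIES`, `DELAY` and `RETRY_DELAY` follow the names used in this thread; `attempt` and the `slept` log are illustrative stand-ins. Placing the sleep inside `if self.retry_times > 0:` means it only runs when a retry actually happens.

```python
import asyncio

class RetryingRequest:
    """Hypothetical stand-in for a ruia-style Request (not ruia's actual code)."""

    def __init__(self, url, attempt, request_config):
        self.url = url
        self.attempt = attempt            # coroutine standing in for the HTTP call
        self.request_config = request_config
        self.retry_times = request_config.get("RETRIES", 3)
        self.slept = []                   # log of (setting, seconds) for illustration

    async def fetch(self, delay=True):
        # DELAY: applied before the first attempt; retries pass delay=False.
        if delay and self.request_config.get("DELAY", 0) > 0:
            self.slept.append(("DELAY", self.request_config["DELAY"]))
            await asyncio.sleep(self.request_config["DELAY"])
        if await self.attempt(self.url):
            return "ok"
        return await self._retry()

    async def _retry(self):
        if self.retry_times > 0:
            # RETRY_DELAY lives inside this branch, so it only runs when a retry happens.
            retry_delay = self.request_config.get("RETRY_DELAY", 0)
            if retry_delay > 0:
                self.slept.append(("RETRY_DELAY", retry_delay))
                await asyncio.sleep(retry_delay)
            self.retry_times -= 1
            return await self.fetch(delay=False)
        return "failed"                   # retries exhausted

async def main():
    calls = {"n": 0}

    async def flaky(url):
        calls["n"] += 1
        return calls["n"] >= 3            # fails twice, then succeeds

    async def healthy(url):
        return True

    config = {"RETRIES": 3, "DELAY": 0, "RETRY_DELAY": 0.01}
    slow = RetryingRequest("https://example.com/slow", flaky, config)
    fast = RetryingRequest("https://example.com/fast", healthy, config)
    results = await asyncio.gather(slow.fetch(), fast.fetch())
    return results, slow, fast

results, slow, fast = asyncio.run(main())
print(results)           # ['ok', 'ok']
print(slow.slept)        # [('RETRY_DELAY', 0.01), ('RETRY_DELAY', 0.01)]
print(fast.slept)        # []
print(slow.retry_times)  # 1
```

Defaulting `RETRY_DELAY` to 0 in this sketch means a config that never sets it behaves exactly as before.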

howie6879 commented 4 years ago

Of course, the retry delay only takes effect when a retry actually occurs. We both gave our time to this issue, and I want to thank you, too, for trying to make Ruia better.

(In reply to abmyii's comment of Wednesday, January 1, 2020, 6:32 PM.)