my8100 / scrapydweb

Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI. DEMO: https://github.com/my8100/files
GNU General Public License v3.0

Setup default settings when we run a spider #55

Closed botzill closed 5 years ago

botzill commented 5 years ago

I was wondering how easy it would be to configure the default settings shown when we run a spider?

Thx.

botzill commented 5 years ago

Yes, for sure not a bug :), just asking.

my8100 commented 5 years ago

Do you mean customizing the default value of the textarea below?

image

botzill commented 5 years ago

I mean

Screen Shot 2019-06-20 at 17 14 12

Here. I'd like to set custom defaults there that we can define in settings.py, i.e. make it configurable which values are shown here. It's harder to set up in that textarea, but here it would be nice. Plus, it would be good if some dropdowns were possible as well.

Thx.

my8100 commented 5 years ago

For now, you can manually modify the code below like this. I would consider making it configurable in the configuration file. Thanks for your suggestion.

 self.kwargs.setdefault('USER_AGENT', 'Chrome')  # Chrome|iPhone|iPad|Android 
 self.kwargs.setdefault('ROBOTSTXT_OBEY', 'False') 
 self.kwargs.setdefault('COOKIES_ENABLED', 'False') 
 self.kwargs.setdefault('CONCURRENT_REQUESTS', '16') 
 self.kwargs.setdefault('DOWNLOAD_DELAY', '0') 
 _additional = "-d setting=CLOSESPIDER_TIMEOUT=60\r\n-d setting=CLOSESPIDER_PAGECOUNT=10\r\n-d arg1=val1" 
 self.kwargs.setdefault('additional', _additional) 

https://github.com/my8100/scrapydweb/blob/01d0a4f0060a3c0e40ced15363897855127d4221/scrapydweb/operations/schedule.py#L175-L181
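As a rough illustration of why editing those lines changes what the form shows: `dict.setdefault` only fills in a key that is missing, so a value that is already present is left alone. The function name `apply_schedule_defaults` and the plain dict are stand-ins for this sketch, not scrapydweb's actual class or method.

```python
def apply_schedule_defaults(kwargs):
    """Fill in default form values without overwriting keys already present.

    Hypothetical helper for illustration; it mirrors the setdefault calls
    quoted above but is not part of scrapydweb.
    """
    kwargs.setdefault('USER_AGENT', 'Chrome')
    kwargs.setdefault('ROBOTSTXT_OBEY', 'False')
    kwargs.setdefault('CONCURRENT_REQUESTS', '16')
    kwargs.setdefault('DOWNLOAD_DELAY', '0')
    return kwargs


# A value that is already set is kept; missing keys get the defaults.
result = apply_schedule_defaults({'CONCURRENT_REQUESTS': '4'})
print(result['CONCURRENT_REQUESTS'])  # 4
print(result['USER_AGENT'])           # Chrome
```

So changing the second argument of a `setdefault` call changes the default, while anything supplied earlier still takes precedence.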

botzill commented 5 years ago

I see, thx @my8100! Would be really useful to have this configurable.

my8100 commented 5 years ago

@botzill

  1. pip install -U git+https://github.com/my8100/scrapydweb.git
  2. Update the existing config file with the options below: https://github.com/my8100/scrapydweb/blob/30b39b74dd70c9f7666a8ccfdee37b18f0e62465/scrapydweb/default_settings.py#L121-L151

botzill commented 5 years ago

Thx @my8100, very useful!

botzill commented 5 years ago

So, if I want to define a custom one, can I use SCHEDULE_MY_CUSTOM_ONE? Or should I add it to SCHEDULE_ADDITIONAL?

my8100 commented 5 years ago

Customize the SCHEDULE_ADDITIONAL option.
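A minimal sketch of what that could look like, assuming the option keeps the same "-d setting=NAME=VALUE" format, separated by "\r\n", as the default string quoted earlier in this thread. The helper `build_schedule_additional` and the setting name MY_CUSTOM_ONE are hypothetical and only illustrate the format; relies on dicts keeping insertion order (Python 3.7+).

```python
def build_schedule_additional(settings, spider_args):
    """Build a SCHEDULE_ADDITIONAL-style string (hypothetical helper).

    settings:    mapping of setting name -> value, emitted as '-d setting=NAME=VALUE'
    spider_args: mapping of spider argument -> value, emitted as '-d NAME=VALUE'
    """
    lines = ['-d setting=%s=%s' % (name, value) for name, value in settings.items()]
    lines += ['-d %s=%s' % (name, value) for name, value in spider_args.items()]
    return '\r\n'.join(lines)


# MY_CUSTOM_ONE stands in for a project-specific setting that is not part of Scrapy.
SCHEDULE_ADDITIONAL = build_schedule_additional(
    {'CLOSESPIDER_TIMEOUT': 60, 'MY_CUSTOM_ONE': 'some-value'},
    {'arg1': 'val1'},
)
print(repr(SCHEDULE_ADDITIONAL))
```

Writing the string by hand in the config file works just as well; the helper only makes the line format explicit.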

botzill commented 5 years ago

OK, thx.

botzill commented 5 years ago

Do you think it is possible to add something like SCHEDULE_MY_CUSTOM_ONE, working the same way as SCHEDULE_CONCURRENT_REQUESTS and the others?

my8100 commented 5 years ago

Which setting options do you use frequently?

botzill commented 5 years ago

I have some custom ones defined, settings that are not part of scrapy.

Thx.

-- Chirica Gheorghe, Co-Founder - https://www.crawless.com

my8100 commented 5 years ago

I think the SCHEDULE_ADDITIONAL option is good enough for most cases.

botzill commented 5 years ago

Yes, sure it is. I just thought we could add inputs like that to make setup easier for nontechnical users.
