Closed harshaZakapps closed 5 years ago
I have the same issue too. Can anyone reply to this?
Try scrapydweb.
I am also stuck on this.
@harshaZakapps @Deva888 @Vel23 @Venkatdeva1117 Run the command below and post the output.
python -c "import platform; import sys; import scrapyd; import twisted; print(platform.platform()); print(sys.version_info);print(scrapyd.__version__); print(twisted.__version__)"
Windows-10-10.0.17134-SP0
sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)
1.2.0
18.9.0
@harshaZakapps Is this the first time you have used scrapyd-deploy? Have you tried scrapydweb?
Yes, I'm using it for the first time. By the way, what is scrapydweb? I'm using scrapyd so that I can call the spider from Celery or Django REST, so I thought it would be better to use scrapyd than scrapydweb.
@harshaZakapps Can you visit http://localhost:6800/addversion.json
No.
scrapyd-deploy
Packing version 1559638934
Deploying to project "PCAP" in http://localhost:6800/addversion.json
Deploy failed: <urlopen error [WinError 10061] No connection could be made because the target machine actively refused it>
You should run scrapyd first...
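The WinError 10061 above usually means nothing is listening on the target port. As a rough sketch (not part of scrapyd or scrapyd-deploy), a quick TCP check like the one below can confirm whether anything is accepting connections on localhost:6800 before you deploy. The helper name `is_port_open` is made up for this example.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 6800 is the port used in the scrapy.cfg deploy URL in this thread.
    if is_port_open("localhost", 6800):
        print("Something is listening on localhost:6800; try scrapyd-deploy.")
    else:
        print("Nothing is listening on localhost:6800; start scrapyd first.")
```

If this prints the second message, start `scrapyd` in a separate terminal and keep it running while you deploy.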
Yes, after running scrapyd I can open http://localhost:6800/addversion.json, and it returns: {"node_name": "ZA-LTP48", "status": "error", "message": "Expected one of [b'HEAD', b'object', b'POST']"}
Then scrapyd-deploy should work now. BTW, scrapydweb is a web app for Scrapyd cluster management. (demo)
Thanks, it's working now. Is there any way to call scrapyd through its REST API, for example with curl?
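For what it's worth, Scrapyd exposes a JSON API over HTTP, and `schedule.json` accepts a POST with `project` and `spider` fields. Here is a minimal sketch using only the standard library, which could be called from a Celery task or a Django view. The spider name `example_spider` and both helper function names are placeholders for illustration; they are not taken from the project above.

```python
import json
import urllib.parse
import urllib.request

SCRAPYD_URL = "http://localhost:6800/"  # same base URL as in scrapy.cfg


def build_schedule_request(base_url, project, spider):
    """Build (but do not send) a POST request for Scrapyd's schedule.json."""
    endpoint = urllib.parse.urljoin(base_url, "schedule.json")
    body = urllib.parse.urlencode({"project": project, "spider": spider})
    # Passing data= makes urllib send the request as a POST.
    return urllib.request.Request(endpoint, data=body.encode("utf-8"))


def schedule_spider(base_url, project, spider, timeout=10.0):
    """Send the request and return Scrapyd's JSON reply as a dict."""
    request = build_schedule_request(base_url, project, spider)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs a running Scrapyd with the project already deployed):
# result = schedule_spider(SCRAPYD_URL, "PCAP", "example_spider")
# print(result)
```

The equivalent curl call would be `curl http://localhost:6800/schedule.json -d project=PCAP -d spider=example_spider`, again with the spider name replaced by your own.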
(py_1) E:\scrapyrfq>scrapyd-deploy monamii_rfq -p scrapyrfq
Packing version 1584928574
Deploying to project "scrapyrfq" in http://localhost:6800/addversion.json
I do not know why it gives no response for such a long time.
I'm trying to eggify my spider project using the default scrapyd-deploy command, but it says the target machine refused the connection. I have tried disabling the firewall, but I still get the same error.
scrapy.cfg
[settings]
default = PCAP.settings

[deploy]
url = http://localhost:6800/
project = PCAP
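To double-check which URL and project a deploy target points at, the file can be read with Python's `configparser`. This is only a sketch of how to inspect the file, not how scrapyd-deploy itself is implemented; `read_deploy_target` is a name invented for this example, and the sample text mirrors the scrapy.cfg shown above.

```python
import configparser

# Same content as the scrapy.cfg in this issue.
SAMPLE_CFG = """\
[settings]
default = PCAP.settings

[deploy]
url = http://localhost:6800/
project = PCAP
"""


def read_deploy_target(cfg_text, section="deploy"):
    """Return (url, project) from the given deploy section of a scrapy.cfg."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get(section, "url"), parser.get(section, "project")


url, project = read_deploy_target(SAMPLE_CFG)
print(url, project)
```

The URL printed here is the address scrapyd must be listening on when scrapyd-deploy runs.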
settings.py
# -*- coding: utf-8 -*-
# Scrapy settings for PCAP project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://doc.scrapy.org/en/latest/topics/settings.html
#     https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://doc.scrapy.org/en/latest/topics/spider-middleware.html
BOT_NAME = 'PCAP'
SPIDER_MODULES = ['PCAP.spiders']
NEWSPIDER_MODULE = 'PCAP.spiders'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'PCAP (+http://www.yourdomain.com)'
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
# Configure maximum concurrent requests performed by Scrapy (default: 16)
CONCURRENT_REQUESTS = 16
# Configure a delay for requests for the same website (default: 0)
# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
CONCURRENT_REQUESTS_PER_DOMAIN = 32
CONCURRENT_REQUESTS_PER_IP = 1
# Disable cookies (enabled by default)
COOKIES_ENABLED = False
# Disable Telnet Console (enabled by default)
TELNETCONSOLE_ENABLED = False
# Override the default request headers:
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
}
# Enable or disable spider middlewares
# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html
SPIDER_MIDDLEWARES = {
    'scrapy_proxy_pool.middlewares.ProxyPoolMiddleware': 610,
    'scrapy_proxy_pool.middlewares.BanDetectionMiddleware': 620,
}
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
    'scrapy_proxy_pool.middlewares.ProxyPoolMiddleware': 610,
}
# Enable or disable downloader middlewares
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
    'PCAP.middlewares.PcapDownloaderMiddleware': 543,
}
# Enable or disable extensions
# See https://doc.scrapy.org/en/latest/topics/extensions.html
EXTENSIONS = {
    'scrapy.extensions.telnet.TelnetConsole': None,
}
# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'PCAP.pipelines.PcapPipeline': 300,
}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/autothrottle.html
AUTOTHROTTLE_ENABLED = True
# The initial download delay
AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
AUTOTHROTTLE_DEBUG = False
# Enable and configure HTTP caching (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 0
HTTPCACHE_DIR = 'httpcache'
HTTPCACHE_IGNORE_HTTP_CODES = []
HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
FAKEUSERAGENT_FALLBACK = 'Mozilla'
PROXY_POOL_ENABLED = True
DB_CONNECT = {
    'db': 'pcap01',  # Your db
    'user': 'root',
    'passwd': 'root',
    'host': 'localhost',  # Your Server
    'charset': 'utf8',
    'use_unicode': True,
}