Closed saneef closed 3 years ago
Description
I'm trying to crawl a local website and create an index. Here is the config I'm using, but I'm getting the error 'Field `version` must be a string.' Am I missing anything in the config?
config.json:
{
  "index_name": "test-local-dev-site",
  "start_urls": ["http://192.168.1.100/solutions/"],
  "stop_urls": [],
  "selectors": {
    "lvl0": {
      "selector": ".page-header__nav ul li a[data-state=active]",
      "default_value": "Home",
      "global": true
    },
    "lvl1": { "selector": "article h1", "global": true },
    "lvl2": { "selector": "article h2", "global": true },
    "lvl3": { "selector": "article h3", "global": true },
    "lvl4": { "selector": "article h4", "global": true },
    "text": "article p, article li"
  }
}
TYPESENSE_API_KEY=the-generated-api-kay
TYPESENSE_HOST=192.168.1.100
TYPESENSE_PORT=8108
TYPESENSE_PROTOCOL=http
Steps to reproduce
When running the scraper, I get the error 'Field `version` must be a string.' Here is a longer log:
DEBUG:urllib3.connectionpool:http://192.168.1.100:8108 "POST /collections/test-local-dev-site_1626455860/documents/import HTTP/1.1" 200 None
DEBUG:typesense.api_call:192.168.1.100:8108 is healthy. Status code: 200
[{'code': 400, 'document': '{"content": "Our users solve problems with yet-another-db in fields ranging from predictive risk systems in investment banks and fraud detection via graph analytics to authorization for IoT data and temporal queries across financial transactions on enterprise blockchains", "hierarchy": {"lvl0": "Solutions", "lvl1": "yet-another-db in Production", "lvl2": null, "lvl3": null, "lvl4": null, "lvl5": null, "lvl6": null}, "hierarchy_radio": {"lvl4": null, "lvl3": null, "lvl2": null, "lvl1": null, "lvl0": null}, "type": "content", "tags": [], "weight": {"page_rank": 0, "level": 0, "position": 0}, "url": "http://192.168.1.100/solutions/", "url_without_variables": "http://192.168.1.100/solutions/", "hierarchy_camel": [{"lvl0": "Solutions", "lvl1": "yet-another-db in Production", "lvl2": null, "lvl3": null, "lvl4": null, "lvl5": null, "lvl6": null}], "hierarchy_radio_camel": {"lvl4": null, "lvl3": null, "lvl2": null, "lvl1": null, "lvl0": null}, "content_camel": "Our users solve problems with yet-another-db in fields ranging from predictive risk systems in investment banks and fraud detection via graph analytics to authorization for IoT data and temporal queries across financial transactions on enterprise blockchains", "language": "en", "version": ["1.0.0", "latest"], "url_without_anchor": "http://192.168.1.100/solutions/", "no_variables": true, "objectID": "ec4386a15bc9dc3e8268e2dc90c9608e7bba6f28", "item_priority": 0, "hierarchy.lvl0": "Solutions", "hierarchy.lvl1": "yet-another-db in Production"}', 'error': 'Field `version` must be a string.', 'success': False}]
ERROR:scrapy.core.scraper:Spider error processing <GET http://192.168.1.100/solutions/> (referer: None)
Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/twisted/internet/defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/root/src/documentation_spider.py", line 177, in parse_from_start_url
    self.add_records(response, from_sitemap=False)
  File "/root/src/documentation_spider.py", line 149, in add_records
    self.typesense_helper.add_records(records, response.url, from_sitemap)
  File "/root/src/typesense_helper.py", line 65, in add_records
    raise Exception
Exception
2021-07-16 17:17:40 [scrapy.core.scraper] ERROR: Spider error processing <GET http://192.168.1.100/solutions/> (referer: None)
Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/twisted/internet/defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/root/src/documentation_spider.py", line 177, in parse_from_start_url
    self.add_records(response, from_sitemap=False)
  File "/root/src/documentation_spider.py", line 149, in add_records
    self.typesense_helper.add_records(records, response.url, from_sitemap)
  File "/root/src/typesense_helper.py", line 65, in add_records
    raise Exception
Exception
INFO:scrapy.core.engine:Closing spider (finished)
INFO:scrapy.statscollectors:Dumping Scrapy stats:
{'downloader/request_bytes': 216,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 2294,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 0.235181,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2021, 7, 16, 17, 17, 40, 481790),
 'log_count/ERROR': 1,
 'memusage/max': 62435328,
 'memusage/startup': 62435328,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'spider_exceptions/Exception': 1,
 'start_time': datetime.datetime(2021, 7, 16, 17, 17, 40, 246609)}
INFO:scrapy.core.engine:Spider closed (finished)
Crawling issue: nbHits 0 for test-local-dev-site
Expected Behavior
Indexing to succeed.
Actual Behavior
Indexing fails.
Metadata
Typesense Version: 0.21.0
OS: macOS 11.4
@saneef I just pushed out a fix for this, could you do docker pull typesense/docsearch-scraper and then try again?
@jasonbosco The fix works! Thanks a lot for the quick fix.
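For anyone reading a similar log: the rejected record in the log above carries `version` as a list (`["1.0.0", "latest"]`), while the error says the field must be a string. The thread does not say what the fix changed. As a minimal sketch for spotting this kind of mismatch, assuming the per-document result shape shown in the log (`success`, `error`, and a JSON-encoded `document`), the Python below lists which fields of each rejected document hold list values. The helper name `explain_failures` is hypothetical and not part of the scraper, and the sample document is trimmed to two fields.

```python
import json

# Sample shaped like the per-document result in the log above.
# The document is trimmed to two fields for brevity (an assumption of this sketch).
import_results = [
    {
        "code": 400,
        "document": json.dumps(
            {"url": "http://192.168.1.100/solutions/", "version": ["1.0.0", "latest"]}
        ),
        "error": "Field `version` must be a string.",
        "success": False,
    }
]


def explain_failures(results):
    """Return (error message, names of list-valued fields) for each failed result.

    Hypothetical helper, not part of the scraper.
    """
    report = []
    for result in results:
        if result.get("success"):
            continue  # only rejected documents are of interest
        document = json.loads(result["document"])
        list_fields = [name for name, value in document.items() if isinstance(value, list)]
        report.append((result["error"], list_fields))
    return report


for error, list_fields in explain_failures(import_results):
    print(error, "| list-valued fields:", list_fields)
```

Comparing the field named in the error message against the list-valued fields points straight at the offending value.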