typesense / typesense-docsearch-scraper

A fork of Algolia's awesome DocSearch Scraper, customized to index data in Typesense (an open source alternative to Algolia)
https://typesense.org/docs/guide/docsearch.html

Can't get docsearch scraper to run on Windows #17

Closed seowzhenjun0126 closed 1 year ago

seowzhenjun0126 commented 1 year ago

Description

Hi, I am trying to run the scraper as described here, but I keep running into the same error. Does anyone know how to resolve this issue?

Steps to reproduce

Run the following command:

docker run -it --env-file=c:\tmp\typesense-docsearch.env -e CONFIG=$(cat c:\tmp\typesense-docsearch-config.json | jq -r tostring) typesense/docsearch-scraper

Expected Behavior

The scraper should run normally.

Actual Behavior

I got the error message below. It seems that the CONFIG value is no longer double-quoted after being processed by jq.

Traceback (most recent call last):
  File "/root/src/config/config_loader.py", line 102, in _load_config
    data = json.loads(config, object_pairs_hook=OrderedDict)
  File "/usr/lib/python3.6/json/__init__.py", line 367, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/src/index.py", line 116, in <module>
    run_config(environ['CONFIG'])
  File "/root/src/index.py", line 33, in run_config
    config = ConfigLoader(config)
  File "/root/src/config/config_loader.py", line 70, in __init__
    data = self._load_config(config)
  File "/root/src/config/config_loader.py", line 107, in _load_config
    raise ValueError('CONFIG is not a valid JSON')
ValueError: CONFIG is not a valid JSON

image

Metadata

Typesense Version: 0.23.1

OS: Windows 11

jasonbosco commented 1 year ago

@seowzhenjun0126 The CONFIG value should be a fully formed JSON object. In your screenshot it's missing all the double quotes around the key names and values.
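
To illustrate the point about quoting, here is a minimal Python sketch that parses a CONFIG string the same way the traceback shows the scraper doing it (json.loads with object_pairs_hook=OrderedDict). The config keys and values are made up for illustration, and check_config is a hypothetical helper, not part of the scraper.

```python
import json
from collections import OrderedDict

# Hypothetical config contents, for illustration only.
valid = '{"index_name": "docs", "start_urls": ["https://example.com/"]}'
# The same text after a shell has stripped every double quote.
stripped = '{index_name: docs, start_urls: [https://example.com/]}'


def check_config(raw):
    """Return (True, parsed_config) on success, or (False, error_message)."""
    try:
        return True, json.loads(raw, object_pairs_hook=OrderedDict)
    except json.JSONDecodeError as exc:
        return False, str(exc)


ok_valid, parsed = check_config(valid)
ok_stripped, message = check_config(stripped)

print(ok_valid, parsed["index_name"])
print(ok_stripped, message)
```

The stripped string fails at the first key, because JSON requires property names to be enclosed in double quotes. That is the same class of error reported in the issue.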

seowzhenjun0126 commented 1 year ago

The JSON object is the direct output of jq, and it has no double quotes around the keys and values. Do you know how I can fix this through the jq command? Thanks!

seowzhenjun0126 commented 1 year ago

After some research, I have found a solution to this issue.

On Windows, run the scraper from Git Bash so that the double quotes are not stripped before the CONFIG value is passed to Docker.

Reference: https://github.com/algolia/docsearch-scraper/issues/513
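
A different approach, not the one used in this thread, is to avoid shell quoting altogether by building the docker argument list in Python. The sketch below is a rough idea under assumptions: build_docker_args is a hypothetical helper, the flags are copied from the command in the issue, and it has not been verified on Windows.

```python
import json


def build_docker_args(env_file, config_path, image="typesense/docsearch-scraper"):
    """Build the argument list for `docker run` without going through a shell.

    Hypothetical helper: the flags mirror the command from the issue.
    """
    with open(config_path, encoding="utf-8") as fh:
        # Fails early with a clear error if the file itself is not valid JSON.
        config = json.load(fh)

    # Compact single-line JSON, similar in spirit to `jq -r tostring`.
    compact = json.dumps(config, separators=(",", ":"))

    return [
        "docker", "run", "-it",
        f"--env-file={env_file}",
        "-e", f"CONFIG={compact}",
        image,
    ]
```

The returned list can be handed to subprocess.run(args, check=True). Because each argument is passed as a separate list element, no shell gets a chance to reinterpret the double quotes inside the JSON.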

jasonbosco commented 1 year ago

Thank you for documenting this!