galligan closed this issue 2 years ago.
Thanks for putting this together! Hoping it's going to work for our use case.
I tossed your example config directly into algoria.json, and the Action spit out this:
2022-02-24T18:51:29.1064679Z ##[group]Run signcl/docsearch-scraper-action@master
2022-02-24T18:51:29.1064976Z env:
2022-02-24T18:51:29.1065591Z   APPLICATION_ID: ***
2022-02-24T18:51:29.1065858Z   API_KEY: ***
2022-02-24T18:51:29.1066830Z   CONFIG: {"index_name":"xmtp_docs","start_urls":["https://docs.xmtp.org/","https://mg0716-docs-updates.xmtp-docs-test.pages.dev/"],"sitemap_urls":["https://docs.xmtp.org/sitemap.xml","https://mg0716-docs-updates.xmtp-docs-test.pages.dev/sitemap.xml"],"sitemap_alternate_links":true,"stop_urls":[],"selectors":{"lvl1":"header h1","lvl2":"article h2","lvl3":"article h3","lvl4":"article h4","lvl5":"article h5, article td:first-child","lvl6":"article h6","text":"article p, article li, article td:last-child"},"strip_chars":" .,;:#","custom_settings":{"separatorsToIndex":"_","attributesForFaceting":["language","version","type","docusaurus_tag"],"attributesToRetrieve":["hierarchy","content","anchor","url","url_without_anchor","type"]}}
2022-02-24T18:51:29.1067792Z ##[endgroup]
2022-02-24T18:51:29.1292370Z ##[command]/usr/bin/docker run --name db2d71370be3b957d46a3bae3ffc9bfb22e1e_6d377d --label 7db2d7 --workdir /github/workspace --rm -e APPLICATION_ID -e API_KEY -e CONFIG -e HOME -e GITHUB_JOB -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_REPOSITORY_OWNER -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RETENTION_DAYS -e GITHUB_RUN_ATTEMPT -e GITHUB_ACTOR -e GITHUB_WORKFLOW -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GITHUB_EVENT_NAME -e GITHUB_SERVER_URL -e GITHUB_API_URL -e GITHUB_GRAPHQL_URL -e GITHUB_REF_NAME -e GITHUB_REF_PROTECTED -e GITHUB_REF_TYPE -e GITHUB_WORKSPACE -e GITHUB_ACTION -e GITHUB_EVENT_PATH -e GITHUB_ACTION_REPOSITORY -e GITHUB_ACTION_REF -e GITHUB_PATH -e GITHUB_ENV -e RUNNER_OS -e RUNNER_ARCH -e RUNNER_NAME -e RUNNER_TOOL_CACHE -e RUNNER_TEMP -e RUNNER_WORKSPACE -e ACTIONS_RUNTIME_URL -e ACTIONS_RUNTIME_TOKEN -e ACTIONS_CACHE_URL -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/_temp/_runner_file_commands":"/github/file_commands" -v "/home/runner/work/docs/docs":"/github/workspace" 7db2d7:1370be3b957d46a3bae3ffc9bfb22e1e
2022-02-24T18:51:30.5196856Z Traceback (most recent call last):
2022-02-24T18:51:30.5197212Z   File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
2022-02-24T18:51:30.5197465Z     "__main__", mod_spec)
2022-02-24T18:51:30.5197708Z   File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
2022-02-24T18:51:30.5197935Z     exec(code, run_globals)
2022-02-24T18:51:30.5198201Z   File "/root/src/index.py", line 119, in <module>
2022-02-24T18:51:30.5198730Z     run_config(environ['CONFIG'])
2022-02-24T18:51:30.5198972Z   File "/root/src/index.py", line 33, in run_config
2022-02-24T18:51:30.5199204Z     config = ConfigLoader(config)
2022-02-24T18:51:30.5199460Z   File "/root/src/config/config_loader.py", line 84, in __init__
2022-02-24T18:51:30.5199708Z     self._parse()
2022-02-24T18:51:30.5199938Z   File "/root/src/config/config_loader.py", line 120, in _parse
2022-02-24T18:51:30.5200237Z     self.selectors = SelectorsParser().parse(self.selectors)
2022-02-24T18:51:30.5200562Z   File "/root/src/config/selectors_parser.py", line 69, in parse
2022-02-24T18:51:30.5200842Z     config_selectors[selectors_key])
2022-02-24T18:51:30.5201107Z   File "/root/src/config/selectors_parser.py", line 10, in _parse_selectors_set
2022-02-24T18:51:30.5201393Z     selectors_set[key] = config_selectors[key]
2022-02-24T18:51:30.5201640Z TypeError: string indices must be integers
2022-02-24T18:51:30.7383868Z Cleaning up orphan processes
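The last two frames show selectors_parser.py evaluating config_selectors[key] while config_selectors is a plain string, which is exactly what raises "string indices must be integers". Below is a minimal Python sketch of that suspected mechanism. The functions parse and parse_selectors_set are hypothetical simplifications named after the traceback frames, not the scraper's source. The key assumption, our reading of the traceback rather than something verified here, is that the parser treats "selectors" as one flat set only when it contains an "lvl0" key, and otherwise reads every key as the name of a nested selector set.

```python
# Hypothetical, simplified re-creation of the two failing frames in the
# traceback (SelectorsParser.parse -> _parse_selectors_set). This is NOT the
# scraper's source; it only illustrates the suspected mechanism.
# ASSUMPTION: "selectors" is treated as one flat set only when it contains an
# "lvl0" key; otherwise each key is read as the name of a selector set whose
# value should be a dict.


def parse_selectors_set(config_selectors):
    selectors_set = {}
    for key in config_selectors:
        # If config_selectors is a str such as "header h1", `key` is a single
        # character, and indexing a str with a str raises TypeError.
        selectors_set[key] = config_selectors[key]
    return selectors_set


def parse(config_selectors):
    if "lvl0" in config_selectors:
        # Flat set: wrap it under a single "default" set name.
        config_selectors = {"default": config_selectors}
    # Otherwise every top-level key ("lvl1", "text", ...) is assumed to name
    # a nested set, so its string value is passed to parse_selectors_set.
    return {name: parse_selectors_set(value)
            for name, value in config_selectors.items()}


without_lvl0 = {"lvl1": "header h1", "text": "article p"}
with_lvl0 = {"lvl0": "header h1", "lvl1": "article h2", "text": "article p"}

try:
    parse(without_lvl0)
except TypeError as exc:
    print("without lvl0:", type(exc).__name__, exc)

print("with lvl0:", parse(with_lvl0))
```

Under this assumption, the example selectors (which start at lvl1) fail on the very first character of "header h1", while the same dict with an lvl0 key parses cleanly.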
Here's the config file:
{
  "index_name": "docs",
  "start_urls": ["https://example.com/"],
  "sitemap_urls": ["https://example.com/sitemap.xml"],
  "sitemap_alternate_links": true,
  "stop_urls": [],
  "selectors": {
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "lvl5": "article h5, article td:first-child",
    "lvl6": "article h6",
    "text": "article p, article li, article td:last-child"
  },
  "strip_chars": " .,;:#",
  "custom_settings": {
    "separatorsToIndex": "_",
    "attributesForFaceting": ["language", "version", "type", "docusaurus_tag"],
    "attributesToRetrieve": [
      "hierarchy",
      "content",
      "anchor",
      "url",
      "url_without_anchor",
      "type"
    ]
  }
}
Any thoughts?
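Assuming the scraper needs an "lvl0" entry in "selectors" before it treats them as one flat set (an inference from the traceback, not something the Action documents in this thread), the config above would fail because its selectors start at lvl1. A hypothetical pre-flight check like the one below could flag that before the Action runs. The names config_problems and load_config are illustrative only and are not part of the Action or the scraper.

```python
import json

# Hypothetical pre-flight check for a docsearch-scraper config.
# ASSUMPTION (inferred from the traceback, not from the Action's docs):
# "selectors" must contain an "lvl0" key; otherwise each selector string is
# mistaken for a nested selector set and the scraper crashes.


def config_problems(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    selectors = config.get("selectors")
    if not isinstance(selectors, dict):
        problems.append("'selectors' is missing or is not an object")
    elif "lvl0" not in selectors:
        problems.append("'selectors' has no 'lvl0' entry")
    return problems


def load_config(path):
    """Read a JSON config file, e.g. the algoria.json from the post."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# A trimmed copy of the example config: selectors start at lvl1.
example = json.loads("""
{"index_name": "docs",
 "start_urls": ["https://example.com/"],
 "selectors": {"lvl1": "header h1", "text": "article p"}}
""")
print(config_problems(example))
```

Calling config_problems(load_config("algoria.json")) returns an empty list when the check passes. If the lvl0 assumption holds, the fix is to add an lvl0 selector that matches something meaningful on the site (such as a section or page title); the right selector depends on the site's markup.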
@galligan I ran into the same problem. Could you share how you solved it?