meilisearch / scrapix

urls_to_index does not work #49

Open bidoubiwa opened 1 year ago

bidoubiwa commented 1 year ago

When I add urls_to_index to my configuration file, no pages are indexed at all.

qdequele commented 1 year ago

I can't reproduce.

When I try the following config, I get 950 docs:

{
    "start_urls": [
        "https://meilisearch.com/docs",
        "https://www.meilisearch.com/docs"
    ],
    "meilisearch_url": "{{meilisearch_host}}",
    "meilisearch_api_key": "{{meilisearch_api_key}}",
    "meilisearch_index_uid": "{{meilisearch_index_name}}"
}

But when I try the following config, I get 468 docs:

{
    "start_urls": [
        "https://meilisearch.com/docs",
        "https://www.meilisearch.com/docs"
    ],
    "urls_to_index": ["https://www.meilisearch.com/docs/learn"],
    "meilisearch_url": "{{meilisearch_host}}",
    "meilisearch_api_key": "{{meilisearch_api_key}}",
    "meilisearch_index_uid": "{{meilisearch_index_name}}"
}
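
One possible reading of the gap between the two runs, sketched in Python: if urls_to_index acts as a prefix filter on crawled URLs, only pages under a listed prefix are kept, which would lower the document count without dropping it to zero. This is an assumption about the matching rule, not confirmed scrapix behavior. The `should_index` helper and the sample page URLs below are hypothetical and only illustrate the idea.

```python
from typing import List, Optional


def should_index(url: str, urls_to_index: Optional[List[str]]) -> bool:
    """Hypothetical filter, assumed for illustration only.

    With no urls_to_index set, every crawled page is indexed.
    Otherwise a page is indexed only if its URL starts with a listed prefix.
    """
    if not urls_to_index:
        return True
    return any(url.startswith(prefix) for prefix in urls_to_index)


# Hypothetical pages reached from the two start_urls.
crawled = [
    "https://www.meilisearch.com/docs/learn/getting_started",
    "https://www.meilisearch.com/docs/reference/api/search",
    "https://meilisearch.com/docs/learn/getting_started",  # no "www."
]

# Same filter value as in the second config above.
urls_to_index = ["https://www.meilisearch.com/docs/learn"]

indexed = [url for url in crawled if should_index(url, urls_to_index)]
print(indexed)
```

Under this assumed rule, a page served from the host without "www." would be skipped even when its path matches, so the host used in urls_to_index may be worth checking when a config indexes nothing.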