meilisearch / scrapix

Get scraping config on the fly #2

Closed qdequele closed 1 year ago

qdequele commented 1 year ago

Allow the user to send their own scraping config with the request, specifying which tags to scrape and which class holds the main content.

The POST body will look like this:

{
    "urls": ["https://www.google.com"],
    "meilisearch_host": "http://localhost:7700",
    "meilisearch_api_key": "masterKey",
    "meilisearch_index_name": "google",
    "scraping_config": {
        "h1": ".main-content h1",
        "h2": ".main-content h2",
        "h3": ".main-content h3",
        "h4": ".main-content h4",
        "h5": ".main-content h5",
        "h6": ".main-content h6",
        "h7": ".main-content p, .main-content li, .main-content span"
    }
}
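
As a rough sketch of how a client might assemble and sanity-check such a body before sending it: the helper names `build_scrape_request` and `validate_scrape_request` are hypothetical and not part of scrapix, and treating `scraping_config` as optional is an assumption. Only the field names come from the body above.

```python
import json

# Top-level fields taken from the request body above.
REQUIRED_FIELDS = (
    "urls",
    "meilisearch_host",
    "meilisearch_api_key",
    "meilisearch_index_name",
)


def build_scrape_request(urls, host, api_key, index_name, main_content=".main-content"):
    """Hypothetical helper: build the POST body, scoping every selector
    to a single main content class."""
    # h1..h6 map to the matching heading tags inside the main content.
    scraping_config = {
        f"h{level}": f"{main_content} h{level}" for level in range(1, 7)
    }
    # "h7" is the key used above for body text (paragraphs, list items, spans).
    scraping_config["h7"] = ", ".join(
        f"{main_content} {tag}" for tag in ("p", "li", "span")
    )
    return {
        "urls": list(urls),
        "meilisearch_host": host,
        "meilisearch_api_key": api_key,
        "meilisearch_index_name": index_name,
        "scraping_config": scraping_config,
    }


def validate_scrape_request(body):
    """Hypothetical check: return a list of problems (empty if none found).
    Assumes scraping_config may be omitted entirely."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in body]
    config = body.get("scraping_config", {})
    if not isinstance(config, dict):
        problems.append("scraping_config must be an object")
    else:
        for key, selector in config.items():
            if not isinstance(selector, str) or not selector.strip():
                problems.append(f"empty selector for {key}")
    return problems


body = build_scrape_request(
    ["https://www.google.com"], "http://localhost:7700", "masterKey", "google"
)
print(json.dumps(body, indent=4))
```

Serializing with `json.dumps` also avoids hand-written mistakes such as a trailing comma after the last selector, which strict JSON parsers reject.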

bidoubiwa commented 1 year ago

Can I close this?

qdequele commented 1 year ago

Yes