svoi-fr / mirai

Refugee assistant bot
https://docs.danswer.dev/
MIT License

Scraper Settings and UI/UX Enhancements #45

Open ptitzlabs opened 1 month ago

Overview

The current scraper only allows scraping websites on a per-site basis, with no settings or options for customization. The scraper connector display page also lacks a searchable document list and the ability to apply tags per document. To address these limitations, this issue proposes adding scraper settings and UI/UX enhancements.

Deliverables

The following deliverables are expected for this issue:

  1. Scraper Settings: A set of configurable options for the scraper, including:
    • URL exclude filters: A list of URL patterns to exclude from scraping.
    • Default tags: A list of tags to be automatically applied to all documents parsed from a given website.
  2. Scraper Connector Display Page Enhancements: UI/UX improvements to the scraper connector display page, including:
    • A searchable document list that can display documents as they are stored in the database.
    • The ability to apply tags to individual documents in the list.
  3. Frontend Implementation: The necessary frontend changes to implement the proposed scraper settings and display page enhancements.
  4. Backend Implementation: The necessary backend changes to support the proposed scraper settings and display page enhancements, including any changes to the database schema or API endpoints.

Detailed Description

Scraper Settings

The proposed scraper settings would allow finer-grained control over the scraping process. The URL exclude filters would let users exclude certain sections of a website or specific types of content from scraping. The default tags would let users automatically categorize and organize documents based on the website they were scraped from.
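
As a rough illustration of how these two settings could behave, here is a minimal Python sketch. Everything in it is an assumption made for discussion: the `ScraperSettings` class, its field names, and the helper functions are hypothetical, and the glob-style pattern syntax is only one possible choice (the issue does not specify whether patterns are globs, regexes, or prefixes).

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class ScraperSettings:
    """Hypothetical per-site settings; names are illustrative, not an existing API."""
    exclude_patterns: list[str] = field(default_factory=list)  # assumed glob syntax
    default_tags: list[str] = field(default_factory=list)


def should_scrape(url: str, settings: ScraperSettings) -> bool:
    """Return False when the URL matches any exclude pattern."""
    return not any(fnmatchcase(url, p) for p in settings.exclude_patterns)


def apply_default_tags(doc_tags: list[str], settings: ScraperSettings) -> list[str]:
    """Merge the site's default tags into a document's tags, keeping order, no duplicates."""
    return list(dict.fromkeys([*doc_tags, *settings.default_tags]))


# Example values below are made up for the sketch.
settings = ScraperSettings(
    exclude_patterns=["*/login*", "*.pdf"],
    default_tags=["example-site"],
)
urls = [
    "https://example.org/help/housing",
    "https://example.org/login?next=/help",
    "https://example.org/files/form.pdf",
]
to_scrape = [u for u in urls if should_scrape(u, settings)]
```

With these example patterns only the first URL would be scraped, and `apply_default_tags(["housing"], settings)` would yield `["housing", "example-site"]`. Whether default tags should be applied at scrape time or at index time is left open here.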

Scraper Connector Display Page Enhancements

The proposed enhancements to the scraper connector display page would improve usability and functionality. The searchable document list would make it easier for users to find and view the documents that have been scraped. The ability to apply tags to individual documents would allow for more flexible and nuanced document categorization.
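
The behavior behind the document list could look roughly like the sketch below. It is an in-memory stand-in only: `ScrapedDocument`, `search_documents`, and `tag_document` are hypothetical names, and a real implementation would presumably query the database rather than filter a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class ScrapedDocument:
    """Hypothetical stand-in for a stored document; fields are assumptions."""
    doc_id: str
    title: str
    url: str
    tags: set[str] = field(default_factory=set)


def search_documents(docs: list[ScrapedDocument], query: str) -> list[ScrapedDocument]:
    """Case-insensitive substring match against title, URL, or any tag."""
    needle = query.strip().lower()
    if not needle:
        return list(docs)  # an empty query lists everything
    return [
        d for d in docs
        if needle in d.title.lower()
        or needle in d.url.lower()
        or any(needle in t.lower() for t in d.tags)
    ]


def tag_document(docs: list[ScrapedDocument], doc_id: str, tag: str) -> bool:
    """Add a tag to the single document with the given id; return True if it was found."""
    for d in docs:
        if d.doc_id == doc_id:
            d.tags.add(tag)
            return True
    return False


# Made-up sample data for the sketch.
docs = [
    ScrapedDocument("1", "Housing assistance", "https://example.org/help/housing"),
    ScrapedDocument("2", "Language courses", "https://example.org/help/courses"),
]
tag_document(docs, "2", "education")
matches = search_documents(docs, "education")
```

Here `matches` contains only the document with id `"2"`, because the tag was applied to that document alone and the search also looks at tags.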

Frontend Implementation

The frontend implementation of the proposed changes would involve modifying the existing scraper and display page components to incorporate the new settings and features. This would likely involve changes to the component logic, UI, and state management.

Backend Implementation

The backend implementation of the proposed changes would involve modifying the existing scraper and display page API endpoints to support the new settings and features. This may also involve changes to the database schema to store the new settings and document tags.
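
One possible shape for the schema changes is sketched below using Python's built-in `sqlite3` module so that it runs standalone. The table names, column names, and the choice to store the lists as JSON text are all assumptions for illustration; the project's actual database, ORM, and migration tooling may call for a different design.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical table: one settings row per scraper connector.
    CREATE TABLE scraper_settings (
        connector_id     INTEGER PRIMARY KEY,
        exclude_patterns TEXT NOT NULL DEFAULT '[]',  -- JSON list of patterns
        default_tags     TEXT NOT NULL DEFAULT '[]'   -- JSON list of tags
    );
    -- Hypothetical table: per-document tags, one row per (document, tag).
    CREATE TABLE document_tag (
        document_id TEXT NOT NULL,
        tag         TEXT NOT NULL,
        PRIMARY KEY (document_id, tag)
    );
    """
)


def save_settings(conn, connector_id, exclude_patterns, default_tags):
    """Insert or overwrite the settings row for a connector."""
    conn.execute(
        "INSERT OR REPLACE INTO scraper_settings VALUES (?, ?, ?)",
        (connector_id, json.dumps(exclude_patterns), json.dumps(default_tags)),
    )


def load_settings(conn, connector_id):
    """Return the stored settings, or empty lists when none exist."""
    row = conn.execute(
        "SELECT exclude_patterns, default_tags FROM scraper_settings "
        "WHERE connector_id = ?",
        (connector_id,),
    ).fetchone()
    if row is None:
        return {"exclude_patterns": [], "default_tags": []}
    return {"exclude_patterns": json.loads(row[0]), "default_tags": json.loads(row[1])}


def add_document_tag(conn, document_id, tag):
    """Attach a tag to a document; repeating the same tag is a no-op."""
    conn.execute("INSERT OR IGNORE INTO document_tag VALUES (?, ?)", (document_id, tag))


def get_document_tags(conn, document_id):
    """Return a document's tags in alphabetical order."""
    rows = conn.execute(
        "SELECT tag FROM document_tag WHERE document_id = ? ORDER BY tag",
        (document_id,),
    ).fetchall()
    return [r[0] for r in rows]


save_settings(conn, 1, ["*/login*"], ["example-site"])
add_document_tag(conn, "doc-1", "housing")
add_document_tag(conn, "doc-1", "housing")  # duplicate is ignored
```

Keeping per-document tags in their own table, rather than in a JSON column on the document, makes it straightforward to filter the document list by tag, which is one reason it might be preferred here.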

Acceptance Criteria

The following acceptance criteria must be met for this issue to be considered complete:

  1. The scraper settings are configurable and functional, allowing users to exclude URLs and apply default tags to documents.
  2. The scraper connector display page includes a searchable document list and the ability to apply tags to individual documents.
  3. The frontend implementation is complete, with all proposed changes incorporated and tested.
  4. The backend implementation is complete, with all necessary changes to the API and database schema made and tested.

Resources and Tools

Reference scrapers: http://usescraper.com

Special Cases and Exceptions

None.

Expected Output

The expected output for this issue is a set of configurable scraper settings and an enhanced scraper connector display page, along with the necessary frontend and backend changes to support them. These changes should improve the functionality, usability, and flexibility of the scraper and display page components.