mamei16 / LLM_Web_search

An extension for oobabooga/text-generation-webui that enables the LLM to search the web using DuckDuckGo
GNU Affero General Public License v3.0
146 stars, 23 forks

Bump unstructured from 0.6.6 to 0.11.8 #22

Closed: dependabot[bot] closed this pull request 3 months ago

dependabot[bot] commented 3 months ago

Bumps unstructured from 0.6.6 to 0.11.8.

Release notes

Sourced from unstructured's releases.

0.11.8

Enhancements

  • Add SaaS API User Guide. This documentation serves as a guide for Unstructured SaaS API users to register, receive an API key and URL, and manage their account and billing information.

0.11.7

Enhancements

  • Add intra-chunk overlap capability. Implements overlap between the pieces produced when text-splitting divides an oversized chunk into two or more chunks that fit in the chunking window. Note that this capability is not yet available from the API but will shortly be made accessible through a new overlap kwarg on partition functions.
  • Update encoders to leverage dataclasses. All encoders now follow a class-based approach and are annotated with the dataclass decorator. As with the connectors, a nested dataclass holds the configs required to set up a client, and a field/property pair caches the client. This ensures that every variable associated with the class exists as a dataclass field.
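To make the intra-chunk overlap idea concrete, here is a minimal character-based sketch. It is not unstructured's implementation: the function name `split_oversized` and its `max_characters`/`overlap` parameters are hypothetical, and the real chunker may pick split points differently (for example at whitespace). It only shows how each piece after the first repeats the tail of the piece before it.

```python
from __future__ import annotations


def split_oversized(text: str, max_characters: int, overlap: int) -> list[str]:
    """Split text into pieces of at most max_characters characters.

    Every piece after the first starts with the last `overlap` characters
    of the piece before it (hypothetical, character-based sketch).
    """
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    if not 0 <= overlap < max_characters:
        raise ValueError("overlap must satisfy 0 <= overlap < max_characters")
    if len(text) <= max_characters:
        return [text]  # already fits in the chunking window

    step = max_characters - overlap  # how far each new window advances
    pieces = []
    start = 0
    while True:
        pieces.append(text[start:start + max_characters])
        if start + max_characters >= len(text):
            break  # this window reached the end of the text
        start += step
    return pieces
```

For example, `split_oversized("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`: each piece starts with the last character of its predecessor, and the window advances by `max_characters - overlap` each time.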

Features

  • Add Qdrant destination connector. Adds support for writing documents and embeddings into a Qdrant collection.
  • Store base64-encoded image data in metadata fields. Rather than saving images to file, stores the base64-encoded image bytes and the image's MIME type in the metadata fields image_base64 and image_mime_type, when the user requests this through a parameter such as pdf_extract_to_payload. This allows the API to reach parity with the library.
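A minimal sketch of the metadata shape described above, using Python's standard `base64` module. Only the two field names `image_base64` and `image_mime_type` come from the release note; the plain dict stands in for unstructured's element metadata object, and the helpers `to_image_metadata` and `from_image_metadata` are hypothetical.

```python
import base64
from typing import Tuple


def to_image_metadata(image_bytes: bytes, mime_type: str) -> dict:
    """Hypothetical helper: embed raw image bytes as base64 text plus a MIME type."""
    return {
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
        "image_mime_type": mime_type,
    }


def from_image_metadata(metadata: dict) -> Tuple[bytes, str]:
    """Hypothetical helper: recover the original bytes and MIME type."""
    return base64.b64decode(metadata["image_base64"]), metadata["image_mime_type"]
```

Because the encoded image is plain ASCII text, it can be carried inside a JSON response, which is presumably what lets the API return images the way the library can.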

Fixes

  • Fix table structure metric script. Updates the call to the table agent so that it provides OCR tokens, as is now required.
  • Fix element extraction not working when using the "auto" strategy for PDFs and images. If element extraction is specified, the "auto" strategy now falls back to the "hi_res" strategy.
  • Fix a bug passing a custom URL to partition_via_api. Users who self-host the API were not able to pass their custom URL to partition_via_api.
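The fallback rule in the element-extraction fix can be expressed as a small decision function. This is a hypothetical sketch (`resolve_strategy` and `extract_elements` are illustrative names, not unstructured's API) that models only the rule stated in the note; the heuristics the real "auto" strategy applies in other cases are deliberately not reproduced, so those cases are returned unchanged.

```python
def resolve_strategy(strategy: str, extract_elements: bool) -> str:
    """Hypothetical helper modeling only the fallback described in the fix."""
    # Per the release note: when element extraction is requested,
    # the "auto" strategy falls back to "hi_res".
    if strategy == "auto" and extract_elements:
        return "hi_res"
    # Any other combination is passed through untouched in this sketch.
    return strategy
```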

0.11.6

Enhancements

  • Update the layout analysis script. The previous script only supported annotating final elements. The updated script also supports annotating inferred and extracted elements.
  • AWS Marketplace API documentation: Added a user guide for deploying the Unstructured API on AWS, including setting up a VPC and CloudFormation.
  • Azure Marketplace API documentation: Improved the user guide for deploying the Azure Marketplace API by adding references to Azure documentation.
  • Integration documentation: Updated URLs for the staging_for bricks.

Features

  • Partition emails with base64-encoded text. Automatically handles and decodes base64-encoded text in emails with content type text/plain and text/html.
  • Add Chroma destination connector. A Chroma database connector has been added to the ingest CLI. Users may now use unstructured-ingest to write partitioned/embedded data to a Chroma vector database.
  • Add Elasticsearch destination connector. Problem: after ingesting data from a source, users might want to move their data into a destination. Elasticsearch is a popular storage solution for functionality such as search or intermediary caches within data pipelines. Feature: added an Elasticsearch destination connector that can ingest documents from any supported source, embed them, and write the embeddings/documents into Elasticsearch.
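Decoding base64 message bodies is something Python's standard `email` package already handles. The sketch below shows that decoding step for text/plain and text/html parts; it is not unstructured's `partition_email` code, and the helper name `decoded_text_parts` is hypothetical.

```python
from __future__ import annotations

import base64
from email import message_from_bytes


def decoded_text_parts(raw: bytes) -> dict[str, str]:
    """Return the decoded body of each text/plain and text/html part, keyed by
    content type (a later part of the same type overwrites an earlier one)."""
    msg = message_from_bytes(raw)
    parts: dict[str, str] = {}
    for part in msg.walk():  # for a non-multipart message, walk() yields only the message itself
        content_type = part.get_content_type()
        if content_type not in ("text/plain", "text/html"):
            continue
        # decode=True undoes the Content-Transfer-Encoding (base64, quoted-printable) and returns bytes
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        parts[content_type] = payload.decode(charset, errors="replace")
    return parts


if __name__ == "__main__":
    raw_email = (
        b"Content-Type: text/plain; charset=utf-8\r\n"
        b"Content-Transfer-Encoding: base64\r\n"
        b"\r\n" + base64.b64encode(b"Hello, world!") + b"\r\n"
    )
    print(decoded_text_parts(raw_email))
```

Running the demo prints `{'text/plain': 'Hello, world!'}`: the base64 body is decoded transparently because the part declares `Content-Transfer-Encoding: base64`.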

Fixes

  • Enable --fields argument omission for elasticsearch connector. Solves two bugs: omitting the optional parameter --fields broke the connector due to an integer processing error, and using an elasticsearch config for a destination connector caused a serialization issue when --fields was not provided.

0.11.5

Fixes

  • Fix partition_pdf() and partition_image() import issue. Reorganizes the pdf.py and image.py modules to be consistent with the import code for other document types.

... (truncated)

Changelog

Sourced from unstructured's changelog.

0.11.8

Enhancements

  • Add SaaS API User Guide. This documentation serves as a guide for Unstructured SaaS API users to register, receive an API key and URL, and manage their account and billing information.
  • Add inter-chunk overlap capability. Implements overlap between chunks. This applies to all chunks before any text-splitting of oversized chunks, so it is a distinct behavior; overlap at text-splits of oversized chunks is independent of inter-chunk overlap (at distinct chunk boundaries) and can be requested separately. Note that this capability is not yet available from the API but will shortly be made accessible through a new overlap_all kwarg on partition functions.
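A minimal sketch of inter-chunk overlap, under stated assumptions: overlap is measured in characters, the carried-over tail is joined to the next chunk with a single space, and the result is not re-checked against a maximum chunk size. The function `add_inter_chunk_overlap` is hypothetical and is not the `overlap_all` implementation; it only shows that every chunk boundary, rather than only the text-splits of oversized chunks, receives the overlap.

```python
from __future__ import annotations


def add_inter_chunk_overlap(chunks: list[str], overlap: int) -> list[str]:
    """Prefix each chunk after the first with the last `overlap` characters
    of the preceding (original, un-prefixed) chunk."""
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    if overlap == 0:
        # Handle 0 explicitly: text[-0:] would return the whole string.
        return list(chunks)
    result = []
    for i, chunk in enumerate(chunks):
        if i == 0:
            result.append(chunk)
        else:
            tail = chunks[i - 1][-overlap:]
            result.append(f"{tail} {chunk}")  # assumption: join with a single space
    return result
```

For example, `add_inter_chunk_overlap(["alpha beta", "gamma delta"], 4)` gives `["alpha beta", "beta gamma delta"]`. Taking the tail from the original previous chunk, not its prefixed version, keeps overlap from compounding across boundaries.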

0.11.7

Enhancements

  • Add intra-chunk overlap capability. Implements overlap between the pieces produced when text-splitting divides an oversized chunk into two or more chunks that fit in the chunking window. Note that this capability is not yet available from the API but will shortly be made accessible through a new overlap kwarg on partition functions.
  • Update encoders to leverage dataclasses. All encoders now follow a class-based approach and are annotated with the dataclass decorator. As with the connectors, a nested dataclass holds the configs required to set up a client, and a field/property pair caches the client. This ensures that every variable associated with the class exists as a dataclass field.
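The encoder pattern described above (a dataclass with a nested config dataclass and a lazily created client cached in a dataclass field) can be sketched as follows. All class names and the toy client are hypothetical stand-ins, not unstructured's encoder classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class ToyClient:
    """Stand-in for a real embedding API client (hypothetical)."""

    def __init__(self, api_key: str, model_name: str) -> None:
        self.api_key = api_key
        self.model_name = model_name

    def embed(self, texts: List[str]) -> List[List[float]]:
        # Toy embedding: a single dimension holding the text length.
        return [[float(len(t))] for t in texts]


@dataclass
class ToyEncoderConfig:
    """Nested config: everything needed to construct the client."""
    api_key: str
    model_name: str = "toy-model"


@dataclass
class ToyEncoder:
    config: ToyEncoderConfig
    # The cached client is itself a dataclass field, excluded from __init__ and repr.
    _client: Optional[ToyClient] = field(init=False, default=None, repr=False)

    @property
    def client(self) -> ToyClient:
        # Create the client on first access, then reuse the cached instance.
        if self._client is None:
            self._client = ToyClient(self.config.api_key, self.config.model_name)
        return self._client

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return self.client.embed(texts)
```

Declaring `_client` with `field(init=False)` rather than setting an ad hoc attribute means it is reported by `dataclasses.fields()`, which matches the goal of every variable existing as a dataclass field.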

Features

  • Add Qdrant destination connector. Adds support for writing documents and embeddings into a Qdrant collection.
  • Store base64-encoded image data in metadata fields. Rather than saving images to file, stores the base64-encoded image bytes and the image's MIME type in the metadata fields image_base64 and image_mime_type, when the user requests this through a parameter such as pdf_extract_to_payload. This allows the API to reach parity with the library.

Fixes

  • Fix table structure metric script. Updates the call to the table agent so that it provides OCR tokens, as is now required.
  • Fix element extraction not working when using the "auto" strategy for PDFs and images. If element extraction is specified, the "auto" strategy now falls back to the "hi_res" strategy.
  • Fix a bug passing a custom URL to partition_via_api. Users who self-host the API were not able to pass their custom URL to partition_via_api.

0.11.6

Enhancements

  • Update the layout analysis script. The previous script only supported annotating final elements. The updated script also supports annotating inferred and extracted elements.
  • AWS Marketplace API documentation: Added a user guide for deploying the Unstructured API on AWS, including setting up a VPC and CloudFormation.
  • Azure Marketplace API documentation: Improved the user guide for deploying the Azure Marketplace API by adding references to Azure documentation.
  • Integration documentation: Updated URLs for the staging_for bricks.

Features

  • Partition emails with base64-encoded text. Automatically handles and decodes base64-encoded text in emails with content type text/plain and text/html.
  • Add Chroma destination connector. A Chroma database connector has been added to the ingest CLI. Users may now use unstructured-ingest to write partitioned/embedded data to a Chroma vector database.
  • Add Elasticsearch destination connector. Problem: after ingesting data from a source, users might want to move their data into a destination. Elasticsearch is a popular storage solution for functionality such as search or intermediary caches within data pipelines. Feature: added an Elasticsearch destination connector that can ingest documents from any supported source, embed them, and write the embeddings/documents into Elasticsearch.

Fixes

  • Enable --fields argument omission for elasticsearch connector. Solves two bugs: omitting the optional parameter --fields broke the connector due to an integer processing error, and using an elasticsearch config for a destination connector caused a serialization issue when --fields was not provided.
  • Add hi_res_model_name. Adds the kwarg to relevant functions and adds comments noting that model_name is to be deprecated.

0.11.5

... (truncated)

Commits
  • 8e2bfca Unstructured SaaS API subscription guide (#2341)
  • 91b892c fix: Fix api_url param to partition_via_api (#2342)
  • 1b70ea8 fix: update table structure eval to use new table inference interface (#2306)
  • dd1443a feat: add Qdrant ingest destination connector (#2338)
  • 9459af4 Fix: element extraction not working when using "auto" strategy for pdf (#2324)
  • dd14445 Feat: return base64 encoded images for PDF's (#2310)
  • 8ba9fad feat: improve dataclass use for encoders (#2318)
  • bfef183 feat: update encoders to be dataclasses (#2313)
  • eb1b022 feat(chunking): add overlap on chunk-splits (#2305)
  • 5c0043a chore: add hi_res_model_name kwarg (#2289)
  • Additional commits viewable in compare view


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)