pathwaycom / llm-app

Dynamic RAG for enterprise. Ready to run with Docker, ⚡ in sync with Sharepoint, Google Drive, S3, Kafka, PostgreSQL, real-time data APIs, and more.
https://pathway.com/developers/templates/
MIT License

Pathway's connector issue in Jsonlines #53

Open Boburmirzo opened 10 months ago

Boburmirzo commented 10 months ago

Nested data structures in a JSON Lines file cannot be mapped to structured schemas automatically.

For example, mapping list_price and current_price to the schema fails for this record:

{"position": 1, "link": "https://www.amazon.com/Avia-Resistant-Restaurant-Service-Sneakers/dp/B0BJY1FN8F", "asin": "B0BJXSKK9L", "is_lightning_deal": false, "deal_type": "BEST_DEAL", "is_prime_exclusive": false, "starts_at": "2023-08-14T07:00:08.270Z", "ends_at": "2023-08-21T06:45:08.270Z", "type": "multi_item", "title": "Avia Anchor SR Mesh Slip On Black Non Slip Shoes for Women, Comfortable Water Resistant Womens Food Service Sneakers - Black, Blue, or White Med or Wide Restaurant, Slip Resistant Work Shoes Women", "image": "https://m.media-amazon.com/images/I/3195IpEIRpL._SY500_.jpg", "deal_price": 39.98, "list_price": {"value": 59.98, "currency": "USD", "symbol": "$", "raw": "59.98", "name": "List Price"}, "current_price": {"value": 39.98, "currency": "USD", "symbol": "$", "raw": "39.98", "name": "Current Price"}, "merchant_name": "Galaxy Active", "free_shipping": false, "is_prime": true, "is_map": false, "deal_id": "34f3da97", "seller_id": "A3GMJQO0HY62S", "description": "Avia Anchor SR Mesh Slip On Black Non Slip Shoes for Women, Comfortable Water Resistant Womens Food Service Sneakers - Black, Blue, or White Med or Wide Restaurant, Slip Resistant Work Shoes Women", "rating": 4.16, "ratings_total": 1148, "old_price": 59.98, "currency": "USD"}

With this data schema:

import pathway as pw

class Price(pw.Schema):
    value: float
    currency: str
    symbol: str
    raw: str
    name: str

class DealResult(pw.Schema):
    position: int
    link: str
    asin: str
    is_lightning_deal: bool
    deal_type: str
    is_prime_exclusive: bool
    starts_at: str
    ends_at: str
    type: str
    title: str
    image: str
    deal_price: Price  # note: in the sample record above this is a plain number (39.98), not an object
    list_price: Price
    current_price: Price
    merchant_name: str
    free_shipping: bool
    is_prime: bool
    is_map: bool

The error I got:

Read data parsed unsuccessfully. field deal_price with no JsonPointer path specified is absent in
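
The error message refers to JSON Pointer paths (RFC 6901), which address a value inside a nested object, for example /list_price/value. As a rough sketch of what such a path resolves to, the helper below uses only the Python standard library on an abbreviated copy of the record above. The name resolve_pointer is hypothetical and is not part of Pathway; how the connector itself evaluates pointers is not shown here.

```python
import json


def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer such as "/list_price/value"."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = doc
    for token in pointer[1:].split("/"):
        # Unescape per RFC 6901: "~1" means "/", "~0" means "~" (in that order).
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]  # raises KeyError if the field is absent
    return current


# Abbreviated version of the record from this issue.
line = '{"position": 1, "deal_price": 39.98, "list_price": {"value": 59.98, "currency": "USD"}}'
record = json.loads(line)

list_price_value = resolve_pointer(record, "/list_price/value")
deal_price = resolve_pointer(record, "/deal_price")
```

Here deal_price resolves to a plain number, while the value of list_price sits one level deeper under /list_price/value.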

mdmalhou commented 10 months ago

@Boburmirzo I think nested JSON/dict values are not yet supported by the connectors. It might be a good idea to repost this issue in the Pathway repo. Alternatively, JSON flattening could be included in the preprocessing step of a connector I am working on that handles most document types.
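
A minimal sketch of the flattening idea mentioned above, assuming it is done as a plain Python preprocessing step before the file reaches a connector. The function name flatten and the "_" separator are illustrative choices, not part of Pathway or of the connector under development.

```python
import json


def flatten(obj, parent_key="", sep="_"):
    """Recursively flatten nested dicts: {"a": {"b": 1}} -> {"a_b": 1}."""
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value  # scalars and lists are kept as-is
    return flat


# Abbreviated version of the record from this issue.
line = (
    '{"deal_price": 39.98, "list_price": {"value": 59.98, "currency": "USD", '
    '"symbol": "$", "raw": "59.98", "name": "List Price"}, "currency": "USD"}'
)
flat_record = flatten(json.loads(line))
flat_line = json.dumps(flat_record)  # one flat JSON object, ready to write back as a JSON Lines row
```

With records flattened this way, a single flat schema with fields such as list_price_value: float and list_price_currency: str could replace the nested Price schema. Key collisions are possible if a record already contains a top-level field named like a flattened one, so the separator may need adjusting.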

Boburmirzo commented 10 months ago

@mdmalhou Thanks! Yes, if it is relevant to Pathway, I can duplicate the issue there too.