run-llama / llama-hub

A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain
https://llamahub.ai/
MIT License

[Bug]: Error fetching content from (https://www.healthline.com/health/treat-pimple-on-neck#prevention): string indices must be integers #959

Closed andysingal closed 4 months ago

andysingal commented 4 months ago

Bug Description

While trying to use TrafilaturaWebReader().load_data, I am not able to extract content.

Version

0.10.13

Steps to Reproduce

query_engine_tools = []
for article in articles:
    try:
        article_content = TrafilaturaWebReader().load_data(article['url'])
    except Exception as e:
        print(f"Error fetching content from {article}: {e}")
        continue

    embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")

    splitter = SemanticSplitterNodeParser(
        buffer_size=1, breakpoint_percentile_threshold=95, embed_model=embed_model
    )
    nodes = splitter.get_nodes_from_documents(
        documents=article,
        show_progress=True,
    )
    query_engine = VectorStoreIndex(
        nodes,
        embed_model=embed_model,
        use_async=True,
        show_progress=True,
    ).as_query_engine(llm=llm)
    query_engine_tool = QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            description=(
                f"Provides information about the U.S. government financial report "
            ),
        ),
    )
    query_engine_tools.append(query_engine_tool)

sub_question_query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    use_async=True,
)

This gives the following error:

Error fetching content from https://www.healthline.com/health/treat-pimple-on-neck#prevention: string indices must be integers
Error fetching content from https://www.healthline.com/health/proven-skincare#when-to-see-a-doctor: string indices must be integers
Error fetching content from https://www.healthline.com/health/custom-skin-care#essentials: string indices must be integers
Error fetching content from https://www.healthline.com/health/beauty-skin-care: string indices must be integers
Error fetching content from https://www.healthline.com/health-news/microneedling-latest-craze-in-skin-care#The-Vampire-Facelift: string indices must be integers
Error fetching content from https://www.healthline.com/health-news/margot-robbie-and-barbie-cast-drank-milk-thistle-tea-for-glowing-skin-does-it-work#Additional-health-benefits-of-milk-thistle-: string indices must be integers
Error fetching content from https://www.healthline.com/health-news/everything-shower-tiktok-trend#How-Everything-Showers-can-benefit-your-mind-and-body: string indices must be integers
Error fetching content from https://www.healthline.com/health/skin/how-to-do-a-skin-care-self-exam: string indices must be integers
Error fetching content from https://www.healthline.com/health/pregnancy/pregnancy-safe-skin-care: string indices must be integers
Error fetching content from https://www.healthline.com/health/eczema/eczema-skin-care-routine: string indices must be integers
Error fetching content from https://www.healthline.com/health/beauty-skin-care/skin-care-tips: string indices must be integers
Error fetching content from https://www.healthline.com/health/beauty-skin-care/skin-care-routine-for-oily-skin: string indices must be integers
Error fetching content from https://www.healthline.com/health/beauty-skin-care/skin-care-routine-for-dry-skin: string indices must be integers
Error fetching content from https://www.healthline.com/health/beauty-skin-care/skin-care-routine-for-combination-skin: string indices must be integers
Error fetching content from https://www.healthline.com/health/beauty-skin-care/products-not-to-use-on-face: string indices must be integers
Error fetching content from https://www.healthline.com/health/beauty-skin-care/order-of-skin-care: string indices must be integers
Error fetching content from https://www.healthline.com/health/beauty-skin-care/natural-skin-care-routine: string indices must be integers
Error fetching content from https://www.healthline.com/health/sodium-hydroxide-skin-care: string indices must be integers
Error fetching content from https://www.healthline.com/health/rosacea/skin-care-for-rosacea: string indices must be integers
Error fetching content from https://www.healthline.com/health/beauty-skincare/propylene-glycol-in-skin-care: string indices must be integers
Error fetching content from https://www.healthline.com/health/beauty-skin-care/editors-pick-products-for-oily-skin: string indices must be integers
Error fetching content from https://www.healthline.com/vitamins-supplements/vitamins-for-skin: string indices must be integers
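
A note on what the output suggests: each message prints `{article}` as a bare URL, so every entry in `articles` appears to be a plain string rather than a dict. In that case `article['url']` raises `TypeError` before any request is sent, and the `except` block reports it as a fetch error. The sketch below reproduces this without network access. The helper names `url_of` and `reproduce` are hypothetical, and the sample list is an assumption about what `articles` contains.

```python
# Assumption: `articles` is a list of URL strings, as the printed messages suggest.
articles = [
    "https://www.healthline.com/health/beauty-skin-care",
]


def url_of(article):
    """Return the URL whether `article` is a plain string or a dict with a 'url' key."""
    if isinstance(article, str):
        return article
    return article["url"]


def reproduce(article):
    """Mimic the try/except from the report, minus the actual web request."""
    try:
        article["url"]  # indexing a str with a str key raises TypeError
    except Exception as e:
        return f"Error fetching content from {article}: {e}"
    return None


print(reproduce(articles[0]))
print(url_of(articles[0]))
```

If this is the cause, passing `[url_of(article)]` to `TrafilaturaWebReader().load_data` may be enough to get past the error, assuming the installed version expects a list of URL strings. The later `documents=article` argument would then probably need to be `article_content`, since that is the variable holding the loaded documents.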

Relevant Logs/Tracebacks

Same output as listed under Steps to Reproduce above.
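
The log lines in this report carry only the exception message, which hides where the error was raised. A minimal sketch of capturing the full traceback with the standard-library `traceback` module follows. `fetch_with_traceback` and `broken_fetch` are hypothetical names, and `broken_fetch` is only a stand-in for the loader call in the loop above.

```python
import traceback


def fetch_with_traceback(fetch, article):
    """Call fetch(article); on failure return (None, full traceback text)."""
    try:
        return fetch(article), None
    except Exception:
        return None, traceback.format_exc()


def broken_fetch(article):
    # Stand-in for the loop body in the report: it indexes the entry
    # with a string key before any web request would be made.
    return article["url"]


result, tb = fetch_with_traceback(
    broken_fetch, "https://www.healthline.com/health/beauty-skin-care"
)
print(tb)
```

The printed traceback names the exception type and the failing frame, which makes it easier to tell a programming error in the loop apart from a real network or parsing failure.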