langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com

Web Scraping with ChatOllama gives SSL: WRONG_VERSION_NUMBER #14450

Closed AndreaBrg closed 5 months ago

AndreaBrg commented 9 months ago

Issue you'd like to raise.

Hi, I'm trying to implement the web scraping tutorial using ChatOllama instead of ChatOpenAI.

This is what I'm trying to do:

import pprint

from langchain.chains import create_extraction_chain
from langchain.document_loaders import AsyncChromiumLoader
from langchain.document_transformers import BeautifulSoupTransformer
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chat_models import ChatOllama

def extract(content: str, schema: dict, llm):
    return create_extraction_chain(schema=schema, llm=llm).run(content)

def scrape_with_playwright(urls, schema, llm):
    loader = AsyncChromiumLoader(urls)
    docs = loader.load()
    bs_transformer = BeautifulSoupTransformer()
    docs_transformed = bs_transformer.transform_documents(
        docs, tags_to_extract=["span"]
    )
    print("Extracting content with LLM")

    # Grab the first 1000 tokens of the site
    splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=1000, chunk_overlap=0
    )
    splits = splitter.split_documents(docs_transformed)

    # Process the first split
    extracted_content = extract(schema=schema, content=splits[0].page_content, llm=llm)

    return extracted_content

if __name__ == '__main__':
    llm = ChatOllama(base_url="https://localhost:11434", model="llama2")

    schema = {
        "properties": {
            "news_article_title": {"type": "string"},
            "news_article_summary": {"type": "string"},
        },
        "required": ["news_article_title", "news_article_summary"],
    }

    urls = ["https://www.wsj.com"]
    extracted_content = scrape_with_playwright(urls, schema=schema, llm=llm)

    pprint.pprint(extracted_content)

Instead of the results shown in the tutorial, I get this error when the extract function is called: requests.exceptions.SSLError: HTTPSConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1006)')))

Could anyone please help me understand what I'm doing wrong? Thanks!


dosubot[bot] commented 9 months ago

🤖

Hey @AndreaBrg! 👋 I'm Dosu, a friendly bot here to lend a helping hand while we wait for a human maintainer. I'm pretty good at squashing bugs, answering questions, and guiding contributors. So, while we wait for a human friend, let's try and tackle this together! 🤖🚀

The error you're encountering is related to the SSL connection. The error message requests.exceptions.SSLError: HTTPSConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1006)'))) indicates that the client is trying to establish a secure HTTPS connection, but the server is not configured to handle HTTPS requests, only HTTP.

In your code, the base_url for the Ollama model is set to "https://localhost:11434". Since the Ollama server is running locally, it serves plain HTTP on that port, not HTTPS.

To resolve this issue, you should change the base_url from "https://localhost:11434" to "http://localhost:11434". Here's how your updated code should look:

if __name__ == '__main__':
    llm = ChatOllama(base_url="http://localhost:11434", model="llama2")
    ...

If you're still encountering issues, it might be worth checking the server's configuration or reaching out to the maintainers of the Ollama model for further assistance.
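
A quick way to confirm what the server is actually speaking is to hit the root endpoint directly. This is a minimal sanity check, assuming a default local Ollama install:

import requests

# A default local Ollama install serves plain HTTP on port 11434;
# the root endpoint replies with a short status message.
resp = requests.get("http://localhost:11434")
print(resp.status_code, resp.text)  # expect 200 and "Ollama is running"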

Regarding your question about LangChain's support for ChatOllama, yes, the LangChain framework does support the use of ChatOllama as a language model. This is evident from the provided context where a class named ChatOllama is defined, which inherits from BaseChatModel and _OllamaCommon. This class is designed to run large language models locally.

Here is an example of how to use it:

from langchain.chat_models import ChatOllama
from langchain.schema import HumanMessage  # message class used below

ollama = ChatOllama(model="llama2")

And here is an example of how to generate a response:

response = ollama([
    HumanMessage(content="Tell me about the history of AI")
])

Please note that according to the is_lc_serializable method, this model cannot be serialized by LangChain.

I hope this helps! If you have any other questions, feel free to ask.

Best, Dosu


AndreaBrg commented 9 months ago

@dosu-bot thanks, changing the base_url from https://localhost:11434 to http://localhost:11434 worked.

However, it returns another error: ValueError: Ollama call failed with status code 400. Details: invalid options: function_call, functions. I noticed that function_call and functions are created by get_llm_kwargs found in openai_functions. Does this mean that web scraping can only be done using ChatOpenAI?

dosubot[bot] commented 9 months ago

🤖

Based on the information provided and the similar issues found in the LangChain repository, it seems that the ChatOllama model you're using is encountering an issue with the functions and function_call arguments. These arguments are supported by the ChatOpenAI model, but not by all models.

The get_llm_kwargs function is used to pass additional arguments to the language model used in the chain. In the context of the ChatOllama model, if it is used as the language model in an LLMChain, the llm_kwargs attribute would be used to pass additional arguments to the ChatOllama model's generate_prompt method or to bind additional arguments to the ChatOllama model before calling its batch or abatch method.

The error message you're seeing suggests that the ChatOllama model does not support the functions and function_call arguments. This is likely because these arguments are specific to the ChatOpenAI model and its function-enabled API.
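
For reference, the get_llm_kwargs helper in LangChain's OpenAI-functions utilities does little more than wrap a single function definition into those two kwargs, roughly like this (a paraphrase of the implementation at the time, not a verbatim copy):

def get_llm_kwargs(function: dict) -> dict:
    # Wrap one OpenAI-style function definition into the kwargs that
    # create_extraction_chain forwards to the chat model.
    return {"functions": [function], "function_call": {"name": function["name"]}}

Ollama's /api/generate endpoint has no notion of these options, which is why it responds with a 400.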

To resolve this issue, you have a couple of options:

  1. If the functions and function_call arguments are not essential to your application, you can remove or modify these arguments when using the ChatOllama model (see the sketch after this list).

  2. If these arguments are essential, you may need to switch to using the ChatOpenAI model, which supports these arguments.

  3. If you need to continue using the ChatOllama model and these arguments are essential, you may need to implement a similar functionality manually.
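
As a sketch of option 1, you can bypass the function-calling chain entirely and ask the model for JSON in the prompt. This is a minimal sketch, not the tutorial's exact chain; the prompt wording and any output parsing are up to you:

from langchain.chains import LLMChain
from langchain.chat_models import ChatOllama
from langchain.prompts import ChatPromptTemplate

llm = ChatOllama(base_url="http://localhost:11434", model="llama2")

# Describe the desired schema in the prompt instead of passing the
# functions/function_call kwargs that Ollama rejects.
prompt = ChatPromptTemplate.from_template(
    "From the text below, extract each news article's title and a one-line "
    "summary. Respond only with a JSON list of objects with the keys "
    "news_article_title and news_article_summary.\n\nText:\n{content}"
)

chain = LLMChain(llm=llm, prompt=prompt)
result = chain.run(content=splits[0].page_content)  # splits from the original script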

Please try these solutions and let me know if you still encounter any issues.


steinathan commented 8 months ago

@AndreaBrg just use OllamaFunctions?

doc: https://python.langchain.com/docs/integrations/chat/ollama_functions

$ poetry add langchain_experimental

from langchain_experimental.llms.ollama_functions import OllamaFunctions

llm = OllamaFunctions(model="llama2")

schema = {
    "properties": {
        "news_article_title": {"type": "string"},
        "news_article_summary": {"type": "string"},
    },
    "required": ["news_article_title", "news_article_summary"],
}

urls = ["https://www.wsj.com"]
extracted_content = scrape_with_playwright(urls, schema=schema, llm=llm)
pprint.pprint(extracted_content)
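
For what it's worth, OllamaFunctions wraps ChatOllama and emulates the OpenAI functions/function_call interface, which is why the extraction chain can run against a local model unchanged. Direct use looks roughly like this (adapted from the linked docs page; extract_article is a hypothetical example function, and this experimental API may have changed since):

from langchain_experimental.llms.ollama_functions import OllamaFunctions

model = OllamaFunctions(model="llama2")

# Bind an OpenAI-style function definition; OllamaFunctions translates it
# into a prompt the local model can follow.
model = model.bind(
    functions=[
        {
            "name": "extract_article",  # hypothetical example
            "description": "Extract a news article's title and summary",
            "parameters": {
                "type": "object",
                "properties": {
                    "news_article_title": {"type": "string"},
                    "news_article_summary": {"type": "string"},
                },
                "required": ["news_article_title", "news_article_summary"],
            },
        }
    ],
    function_call={"name": "extract_article"},
)

response = model.invoke("Markets rally as inflation cools; full story inside.")
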
dosubot[bot] commented 5 months ago

Hi, @AndreaBrg

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you encountered an SSL error when attempting web scraping using ChatOllama. After changing the base_url from https://localhost:11434 to http://localhost:11434, a new error was encountered: ValueError: Ollama call failed with status code 400. Details: invalid options: function_call, functions. Dosu-bot explained that the functions and function_call arguments are specific to the ChatOpenAI model and are not supported by ChatOllama. Another user, steinathan, recommended using OllamaFunctions from the langchain_experimental package as a potential solution.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your understanding and cooperation.

812781385 commented 2 months ago

Using OllamaFunctions with bind_tools and streaming enabled throws an error: ValueError: Ollama call failed with status code 400. Details: {"error":"invalid options: functions"}

llmmodel = OllamaFunctions(model=model, base_url='http://localhost:11434', format="json")
llmmodel = llmmodel.bind_tools(
    tools=[
        {
            "name": "get_ship",
            "description": "get ship info",
            "parameters": {
                "type": "object",
                "properties": {
                    "mmsi": {
                        "type": "string",
                        "description": "mmsi, ps: 13324313",
                    }
                },
                "required": ["mmsi"],
            },
        },
    ],
    function_call={"name": "get_ship"},
)
response = llmmodel.stream(prompt)