langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

No workflow output if you use more than one webscraper #3455

Closed vontainment closed 4 months ago

vontainment commented 5 months ago

Self Checks

Dify version

0.6.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Underneath the workflow results on the "preview" page there is an HTML section, `<div class="markdown-body"> </div>`. If you make a workflow with more than one webscraper tool, the workflow progress spinner never turns into a green check and nothing is output into `<div class="markdown-body"> </div>`, even though the workflow completed.

✔️ Expected Behavior

To be able to use the webscraper more than once and still get output.

❌ Actual Behavior

Using the web scraper more than once causes the workflow to get stuck before producing the final output.

dosubot[bot] commented 5 months ago

It appears the issue you're encountering with Dify v0.6.2, specifically when using multiple instances of the webscraper tool in a single workflow, might not be directly related to a known bug or limitation in the handling of multiple tool instances. The core functionality for executing tools within workflows, managed by the ToolNode class, is designed to handle multiple tool executions, including those of the webscraper tool, without inherently limiting their number or causing the workflow to hang.

To better diagnose and address the problem, it would be helpful to have more detailed information about your workflow configuration, the specific steps you're taking when setting up your workflow with multiple webscraper tools, and any error messages or logs that are generated during the execution. Additionally, providing a minimal reproducible example of the workflow that's causing the issue could greatly assist in identifying the root cause.

This approach will help in pinpointing whether the issue lies in the specific configuration or usage of the webscraper tool within your workflows, or if it's an unidentified issue within the Dify v0.6.2 codebase that needs further investigation.


#### About Dosu

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
vontainment commented 5 months ago

It is directly related to the web scraper, I believe, because I switched out three web scraping tools for three custom web scraping tools and it worked perfectly fine. The nature of the original workflow was: you give it a research topic or question. It asks the LLM to make three search queries to find relevant information. Code is used to add each query to its own variable, then it runs a Google search on each one. Then it asks the LLM, given all the links, to pick the three most relevant. Then it uses code again to add each link to a variable. Then it web scrapes each link, and asks the LLM to basically answer the original question. Then it sends that to a custom tool that creates a PDF and sends back a link to it, and it ends with the End block outputting the URL for the PDF. But nothing outputs unless I remove two web scraping blocks, or I switch out all three for a custom web scraper.
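
For anyone trying to reproduce this, the control flow of that workflow can be sketched in plain Python. The stub parameters (`llm`, `google_search`, `web_scraper`, `create_pdf`) are hypothetical stand-ins for the Dify nodes, not real Dify APIs; this is a sketch of the pipeline's shape, not Dify DSL:

```python
def run_research_workflow(topic, llm, google_search, web_scraper, create_pdf):
    """Sketch of the described workflow, with injected stand-in tools."""
    # 1. Ask the LLM for three search queries (one per line).
    queries = llm(f"Write three search queries for: {topic}").splitlines()[:3]
    # 2. Run a Google search on each query and collect candidate links.
    links = [link for q in queries for link in google_search(q)]
    # 3. Ask the LLM, given all the links, to pick the three most relevant.
    picked = llm("Pick the three most relevant links:\n" + "\n".join(links)).splitlines()[:3]
    # 4. Scrape each picked link -- the step this issue says hangs when the
    #    built-in Web Scraper node is used more than once.
    pages = [web_scraper(url) for url in picked]
    # 5. Answer the original question from the scraped text.
    answer = llm(f"Answer '{topic}' using:\n" + "\n".join(pages))
    # 6. Create a PDF and return its download URL (the End block's output).
    return create_pdf(answer)
```

The workaround described in this thread amounts to swapping the `web_scraper` argument at step 4 for a custom tool backed by a different scraping API.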

Yeuoly commented 5 months ago

Hello, can you share a DSL that reproduces this issue? It works normally in my case.

vontainment commented 5 months ago

I'm using custom tools, maybe it's the combination

Yeuoly commented 5 months ago

Isn't it the web scraper? Do you mean using the web scraper, then a custom tool?

vontainment commented 5 months ago

The workflow was running a Google search, then using the code block to extract the URLs to variables, then web scraping three different ones. Then it would take that information and write something up as requested, and that would run my own custom tool, which submits to an API that creates a PDF and returns a link to download the file. What I ended up doing to get around the issue of only being able to use the web scraping tool once was making another custom tool that uses WebPilot's API for web scraping, and using that to web scrape three times.

vontainment commented 5 months ago

(Screenshot_20240415_132714_Chrome attached.) I circled the three custom WebPilot crawlers that I used. They were originally the actual built-in web crawler, but since I couldn't use more than one of them, I had to switch to something else.

AndyMik90 commented 5 months ago

> So I circled the three custom web pilot crawlers that I used. But they were originally the actual web crawler built in. But since I couldn't use more than one of them I had to switch to something else.

Is the "Create_PDF" a custom tool you have created? I was looking for some more formatting/output tools.

vontainment commented 5 months ago

It is a Docker container that runs a FastAPI app with built-in PDF creation functions. It has two functions. It can create a PDF from AI input: the AI submits HTML and CSS, which are formatted into a PDF, and a download link is returned. The other function is that it can be given a web address and will convert that page to a PDF. It stores the PDFs in a download folder, and a job regularly deletes them. It could easily be converted into a built-in tool: https://github.com/vontainment/v-gpt-pdf-generator
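
A minimal stdlib-only sketch of those two functions is below. This is not the actual v-gpt-pdf-generator code (which serves a FastAPI app and runs a real HTML-to-PDF renderer); the function names, the placeholder "rendering", and the injected `fetch` callable are all assumptions, kept only to show the store-a-file-and-return-a-link flow:

```python
import hashlib
import pathlib
import tempfile

# Assumed storage location; the real container stores PDFs in a download
# folder that a scheduled job periodically cleans out.
DOWNLOAD_DIR = pathlib.Path(tempfile.gettempdir()) / "pdf_downloads"

def create_pdf_from_html(html: str, css: str = "") -> str:
    """Store a document built from submitted HTML/CSS and return a link.

    A real service would invoke an HTML-to-PDF renderer here; this sketch
    just writes the source so the link-generation flow is visible.
    """
    DOWNLOAD_DIR.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha1((css + html).encode()).hexdigest()[:12] + ".pdf"
    (DOWNLOAD_DIR / name).write_text(f"<style>{css}</style>\n{html}")
    return f"/downloads/{name}"

def create_pdf_from_url(url: str, fetch) -> str:
    """Fetch a page and convert it; `fetch` is an injected HTTP getter
    so the sketch stays offline and testable."""
    return create_pdf_from_html(fetch(url))
```

The two entry points mirror the two operations described above: one takes HTML/CSS from the AI, the other takes a web address.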

vontainment commented 5 months ago

I also have a few other fun ones. One uses a vector database for storage and has functions to create collections, add memories, and retrieve memories, so basically the AI can periodically insert important things that come up so it can remember them. I have another one that allows the AI to completely control a server, and another that allows complete control of an email account: it can move emails, delete emails, reply, read, and so forth. I do need someone with more Python experience to maybe tweak them a bit, because I'm not a Python expert; I just learned it for the AI stuff. Don't get me wrong, they're fully functioning and so far bug-free, but I'm sure they could be optimized. Also, I have found one great use for the PDF one: I made an OpenAPI spec for the knowledge base API, so I can now have PDFs made and put into the knowledge base, or have a website converted to a PDF and put into the knowledge base.

AndyMik90 commented 5 months ago

> I also have a few other fun ones. One uses a vector database for storage and has a function to create collections to add memories and to retrieve memories. […]

Hit me up on Discord @ andremik and let's talk, maybe we could share some cool things

dosubot[bot] commented 4 months ago

Hi, @vontainment,

I'm helping the Dify team manage their backlog and am marking this issue as stale. From what I understand, the issue you reported is related to using multiple instances of the webscraper in a single workflow, which resulted in the workflow progress spinner not turning to a green check and no output being generated. The issue seems to be related to the web scraper, and there was a discussion about using custom web scraping tools and Docker containers for PDF creation. The issue has been resolved by switching to custom web scraping tools, and there's an offer to connect on Discord to share ideas.

Could you please confirm if this issue is still relevant to the latest version of the Dify repository? If it is, please let the Dify team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to Dify! If you have any further questions or concerns, feel free to reach out.
