Closed HRUSHI1212 closed 2 months ago
🤖
Hello @HRUSHI1212,
I'm here to assist you with your issue. I'm an AI bot designed to help solve bugs, answer questions, and guide you in becoming a contributor while you're waiting for a human maintainer.
Based on the information you've provided, it seems like you're encountering a 'not implemented' error when using the async chromium loader for web scraping with LangChain. This error is likely due to the use of asyncio.run() in the lazy_load() method of the AsyncChromiumLoader class. The asyncio.run() function is used to execute a coroutine and return the result, but it cannot be used when an event loop is already running in the same thread.
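The restriction on asyncio.run() described above can be reproduced with the standard library alone. This is a minimal sketch, not LangChain code: fetch_placeholder and sync_wrapper are hypothetical stand-ins for a scraping coroutine and for a synchronous method that calls asyncio.run() internally.

```python
import asyncio


async def fetch_placeholder() -> str:
    # Hypothetical stand-in for a real scraping coroutine.
    await asyncio.sleep(0)
    return "<html></html>"


def sync_wrapper() -> str:
    # Mirrors a synchronous method that calls asyncio.run() internally.
    coro = fetch_placeholder()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # the coroutine never started; avoid a "never awaited" warning
        raise


async def call_from_running_loop() -> str:
    # Inside a coroutine a loop is already running, much like in a notebook cell.
    try:
        return sync_wrapper()
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"


if __name__ == "__main__":
    print(sync_wrapper())                         # no loop running yet: succeeds
    print(asyncio.run(call_from_running_loop()))  # loop running: reports the RuntimeError
```

Run as a plain script, the first call succeeds and the second reports the RuntimeError, which matches the difference between a .py file and a Jupyter cell reported later in this thread.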
To fix this issue, you can use the run_until_complete() method of the event loop instead of asyncio.run(). Here's how you can modify the lazy_load() method:
def lazy_load(self) -> Iterator[Document]:
    """
    Lazily load text content from the provided URLs.
    This method yields Documents one at a time as they're scraped,
    instead of waiting to scrape all URLs before returning.
    Yields:
        Document: The scraped content encapsulated within a Document object.
    """
    loop = asyncio.get_event_loop()
    for url in self.urls:
        html_content = loop.run_until_complete(self.ascrape_playwright(url))
        metadata = {"source": url}
        yield Document(page_content=html_content, metadata=metadata)
In this modified version, asyncio.get_event_loop() is used to get the current event loop, and then run_until_complete() is used to run the ascrape_playwright() coroutine until it is complete. This should resolve the 'not implemented' error.
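One caveat worth checking before relying on this: run_until_complete() can only drive a loop that is not already running. The sketch below uses only the standard library; fake_scrape is a hypothetical stand-in for ascrape_playwright(), and asyncio.new_event_loop() is used instead of get_event_loop() purely to keep the example free of deprecation warnings on recent Python versions.

```python
import asyncio


async def fake_scrape(url: str) -> str:
    # Hypothetical stand-in for AsyncChromiumLoader.ascrape_playwright().
    await asyncio.sleep(0)
    return f"<html>{url}</html>"


def scrape_sync(url: str) -> str:
    # Plain script: no loop is running, so we can create one and drive it.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(fake_scrape(url))
    finally:
        loop.close()


async def scrape_inside_running_loop(url: str) -> str:
    # Notebook-like situation: the current loop is already running.
    loop = asyncio.get_running_loop()
    coro = fake_scrape(url)
    try:
        return loop.run_until_complete(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never started; close it explicitly
        return f"RuntimeError: {exc}"


if __name__ == "__main__":
    print(scrape_sync("https://example.com"))
    print(asyncio.run(scrape_inside_running_loop("https://example.com")))
```

In a plain script the first call returns the page content. Inside an already running loop, the second call reports a RuntimeError instead, so this change on its own may not help in Jupyter.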
If you're still encountering issues, it might be due to using an outdated version of LangChain or the 'playwright' module. In such a case, upgrading to the latest versions is recommended. You can install the 'playwright' module in the Python environment by executing the command pip install playwright in the terminal. After installing the 'playwright' module, run the command playwright install to download the necessary browser binaries that 'playwright' needs to function correctly.
I hope this helps! If you have any other questions or need further clarification, feel free to ask.
Best, Dosu
this is also not working
@HRUSHI1212 Did you try running pip install playwright and then playwright install? It worked for me with these.
I ran into the same problem, "RuntimeError: asyncio.run() cannot be called from a running event loop", when using Jupyter.
What's interesting is that when I switched to another development environment, PyCharm, and ran the same code, it worked.
I strongly suspect that Jupyter, which runs on a kernel in the browser, has some conflict with using the async methods of another browser (Chromium). So if you hit this problem, try switching to another development environment.
To run asyncio in jupyter, add this bad boy to the top:
https://gist.github.com/IAmStoxe/fe3ec1f4420163cb80a974b95e7962cb
# This import is required only for jupyter notebooks, since they have their own eventloop
import nest_asyncio
nest_asyncio.apply()
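If installing nest_asyncio is not an option, one standard-library alternative (not proposed anywhere in this thread) is to run the coroutine in a worker thread that gets its own event loop. This is only a sketch: run_coro_blocking and fake_scrape are hypothetical names, and it has not been verified against AsyncChromiumLoader or the NotImplementedError some people report here.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_coro_blocking(coro: Coroutine[Any, Any, T]) -> T:
    """Run `coro` to completion from sync code, even if a loop is running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe.
        return asyncio.run(coro)
    # A loop is already running: use a worker thread with its own loop.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def fake_scrape(url: str) -> str:
    # Hypothetical stand-in for a real scraping coroutine.
    await asyncio.sleep(0)
    return f"<html>{url}</html>"


async def notebook_like_caller() -> str:
    # A loop is running here, so the helper takes the worker-thread path.
    return run_coro_blocking(fake_scrape("https://example.com"))


if __name__ == "__main__":
    print(run_coro_blocking(fake_scrape("https://example.com")))
    print(asyncio.run(notebook_like_caller()))
```

The calling thread blocks until the worker finishes, so the outer loop is paused in the meantime. That is usually acceptable in a notebook cell but not inside a server that must stay responsive.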
@jerry1900 thanks for sharing this tip of environment :)
I was using a Jupyter Notebook inside VSCode and still got NotImplementedError.
Then I moved the exact same code to a .py script and ran it.
It worked for me.
That's precisely how I do it. Maybe restart the kernel.
Issue you'd like to raise.
I tried to do web scraping with LangChain and used the async Chromium loader as described in the LangChain documentation, but I was getting an asyncio.run() error. I made some changes that I found on the web, but it's still not working. Below is the code:
import asyncio
import playwright
from langchain.document_loaders import AsyncChromiumLoader
from langchain.document_transformers import BeautifulSoupTransformer
import nest_asyncio
nest_asyncio.apply()
async def main():
    # Create an instance of AsyncChromiumLoader with a list of URLs
if __name__ == "__main__":
    asyncio.run(main())
Suggestion:
No response