langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

DOC: AsyncChromiumLoader instructions do not work in Windows Jupyter notebook #21246

Closed: mieslep closed this issue 3 weeks ago

mieslep commented 4 months ago


Issue with current documentation:

On this page:

https://python.langchain.com/docs/integrations/document_loaders/async_chromium/

with a modified notebook cell:

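```python
from langchain_community.document_loaders import AsyncChromiumLoader
import nest_asyncio

# Patch asyncio so nested event loops are allowed inside Jupyter's running loop
nest_asyncio.apply()

urls = ["https://www.wsj.com"]
loader = AsyncChromiumLoader(urls)
docs = loader.load()  # scrapes each URL with Playwright-driven Chromium
docs[0].page_content[0:100]
```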

I get this stacktrace:

```
Task exception was never retrieved
future: <Task finished name='Task-19' coro=<Connection.run() done, defined at c:\Users\phil\git\graphvec\.venv\Lib\site-packages\playwright\_impl\_connection.py:265> exception=NotImplementedError()>
Traceback (most recent call last):
  File "C:\Users\phil\AppData\Local\Programs\Python\Python311\Lib\asyncio\tasks.py", line 277, in __step
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "c:\Users\phil\git\graphvec\.venv\Lib\site-packages\playwright\_impl\_connection.py", line 272, in run
    await self._transport.connect()
  File "c:\Users\phil\git\graphvec\.venv\Lib\site-packages\playwright\_impl\_transport.py", line 133, in connect
    raise exc
  File "c:\Users\phil\git\graphvec\.venv\Lib\site-packages\playwright\_impl\_transport.py", line 120, in connect
    self._proc = await asyncio.create_subprocess_exec(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\phil\AppData\Local\Programs\Python\Python311\Lib\asyncio\subprocess.py", line 223, in create_subprocess_exec
    transport, protocol = await loop.subprocess_exec(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\phil\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 1708, in subprocess_exec
    transport = await self._make_subprocess_transport(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\phil\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 503, in _make_subprocess_transport
    raise NotImplementedError
NotImplementedError
```

From some internet sleuthing, it seems this may be a Windows-specific problem?
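For context: the last frame of the traceback is `BaseEventLoop._make_subprocess_transport`, the fallback an asyncio loop hits when it has no subprocess support. On Windows, only `ProactorEventLoop` implements subprocesses; the selector-based loop does not. A small best-effort check like the one below (the helper name `loop_supports_subprocesses` is hypothetical, not part of asyncio or LangChain) shows which case a given standard-library loop falls into:

```python
import asyncio
import sys


def loop_supports_subprocesses(loop: asyncio.AbstractEventLoop) -> bool:
    """Hypothetical helper: best-effort check for subprocess support.

    Only meant for standard-library loops. On Windows, only ProactorEventLoop
    implements subprocess transports; the selector loop falls through to
    BaseEventLoop._make_subprocess_transport, which raises NotImplementedError.
    """
    if sys.platform != "win32":
        # The standard Unix selector loop implements subprocess support.
        return True
    proactor_cls = getattr(asyncio, "ProactorEventLoop", None)
    return proactor_cls is not None and isinstance(loop, proactor_cls)
```

In a notebook cell, `loop_supports_subprocesses(asyncio.get_running_loop())` reports on the kernel's own loop. The Jupyter kernel on Windows is commonly reported to switch to the selector loop policy, which would explain why the same code works as a plain `.py` script (where Python 3.8+ defaults to the Proactor loop on Windows) but fails in a notebook. Treat that as a likely explanation rather than a confirmed diagnosis.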

If I put the code into a .py file and run it directly, it runs correctly, so the environment is installed correctly; the problem is specific to invoking it from Jupyter.
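One possible workaround, sketched under assumptions rather than offered as a supported fix: run the loader's coroutine on a fresh event loop in a worker thread, choosing `ProactorEventLoop` explicitly on Windows so Playwright can spawn its driver subprocess. `run_on_fresh_loop` is a hypothetical helper, and the usage below assumes your installed `langchain_community` version exposes the async `aload()` method on `AsyncChromiumLoader`.

```python
import asyncio
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def run_on_fresh_loop(make_coro: Callable[[], Awaitable[T]]) -> T:
    """Hypothetical helper: run a coroutine on a new loop in a worker thread.

    On Windows this builds a ProactorEventLoop directly (the asyncio loop that
    supports subprocesses), ignoring whatever loop policy the kernel installed.
    """

    def worker() -> T:
        if sys.platform == "win32":
            loop = asyncio.ProactorEventLoop()
        else:
            loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        try:
            return loop.run_until_complete(make_coro())
        finally:
            asyncio.set_event_loop(None)
            loop.close()

    # The worker thread has no running loop, so it never collides with
    # the loop already running in the notebook kernel's main thread.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(worker).result()
```

Usage would then be `docs = run_on_fresh_loop(loader.aload)` in place of `loader.load()`, without calling `nest_asyncio.apply()`. Running in a separate thread sidesteps both the kernel's already-running loop and its loop type; the trade-off is that the cell blocks until scraping finishes. How this interacts with a kernel where `nest_asyncio` has already been applied has not been verified here.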


eyurtsev commented 4 months ago

The Chromium loader likely needs to be rewritten to support true async. It currently relies on nest_asyncio, which may not support Windows (and it appears to be an archived project now).
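To illustrate what "true async" could look like, here is a hypothetical async-first loader. `SketchAsyncLoader` and `Page` are made-up names, and `_fetch` is a placeholder standing in for the real Playwright scraping. Its `aload()` runs on whatever loop the caller already has, and `load()` only starts a loop with `asyncio.run()` when none is running, instead of patching asyncio to allow nesting:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a LangChain Document."""

    url: str
    page_content: str


class SketchAsyncLoader:
    """Hypothetical loader: async-first, with a thin sync wrapper."""

    def __init__(self, urls: list[str]) -> None:
        self.urls = urls

    async def _fetch(self, url: str) -> str:
        # Placeholder for real browser work (e.g. a Playwright page fetch).
        await asyncio.sleep(0)
        return f"<html>{url}</html>"

    async def aload(self) -> list[Page]:
        # Fetch all URLs concurrently on the caller's running loop.
        contents = await asyncio.gather(*(self._fetch(u) for u in self.urls))
        return [Page(u, c) for u, c in zip(self.urls, contents)]

    def load(self) -> list[Page]:
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No loop is running (e.g. a plain .py script), so starting one is safe.
            return asyncio.run(self.aload())
        # A loop is already running (e.g. a Jupyter cell): refuse to nest it.
        raise RuntimeError(
            "An event loop is already running; use `await loader.aload()` instead."
        )
```

In a notebook cell you would write `docs = await loader.aload()`. This removes the dependency on `nest_asyncio`, but it likely would not fix the Windows case on its own: Playwright would still spawn its subprocess on the kernel's loop, so that loop would still need subprocess support.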