run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: SentenceSplitter._split crashes on a 10 MB sequence of lowercase letters #11341

Closed preemoDez closed 5 months ago

preemoDez commented 8 months ago

Bug Description

Attached file: output-onlinefiletools.txt. I tried to split the above file using LlamaIndex (Python). In #10554 I mentioned that the whole algorithm is $O(n^2)$; in this particular case, however, it looks like the tokenizer causes the stack overflow, at this call: token_size = self._token_size(text).

A small issue is that self._token_size(text) is called twice:

token_size = self._token_size(text)
if self._token_size(text) <= chunk_size:

The if statement can reuse the token_size calculated on the line above it instead of tokenizing the text a second time.
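A minimal sketch of the reuse, outside the library: `fits_in_chunk` and `CountingTokenizer` are hypothetical names used only for illustration, and the whitespace tokenizer stands in for the real one. The point is simply that the token count is computed once and then used in the comparison.

```python
from typing import Callable, Tuple


class CountingTokenizer:
    """Hypothetical stand-in tokenizer that records how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, text: str) -> int:
        self.calls += 1
        # Whitespace splitting is only a placeholder for a real tokenizer.
        return len(text.split())


def fits_in_chunk(
    text: str,
    chunk_size: int,
    token_size_fn: Callable[[str], int],
) -> Tuple[bool, int]:
    """Return (fits, token_size), tokenizing the text exactly once."""
    token_size = token_size_fn(text)
    # Reuse token_size here instead of calling token_size_fn(text) again.
    return token_size <= chunk_size, token_size
```

On a large input, the second call doubles the tokenization cost of that step, so the saving grows with the size of the text.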

Version

0.9.39

Steps to Reproduce

reader = SimpleDirectoryReader(input_files=[filepath])
documents = reader.load_data()
splitter = SentenceSplitter.from_defaults(
    chunk_overlap=32,
    chunk_size=256,
)
llama_nodes = splitter.get_nodes_from_documents(documents=documents)
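For anyone without the attachment, a comparable input can be generated locally. This is a sketch under assumptions: the helper name `write_lowercase_file`, the output filename, and the use of random letters are all illustrative, and the real attachment may differ in content. It writes one unbroken run of lowercase ASCII letters with no whitespace.

```python
import random
import string
from pathlib import Path


def write_lowercase_file(path: str, size_bytes: int, seed: int = 0) -> Path:
    """Write size_bytes random lowercase ASCII letters (no whitespace) to path."""
    rng = random.Random(seed)  # seeded so the file is reproducible
    text = "".join(rng.choices(string.ascii_lowercase, k=size_bytes))
    out = Path(path)
    out.write_text(text, encoding="ascii")
    return out


if __name__ == "__main__":
    # Roughly 10 MB, matching the size mentioned in the issue title.
    write_lowercase_file("lowercase_10mb.txt", 10 * 1024 * 1024)
```

The resulting path can then be passed as `filepath` in the steps above.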

Relevant Logs/Tracebacks

thread '<unnamed>' panicked at src/lib.rs:250:33:
called `Result::unwrap()` on an `Err` value: RuntimeError(StackOverflow)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2024-02-23 16:22:16.238 | ERROR    | uvicorn.protocols.http.httptools_impl:run_asgi:431 - Exception in ASGI application

Traceback (most recent call last):

  File "<string>", line 1, in <module>
  File "/nix/store/fcdizvgrhss6yw5p0hm37423i2h4g53f-python3-3.10.12/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               │     │   └ 4
               │     └ 7
               └ <function _main at 0x105545cf0>
  File "/nix/store/fcdizvgrhss6yw5p0hm37423i2h4g53f-python3-3.10.12/lib/python3.10/multiprocessing/spawn.py", line 129, in _main
    return self._bootstrap(parent_sentinel)
           │    │          └ 4
           │    └ <function BaseProcess._bootstrap at 0x10545bb50>
           └ <SpawnProcess name='SpawnProcess-43' parent=2850 started>
  File "/nix/store/fcdizvgrhss6yw5p0hm37423i2h4g53f-python3-3.10.12/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
    │    └ <function BaseProcess.run at 0x10545b1c0>
    └ <SpawnProcess name='SpawnProcess-43' parent=2850 started>
  File "/nix/store/fcdizvgrhss6yw5p0hm37423i2h4g53f-python3-3.10.12/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {'config': <uvicorn.config.Config object at 0x1055276d0>, 'target': <bound method Server.run of <uvicorn.server.Server object...
    │    │        │    │        └ <SpawnProcess name='SpawnProcess-43' parent=2850 started>
    │    │        │    └ ()
    │    │        └ <SpawnProcess name='SpawnProcess-43' parent=2850 started>
    │    └ <function subprocess_started at 0x1072e7eb0>
    └ <SpawnProcess name='SpawnProcess-43' parent=2850 started>
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/uvicorn/_subprocess.py", line 76, in subprocess_started
    target(sockets=sockets)
    │              └ [<socket.socket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 8028)>]
    └ <bound method Server.run of <uvicorn.server.Server object at 0x105527730>>
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/uvicorn/server.py", line 61, in run
    return asyncio.run(self.serve(sockets=sockets))
           │       │   │    │             └ [<socket.socket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 8028)>]
           │       │   │    └ <function Server.serve at 0x1072e7400>
           │       │   └ <uvicorn.server.Server object at 0x105527730>
           │       └ <function run at 0x1070820e0>
           └ <module 'asyncio' from '/nix/store/fcdizvgrhss6yw5p0hm37423i2h4g53f-python3-3.10.12/lib/python3.10/asyncio/__init__.py'>
  File "/nix/store/fcdizvgrhss6yw5p0hm37423i2h4g53f-python3-3.10.12/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
           │    │                  └ <coroutine object Server.serve at 0x107822030>
           │    └ <method 'run_until_complete' of 'uvloop.loop.Loop' objects>
           └ <uvloop.Loop running=True closed=False debug=False>
> File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
                   └ <uvicorn.middleware.proxy_headers.ProxyHeadersMiddleware object at 0x10788d360>
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
                 │    │   │      │        └ <bound method RequestResponseCycle.send of <uvicorn.protocols.http.httptools_impl.RequestResponseCycle object at 0x12955eb00>>
                 │    │   │      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.httptools_impl.RequestResponseCycle object at 0x12955eb...
                 │    │   └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8028), 'cl...
                 │    └ <fastapi.applications.FastAPI object at 0x1079aa9b0>
                 └ <uvicorn.middleware.proxy_headers.ProxyHeadersMiddleware object at 0x10788d360>
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
                           │      │        └ <bound method RequestResponseCycle.send of <uvicorn.protocols.http.httptools_impl.RequestResponseCycle object at 0x12955eb00>>
                           │      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.httptools_impl.RequestResponseCycle object at 0x12955eb...
                           └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8028), 'cl...
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
          │    │                │      │        └ <bound method RequestResponseCycle.send of <uvicorn.protocols.http.httptools_impl.RequestResponseCycle object at 0x12955eb00>>
          │    │                │      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.httptools_impl.RequestResponseCycle object at 0x12955eb...
          │    │                └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8028), 'cl...
          │    └ <starlette.middleware.errors.ServerErrorMiddleware object at 0x12955e5c0>
          └ <fastapi.applications.FastAPI object at 0x1079aa9b0>
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
          │    │   │      │        └ <function ServerErrorMiddleware.__call__.<locals>._send at 0x1295479a0>
          │    │   │      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.httptools_impl.RequestResponseCycle object at 0x12955eb...
          │    │   └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8028), 'cl...
          │    └ <starlette.middleware.exceptions.ExceptionMiddleware object at 0x12955e590>
          └ <starlette.middleware.errors.ServerErrorMiddleware object at 0x12955e5c0>
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
          │                            │    │    │     │      │        └ <function ServerErrorMiddleware.__call__.<locals>._send at 0x1295479a0>
          │                            │    │    │     │      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.httptools_impl.RequestResponseCycle object at 0x12955eb...
          │                            │    │    │     └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8028), 'cl...
          │                            │    │    └ <starlette.requests.Request object at 0x12955ec50>
          │                            │    └ <fastapi.routing.APIRouter object at 0x12955c460>
          │                            └ <starlette.middleware.exceptions.ExceptionMiddleware object at 0x12955e590>
          └ <function wrap_app_handling_exceptions at 0x108921900>
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
          │   │      │        └ <function wrap_app_handling_exceptions.<locals>.wrapped_app.<locals>.sender at 0x129547ac0>
          │   │      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.httptools_impl.RequestResponseCycle object at 0x12955eb...
          │   └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8028), 'cl...
          └ <fastapi.routing.APIRouter object at 0x12955c460>
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/starlette/routing.py", line 762, in __call__
    await self.middleware_stack(scope, receive, send)
          │    │                │      │        └ <function wrap_app_handling_exceptions.<locals>.wrapped_app.<locals>.sender at 0x129547ac0>
          │    │                │      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.httptools_impl.RequestResponseCycle object at 0x12955eb...
          │    │                └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8028), 'cl...
          │    └ <bound method Router.app of <fastapi.routing.APIRouter object at 0x12955c460>>
          └ <fastapi.routing.APIRouter object at 0x12955c460>
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/starlette/routing.py", line 782, in app
    await route.handle(scope, receive, send)
          │     │      │      │        └ <function wrap_app_handling_exceptions.<locals>.wrapped_app.<locals>.sender at 0x129547ac0>
          │     │      │      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.httptools_impl.RequestResponseCycle object at 0x12955eb...
          │     │      └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8028), 'cl...
          │     └ <function Route.handle at 0x108922d40>
          └ APIRoute(path='/generate_nodes', name='handle', methods=['POST'])
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
          │    │   │      │        └ <function wrap_app_handling_exceptions.<locals>.wrapped_app.<locals>.sender at 0x129547ac0>
          │    │   │      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.httptools_impl.RequestResponseCycle object at 0x12955eb...
          │    │   └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8028), 'cl...
          │    └ <function request_response.<locals>.app at 0x1295475b0>
          └ APIRoute(path='/generate_nodes', name='handle', methods=['POST'])
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
          │                            │    │        │      │        └ <function wrap_app_handling_exceptions.<locals>.wrapped_app.<locals>.sender at 0x129547ac0>
          │                            │    │        │      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.httptools_impl.RequestResponseCycle object at 0x12955eb...
          │                            │    │        └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8028), 'cl...
          │                            │    └ <starlette.requests.Request object at 0x12955eec0>
          │                            └ <function request_response.<locals>.app.<locals>.app at 0x129547910>
          └ <function wrap_app_handling_exceptions at 0x108921900>
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
          │   │      │        └ <function wrap_app_handling_exceptions.<locals>.wrapped_app.<locals>.sender at 0x129547b50>
          │   │      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.httptools_impl.RequestResponseCycle object at 0x12955eb...
          │   └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 8028), 'cl...
          └ <function request_response.<locals>.app.<locals>.app at 0x129547910>
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
                     │    └ <starlette.requests.Request object at 0x12955eec0>
                     └ <function get_request_handler.<locals>.app at 0x129547490>
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
                         └ <function run_endpoint_function at 0x108923be0>
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
                 │         │      └ {'generator': <document_processing_service.processor.node_generator.NodeGenerator object at 0x108b24280>, 'request': NodeExtr...
                 │         └ <function handle at 0x129546e60>
                 └ <fastapi.dependencies.models.Dependant object at 0x12955c5e0>

  File "/Users/dez/git/monorepo/document_processing_service/document_processing_service/routers/nodes.py", line 63, in handle
    response = await generator.get_chunked_text_from_artifact_async(request)
                     │         │                                    └ NodeExtractorRequest(artifact_id='be35e478-26a3-452b-b9c0-df62c5c89d00_artifact', workspace_id='5a6c5871-0138-4442-8cd9-ba13a...
                     │         └ <function NodeGenerator.get_chunked_text_from_artifact_async at 0x1294cfac0>
                     └ <document_processing_service.processor.node_generator.NodeGenerator object at 0x108b24280>

  File "/Users/dez/git/monorepo/document_processing_service/document_processing_service/processor/node_generator.py", line 343, in get_chunked_text_from_artifact_async
    return await asyncio.to_thread(
                 │       └ <function to_thread at 0x107099cf0>
                 └ <module 'asyncio' from '/nix/store/fcdizvgrhss6yw5p0hm37423i2h4g53f-python3-3.10.12/lib/python3.10/asyncio/__init__.py'>

  File "/nix/store/fcdizvgrhss6yw5p0hm37423i2h4g53f-python3-3.10.12/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
                 │    │                     └ functools.partial(<built-in method run of _contextvars.Context object at 0x1295bcd80>, <bound method NodeGenerator.get_chunke...
                 │    └ <method 'run_in_executor' of 'uvloop.loop.Loop' objects>
                 └ <uvloop.Loop running=True closed=False debug=False>
  File "/nix/store/fcdizvgrhss6yw5p0hm37423i2h4g53f-python3-3.10.12/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             │        │            └ None
             │        └ None
             └ None

  File "/Users/dez/git/monorepo/document_processing_service/document_processing_service/processor/node_generator.py", line 315, in get_chunked_text_from_artifact
    llama_nodes = self.generate_nodes_from_documents(
                  │    └ <staticmethod(<function NodeGenerator.generate_nodes_from_documents at 0x1294cf910>)>
                  └ <document_processing_service.processor.node_generator.NodeGenerator object at 0x108b24280>

  File "/Users/dez/git/monorepo/document_processing_service/document_processing_service/processor/node_generator.py", line 225, in generate_nodes_from_documents
    llama_nodes = splitter.get_nodes_from_documents(documents=documents)
                  │        │                                  └ [Document(id_='c8795746-380a-453f-8b45-8dbf2ac756ca', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_...
                  │        └ <function NodeParser.get_nodes_from_documents at 0x128b2a440>
                  └ SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackMana...

  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/llama_index/node_parser/interface.py", line 72, in get_nodes_from_documents
    nodes = self._parse_nodes(documents, show_progress=show_progress, **kwargs)
            │    │            │                        │                └ {}
            │    │            │                        └ False
            │    │            └ [Document(id_='c8795746-380a-453f-8b45-8dbf2ac756ca', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_...
            │    └ <function MetadataAwareTextSplitter._parse_nodes at 0x128b2a950>
            └ SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackMana...
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/llama_index/node_parser/interface.py", line 199, in _parse_nodes
    splits = self.split_text_metadata_aware(
             │    └ <function SentenceSplitter.split_text_metadata_aware at 0x128b99090>
             └ SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackMana...
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/llama_index/node_parser/text/sentence.py", line 168, in split_text_metadata_aware
    return self._split_text(text, chunk_size=effective_chunk_size)
           │    │           │                └ 256
           │    │           └ 'gpqhecdkvajcfdnarzvypcpmfpndxuzxlltpwetokgeexmzijshpzddtgjsozhlyllblmfseesbcrbhkbrojrxerlqkamqwxjnfcnvlqjajoxcczzazfkoyrcrkq...
           │    └ <function SentenceSplitter._split_text at 0x128b991b0>
           └ SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackMana...
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/llama_index/node_parser/text/sentence.py", line 186, in _split_text
    splits = self._split(text, chunk_size)
             │    │      │     └ 256
             │    │      └ 'gpqhecdkvajcfdnarzvypcpmfpndxuzxlltpwetokgeexmzijshpzddtgjsozhlyllblmfseesbcrbhkbrojrxerlqkamqwxjnfcnvlqjajoxcczzazfkoyrcrkq...
             │    └ <function SentenceSplitter._split at 0x128b99240>
             └ SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackMana...
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/llama_index/node_parser/text/sentence.py", line 214, in _split
    token_size = self._token_size(text)
                 │    │           └ 'gpqhecdkvajcfdnarzvypcpmfpndxuzxlltpwetokgeexmzijshpzddtgjsozhlyllblmfseesbcrbhkbrojrxerlqkamqwxjnfcnvlqjajoxcczzazfkoyrcrkq...
                 │    └ <function SentenceSplitter._token_size at 0x128b993f0>
                 └ SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackMana...
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/llama_index/node_parser/text/sentence.py", line 317, in _token_size
    return len(self._tokenizer(text))
               │    │          └ 'gpqhecdkvajcfdnarzvypcpmfpndxuzxlltpwetokgeexmzijshpzddtgjsozhlyllblmfseesbcrbhkbrojrxerlqkamqwxjnfcnvlqjajoxcczzazfkoyrcrkq...
               │    └ <member '_tokenizer' of 'SentenceSplitter' objects>
               └ SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackMana...
  File "/Users/dez/git/monorepo/document_processing_service/.venv/lib/python3.10/site-packages/tiktoken/core.py", line 120, in encode
    return self._core_bpe.encode(text, allowed_special)
           │    │         │      │     └ {'<|fim_suffix|>', '<|endofprompt|>', '<|fim_prefix|>', '<|endoftext|>', '<|fim_middle|>'}
           │    │         │      └ 'gpqhecdkvajcfdnarzvypcpmfpndxuzxlltpwetokgeexmzijshpzddtgjsozhlyllblmfseesbcrbhkbrojrxerlqkamqwxjnfcnvlqjajoxcczzazfkoyrcrkq...
           │    │         └ <method 'encode' of 'builtins.CoreBPE' objects>
           │    └ <builtins.CoreBPE object at 0x129552170>
           └ <Encoding 'cl100k_base'>

pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: RuntimeError(StackOverflow)

dosubot[bot] commented 8 months ago

🤖

Hey @preemoDez, great to see you diving deep again! 🚀

Based on the traceback you provided, the RuntimeError(StackOverflow) is raised inside the tokenizer (tiktoken's encode), which the _split method of the SentenceSplitter class reaches through _token_size on the full text. The _split method is also recursive, so if the text being split is very large or hard to split, the depth of the recursive calls could become a problem as well.
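Since the traceback ends inside the tokenizer's encode call on the whole document, one possible workaround is to never hand the tokenizer the full string. The sketch below assumes the failure depends on input length, which this thread does not confirm. `windowed_token_count` and `max_chars` are hypothetical names, and the tokenizer is passed in as a plain callable. The count is approximate, because a token cannot span a window boundary.

```python
from typing import Callable, List


def windowed_token_count(
    text: str,
    tokenizer: Callable[[str], List],
    max_chars: int = 100_000,
) -> int:
    """Approximate token count: tokenize fixed-size character windows and sum."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    total = 0
    for start in range(0, len(text), max_chars):
        # Each call sees at most max_chars characters.
        total += len(tokenizer(text[start : start + max_chars]))
    return total
```

A splitter only needs to know whether the text exceeds chunk_size, so a slightly inflated count on a multi-megabyte input would not change that decision.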

Here is the relevant code from the SentenceSplitter class:

def _split(self, text: str, chunk_size: int) -> List[str]:
    """Break text into splits that are smaller than chunk size."""
    if len(self._tokenizer(text)) <= chunk_size:
        return [text]

    for split_fn in self._split_fns:
        splits = split_fn(text)
        if len(splits) > 1:
            break

    new_splits = []
    for split in splits:
        split_len = len(self._tokenizer(split))
        if split_len <= chunk_size:
            new_splits.append(split)
        else:
            # recursively split
            new_splits.extend(self._split(split, chunk_size=chunk_size))
    return new_splits

To resolve this issue, you could refactor the _split method to use an iterative approach instead of a recursive one. This would prevent the stack overflow error by avoiding the deep call stack that comes with recursion. However, this might require a significant change in the logic of the method, and you would need to ensure that the new implementation still correctly splits the text as required.
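As a rough illustration of the iterative approach, here is a sketch that replaces the recursion with an explicit stack. It mirrors the simplified code quoted above rather than the library's actual implementation. `split_iteratively`, `token_size_fn`, and `split_fns` are hypothetical names, and it assumes every split function returns parts strictly shorter than its input.

```python
from typing import Callable, List


def split_iteratively(
    text: str,
    chunk_size: int,
    token_size_fn: Callable[[str], int],
    split_fns: List[Callable[[str], List[str]]],
) -> List[str]:
    """Split text into pieces of at most chunk_size tokens without recursion."""
    result: List[str] = []
    stack: List[str] = [text]  # pieces still to examine, next piece on top
    while stack:
        piece = stack.pop()
        if token_size_fn(piece) <= chunk_size:
            result.append(piece)
            continue
        # Use the first split function that actually breaks the piece up.
        parts: List[str] = [piece]
        for split_fn in split_fns:
            parts = split_fn(piece)
            if len(parts) > 1:
                break
        if len(parts) <= 1:
            # Nothing splits this piece further; keep it rather than loop forever.
            result.append(piece)
            continue
        # Push in reverse so the leftmost part is processed first.
        stack.extend(reversed(parts))
    return result
```

The output order matches a depth-first recursive version, and the Python call stack stays flat regardless of input size. Note that this only addresses recursion in the splitter itself; it does not change what happens inside the tokenizer.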

As for the SentenceSplitter class, it is used to split a given text into smaller chunks or sentences. It takes in parameters such as chunk_size and chunk_overlap to determine the size of each chunk and the overlap between them. The split_text method is used to split a single text into chunks, while the get_nodes_from_documents method is used to get nodes from documents, where each node has a start and end character index.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.
