Open sirio2013 opened 1 year ago
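If you do not specify a separator, CharacterTextSplitter uses its default,
separator: str = '\n\n'.
It only splits where that separator occurs, so text with no blank lines cannot be cut and the chunks come out longer than chunk_size.
You have to specify a separator that actually appears in your text, or else switch to a different text splitter.
A more general splitter such as RecursiveCharacterTextSplitter will work better if you only want to split plain text.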
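To make the behaviour above concrete, here is a minimal pure-Python sketch of the "split on the separator, then merge pieces up to chunk_size" idea. It is a simplified model for illustration only, not LangChain's actual implementation; the function name `split_like_character_splitter` and the sample text are hypothetical, and chunk overlap is ignored.

```python
def split_like_character_splitter(text, separator="\n\n", chunk_size=1000):
    """Simplified model: split on `separator`, then merge pieces up to chunk_size."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            # Still within the limit: keep accumulating pieces.
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single piece longer than chunk_size is kept whole, because
            # there is no separator inside it to cut at.
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample: 50 lines of 99 characters joined by single newlines,
# so the text contains no blank line ("\n\n") anywhere.
text = "\n".join("x" * 99 for _ in range(50))

default_chunks = split_like_character_splitter(text)        # separator "\n\n"
newline_chunks = split_like_character_splitter(text, "\n")  # separator "\n"

print(len(default_chunks), max(len(c) for c in default_chunks))  # 1 4999
print(len(newline_chunks), max(len(c) for c in newline_chunks))  # 5 999
```

With the default separator the whole text comes back as one oversized chunk; with "\n" every chunk fits within chunk_size. If your file behaves like the sample, passing a separator that actually occurs in it should have the same effect in the real splitter.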
Hello,
With this piece of code:
from langchain.document_loaders import TextLoader
loader = TextLoader('cleaned_catalogue.txt')
documents = loader.load()
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
I keep getting chunks longer than the specified chunk_size. Why?