Open XariZaru opened 3 months ago
We are hitting this exact bug as well: hyphens in links are converted into newlines. Thank you for opening this issue!
Hi there! We are a group of 3 students from the University of Toronto, and we are very interested in fixing this issue and adding some tests. We will submit a PR for this issue by the end of November. Thank you!
Checked other resources
Example Code
I use the following code to load my documents.
Error Message and Stack Trace (if applicable)
No response
Description
The following is a line from a text document I am loading. This is how it looks in Notepad:
Document Name: https://www.kinecta.org//about-us/executive-staff
When I load the document using DirectoryLoader (I load a list of other docs as well) and print doc.page_content, I get the following:
page_content='Document Name: https://www.kinecta.org//about\n\nus/executive\n\nstaff\n\n'
As you can see, the loader converted each hyphen into a pair of newline characters ("\n\n"). Any idea what causes this?
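For anyone trying to confirm they are seeing the same behaviour, here is a minimal, standard-library-only sketch that compares the raw line with the loaded page_content from this report. The helper name hyphens_became_newlines is hypothetical and not part of LangChain; it only checks whether the loaded text matches the raw text with every hyphen replaced by a blank line, ignoring other whitespace differences.

```python
def _collapse_whitespace(text: str) -> str:
    """Collapse every run of whitespace (including newlines) into one space."""
    return " ".join(text.split())


def hyphens_became_newlines(raw_text: str, loaded_text: str) -> bool:
    """Hypothetical helper: True if loaded_text looks like raw_text with
    each '-' replaced by a blank line ("\n\n")."""
    if "-" not in raw_text:
        return False
    raw = _collapse_whitespace(raw_text)
    loaded = _collapse_whitespace(loaded_text)
    if raw == loaded:
        # The text survived loading unchanged (apart from whitespace).
        return False
    return _collapse_whitespace(raw_text.replace("-", "\n\n")) == loaded


# Values copied from this report.
raw = "Document Name: https://www.kinecta.org//about-us/executive-staff"
loaded = "Document Name: https://www.kinecta.org//about\n\nus/executive\n\nstaff\n\n"

print(hyphens_became_newlines(raw, loaded))
```

If the check returns True for your own files, the hyphens are being lost during loading rather than in the source text. One thing that may be worth trying, as an untested assumption on my part, is passing loader_cls=TextLoader to DirectoryLoader so that plain-text files are read without extra parsing.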
This is the code I use to load my documents.
System Info
Python 3.11
LangChain 0.1.12