Open MonoMarkor opened 3 months ago
I just found the exact same issue, which was closed, but I can't get the solution to work. It is here: https://github.com/langchain-ai/langchain/issues/11853
Currently I'm using this workaround. It works, but it's not ideal:
if keep_markdown_format:
    content = content.replace("</p>", "\n</p>").replace("<br />", "\n")
    text = markdownify(content, heading_style="ATX") + "".join(attachment_texts)
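For anyone who wants to try the same idea outside the loader, here is a minimal sketch of just the preprocessing step, using only the standard library. The helper name `add_paragraph_separators` and its `sep` parameter are hypothetical (they are not part of langchain or markdownify), and how markdownify treats the inserted separator may differ between versions, so treat this as a starting point rather than a confirmed fix.

```python
import re

# Matches <br>, <br/>, <br /> in any letter case.
_BR_TAG = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)
# Matches a closing </p> tag in any letter case.
_P_CLOSE = re.compile(r"</p\s*>", flags=re.IGNORECASE)


def add_paragraph_separators(html: str, sep: str = "\n") -> str:
    """Hypothetical helper: put `sep` before every </p> and in place of every <br>.

    This mirrors the two chained .replace() calls in the workaround above,
    but also tolerates tag variants such as <br> and <BR/>.
    """
    # A function is used as the replacement so `sep` is inserted literally.
    html = _BR_TAG.sub(lambda _match: sep, html)
    return _P_CLOSE.sub(lambda _match: sep + "</p>", html)


# Sample cell content taken from the description in this issue.
cell_html = (
    "<table><tr><td>"
    "<p>This is a piece of text</p>"
    "<p>This is the second line inside the cell of a table</p>"
    "</td></tr></table>"
)

prepared = add_paragraph_separators(cell_html)
print(prepared)
```

The `prepared` string would then be passed to `markdownify(prepared, heading_style="ATX")` as in the snippet above. Passing `sep=" "` instead keeps each cell on one line, which may be preferable because a raw newline inside a Markdown table row can break the table.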
Example Code
Hello, when I load a Confluence page using the Confluence loader with 'keep_markdown_format = True', I noticed odd formatting inside a table cell whenever the cell contains text on multiple lines.
That is, when there are multiple p tags inside a table cell, nothing separates the content of those tags: no space and no newline. When I tried keep_markdown_format = False, the text inside the table was separated by a space, which is good.
I want to keep 'keep_markdown_format = True'. Is there a way to solve this?
Error Message and Stack Trace (if applicable)
No response
Description
This is a piece of text
This is the second line inside the cell of a table
When using keep_markdown_format = True, I get: This is a piece of textThis is the second line inside the cell of a table
When using keep_markdown_format = False, I get: This is a piece of text This is the second line inside the cell of a table
As you can see, when I set it to True, nothing separates the multiple lines inside a table cell.
System Info
langchain==0.2.12 langchain-community==0.2.11 Windows 11 Python 3.11.0