Closed: Ryokki closed this issue 3 weeks ago
Markdown is basically text, so removing the extra newlines should not be a problem.
Thank you so much for providing such a great tool! @darkcheftar You are right, it's not a tricky problem. I would greatly appreciate it if this feature could be built in.
import re
import sys


def fix_markdown_linebreaks(input_file, output_file):
    with open(input_file, 'r', encoding='utf-8') as f:
        content = f.read()

    # Replace a single line break with a space, unless the next character
    # is a newline, other whitespace, '#', or '-'. Blank lines, indented
    # lines, and lines starting with '#' or '-' keep their breaks.
    fixed_content = re.sub(r'([^\n])\n(?![\n\s#-])', r'\1 ', content)

    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(fixed_content)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python script.py <input_file> <output_file>")
        sys.exit(1)
    input_file = sys.argv[1]
    output_file = sys.argv[2]
    fix_markdown_linebreaks(input_file, output_file)
    print('done!')
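To see what the regex from the script above actually does, here is a minimal sketch that applies the same pattern to an in-memory string. The helper name `join_wrapped_lines` and the sample text are made up for illustration; they are not part of the tool or the original script.

```python
import re

# Same pattern as in the script above, compiled once for reuse.
PATTERN = re.compile(r'([^\n])\n(?![\n\s#-])')


def join_wrapped_lines(text: str) -> str:
    """Hypothetical helper: join soft-wrapped lines with a single space."""
    return PATTERN.sub(r'\1 ', text)


# Illustrative sample: a heading, a paragraph wrapped across two lines,
# and a short list using '-' markers.
sample = (
    "# Title\n"
    "\n"
    "This sentence was wrapped\n"
    "by the PDF layout.\n"
    "\n"
    "- first item\n"
    "- second item"
)

print(join_wrapped_lines(sample))
```

In this sample the wrapped paragraph is joined into one line, while the blank lines and the `-` list items keep their breaks. A few caveats follow from the pattern itself: it only looks at the character right after the break, so a heading followed directly by body text (no blank line in between) would be merged into that text, and list items that start with `*` or a number would be joined as well. A line break at the very end of the input is also replaced by a space.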
Hey @Ryokki, the thanks should go to @jzillmann; he is the actual owner of the tool. I just love to help, and hopefully I am helpful.
When a PDF breaks to a new line, Markdown will also break to a new line. Is there any configuration to change this behavior? I want the text to be continuous.