nameiwillforget / hegel-in-mathematics

My exposition of the formalization of Hegel's theory in modal homotopy type theory
GNU General Public License v3.0

Script to introduce line breaks in textfile with only long sentences #4

Closed Nikolaj-K closed 2 years ago

Nikolaj-K commented 2 years ago

You had asked me for the script used to produce the line breaks, so here you have it below.

(Once it's been applied, there's no point in running it a second time, so I don't think it needs to live in the repo as a file itself.)

Note: With blanks in TeX code, such as "hacks" like $a\ b$, there's roughly a 1-in-20 chance this script puts the break right there. That's not wrong, but it looks awkward, and you can still manually prettify those spots in your text afterwards. Btw., in case you don't know, you can also get slightly tighter kerning via $a\,b$.
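For reference, the spacing variants mentioned above look like this in math mode (plain illustration, not from the script):

```latex
$a\ b$   % backslash-space: a full interword space (the "hack" the script may break after)
$a\,b$   % thin space: slightly tighter kerning
$ab$     % no extra space at all
```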

# Usage: python run_line_breaker.py
import csv

class Config:
    CHARS_PER_SENTENCE_THRESHOLD: int = 100
    SOURCE_TEX_FILEPATH: str = "./main.tex"
    TARGET_TEX_FILEPATH: str = "./main_EDITED.tex"
    # Use full paths instead of "./foo" if you don't run this script from inside the folder

_DELIMITER: str = " "  # Treat the file as a delimiter-separated value file, with " " splitting words

def shall_break(line_so_far: list) -> bool:
    sentence_so_far: str = _DELIMITER.join(line_so_far)
    return len(sentence_so_far) > Config.CHARS_PER_SENTENCE_THRESHOLD

def yield_target_lines(source_line: list):
    tl = []
    for word in source_line:
        tl.append(word)
        if shall_break(tl):
            yield tl
            tl = []  # Reset
    if tl != [""]:  # The original file contained some empty-string sentences
        yield tl  # Yield the rest of the sentence

if __name__ == "__main__":
    with open(Config.SOURCE_TEX_FILEPATH) as in_file:
        with open(Config.TARGET_TEX_FILEPATH, "w", newline="") as out_file:
            # lineterminator="\n" avoids csv's default "\r\n" line endings in the .tex output
            writer = csv.writer(out_file, delimiter=_DELIMITER, lineterminator="\n")
            for source_line in csv.reader(in_file, delimiter=_DELIMITER):
                for target_line in yield_target_lines(source_line):
                    writer.writerow(target_line)
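In case the behavior isn't obvious at a glance, here's a tiny self-contained sketch of just the splitting logic, with the threshold lowered to 20 characters so a short sentence shows the effect (`split_line` is a made-up name for this demo, not part of the script above):

```python
# Standalone sketch of the line-splitting logic, threshold lowered to 20 chars.
THRESHOLD = 20

def split_line(words, threshold=THRESHOLD):
    chunks, current = [], []
    for word in words:
        current.append(word)
        if len(" ".join(current)) > threshold:  # same test as shall_break
            chunks.append(current)
            current = []
    if current != [""]:  # mirror the script's empty-string guard
        chunks.append(current)  # the rest of the sentence
    return chunks

for chunk in split_line("the quick brown fox jumps over the lazy dog".split(" ")):
    print(" ".join(chunk))
# the quick brown fox jumps
# over the lazy dog
```

Note that a break happens only *after* the accumulated line has exceeded the threshold, so output lines can run somewhat past 100 characters; they are never cut mid-word.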
nameiwillforget commented 2 years ago

Very nice, thanks!