siefkenj / unified-latex

Utilities for parsing and manipulating LaTeX ASTs with the Unified.js framework
MIT License

Conversion from TeX to plain text #125

Open batchor opened 1 week ago

batchor commented 1 week ago

Is there a way to convert TeX markup into plain text as far as possible? Something like: pandoc example.tex -o output.txt

I saw there are conversions to markdown, json, along with other formats in the playground (https://siefkenj.github.io/latex-parser-playground/).

I'm implementing a grammar check for TeX documents, and I want the TeX markup removed because it interferes with the checking results.

siefkenj commented 1 week ago

You can look at the to-markdown plugin to see what unwrapping strings looks like. I don't know whether you need line information from the original, but it shouldn't be too hard to unwrap everything. You'll have to decide which macro arguments to keep. Take \foo{my thing}{other thing}: do you want that to turn into my thingother thing? Or something else...
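To make the argument question concrete, here is a minimal sketch of "unwrap everything". The `LatexNode` shape and the `unwrapAll` helper are hypothetical and simplified for illustration; they are not the actual unified-latex AST types or API. The only decision shown is what goes between consecutive macro arguments.

```typescript
// Hypothetical, simplified node shapes for illustration only;
// these are NOT the actual unified-latex AST types.
type LatexNode =
    | { type: "text"; value: string }
    | { type: "macro"; name: string; args: LatexNode[][] };

// Flatten a node list to plain text. Every macro argument is kept,
// and `argSeparator` is placed between consecutive arguments.
function unwrapAll(nodes: LatexNode[], argSeparator = ""): string {
    return nodes
        .map((node) =>
            node.type === "text"
                ? node.value
                : node.args
                      .map((arg) => unwrapAll(arg, argSeparator))
                      .join(argSeparator)
        )
        .join("");
}

// Hand-built stand-in for \foo{my thing}{other thing}
const foo: LatexNode[] = [
    {
        type: "macro",
        name: "foo",
        args: [
            [{ type: "text", value: "my thing" }],
            [{ type: "text", value: "other thing" }],
        ],
    },
];

console.log(unwrapAll(foo)); // "my thingother thing"
console.log(unwrapAll(foo, " ")); // "my thing other thing"
```

With an empty separator the arguments run together, which is probably not what a grammar checker wants; a space is one possible default.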

batchor commented 1 week ago

> You can look at the to-markdown plugin for what it looks like to unwrap strings.

Thanks, I'll definitely look into to-markdown.

For the macros, I think it really depends. For example, for \color{red}{my text} I want the result to be just my text.

For \usepackage[xxx]{yyy}, I hope it can be entirely ignored.
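Those two rules could be written down as a per-macro policy table. The following is only a sketch under assumptions: `TexNode`, `keepArgs`, and `toPlainText` are hypothetical names, the node shape is simplified and is not the real unified-latex AST, and {my text} is treated as a second argument of \color, matching how it is written above.

```typescript
// Hypothetical, simplified node shapes (not the real unified-latex types).
type TexNode =
    | { type: "text"; value: string }
    | { type: "macro"; name: string; args: TexNode[][] };

// Per-macro policy: which argument positions (0-based) survive as text.
// An empty list drops the macro entirely. Both entries are assumptions
// taken from the two examples in this thread.
const keepArgs = new Map<string, number[]>([
    ["color", [1]], // \color{red}{my text} -> "my text"
    ["usepackage", []], // \usepackage[xxx]{yyy} -> ""
]);

function toPlainText(nodes: TexNode[]): string {
    let out = "";
    for (const node of nodes) {
        if (node.type === "text") {
            out += node.value;
            continue;
        }
        // Macros without a policy entry fall back to keeping every argument.
        const keep = keepArgs.get(node.name) ?? node.args.map((_, i) => i);
        for (const i of keep) {
            out += toPlainText(node.args[i] ?? []);
        }
    }
    return out;
}

// Hand-built stand-in for: \usepackage[xxx]{yyy}\color{red}{my text}
const doc: TexNode[] = [
    {
        type: "macro",
        name: "usepackage",
        args: [
            [{ type: "text", value: "xxx" }],
            [{ type: "text", value: "yyy" }],
        ],
    },
    {
        type: "macro",
        name: "color",
        args: [
            [{ type: "text", value: "red" }],
            [{ type: "text", value: "my text" }],
        ],
    },
];

console.log(toPlainText(doc)); // "my text"
```

Keeping the policy in a table means each new macro is a one-line entry; whether unknown macros should keep all arguments or none is a separate choice, and this sketch keeps them.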