transpect / docx2tex

Converts Microsoft Word docx to LaTeX
BSD 2-Clause "Simplified" License

Suggestion for image names and format #84

Open gsfs opened 3 years ago

gsfs commented 3 years ago

This is a suggestion for an improvement to this great piece of software. In addition to the actual images in a Word file, the program also converts all equations and other math-mode text into images. The latter are not included in any way in the TeX output, while the former are referenced in \includegraphics commands. If it is simple to give those "real" images a different name, perhaps by adding a prefix, that would make it much easier to copy them out of what can be a long list of files in the .docx.tmp/word/media directory. In a Word file with lots of images, it gets tedious to scroll through the .tex file to find all the \includegraphics commands and then find and copy each referenced image file.

This is likely to be a harder request, but it would be helpful if there were an option to output the images in a more Mac-friendly format than EMF.

gimsieke commented 3 years ago

I don’t think that docx2tex converts equations into images. Sometimes MathType embeds its equations in images, and these won’t be referenced by \includegraphics in the output. New-style docx-native equations, also called OMML equations, are converted to MathML and then to TeX math without also being rendered as images. But maybe I’m missing something. Would it help if the tool emitted a list of non-MathType images as a text file?

mkraetke commented 3 years ago

Equations from MathType versions prior to 5.0 are not converted by our MathType converter. In this case, only the preview graphic is embedded via \includegraphics. However, it would be possible to differentiate those images by giving them a certain prefix, e.g. mt_. Could you provide a sample docx file?

gsfs commented 3 years ago

As an example, consider this file: test_image.docx. Processing it creates the files test_image.docx.tmp/word/media/image1.emf to image8.emf.

image3.emf is referenced in \includegraphics in the .tex output: test_image.tex.zip. It would be useful if "real" images like this had a prefix like the suggested mt_ so that they could be easily found and copied to another directory for use with the .tex output.

The other 7 images are graphics versions of the math-mode symbols in the text. In the .tex output, these are correctly converted to $blah blah$. I don't know if there is some file where these other images are referenced, and I cannot think of any use for them. However, maybe I missed something.
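Until something like a prefix or an image list exists in the tool itself, the manual search-and-copy step described above could be scripted. What follows is only a rough sketch, not part of docx2tex: the function names, the fig_ prefix, and the figures directory are made up for illustration, and it assumes that the paths inside \includegraphics are relative to the directory containing the .tex file.

```python
import re
import shutil
from pathlib import Path

# Matches \includegraphics{path} and \includegraphics[options]{path}.
INCLUDEGRAPHICS = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}")


def referenced_images(tex_source):
    """Return the paths named in \\includegraphics commands, in order, without duplicates."""
    seen = []
    for path in INCLUDEGRAPHICS.findall(tex_source):
        if path not in seen:
            seen.append(path)
    return seen


def copy_referenced_images(tex_file, dest_dir, prefix="fig_"):
    """Copy every referenced image that exists on disk into dest_dir.

    Assumption: relative paths are resolved against the .tex file's directory.
    The prefix is a hypothetical naming choice, not something docx2tex emits.
    """
    tex_file = Path(tex_file)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for ref in referenced_images(tex_file.read_text(encoding="utf-8")):
        src = Path(ref)
        if not src.is_absolute():
            src = tex_file.parent / src
        if src.is_file():  # silently skip references whose file is missing
            target = dest / (prefix + src.name)
            shutil.copy2(src, target)
            copied.append(target)
    return copied


if __name__ == "__main__":
    for copied_path in copy_referenced_images("test_image.tex", "figures"):
        print(copied_path)
```

Run from the directory that holds test_image.tex, this would copy only the files that the .tex output actually references and leave the unreferenced ones in word/media untouched. It does not convert EMF to another format.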

jinhangli commented 9 months ago

Hello, I used the same docx as @gsfs, but in the .tex file I only get:

\textbf{\textnormal{Express your answers to the following questions in terms of} \includegraphics[width=1\textwidth]{test_image.docx.tmp/word/media/image4.emf}\includegraphics[width=1\textwidth]{embeddings/oleObject3.bin}\textnormal{,} \includegraphics[width=1\textwidth]{test_image.docx.tmp/word/media/image5.emf}\includegraphics[width=1\textwidth]{embeddings/oleObject4.bin}\textnormal{,} \includegraphics[width=1\textwidth]{test_image.docx.tmp/word/media/image6.emf}\includegraphics[width=1\textwidth]{embeddings/oleObject5.bin}\textnormal{,} \includegraphics[width=1\textwidth]{test_image.docx.tmp/word/media/image7.emf}\includegraphics[width=1\textwidth]{embeddings/oleObject6.bin}\textnormal{, and} \includegraphics[width=1\textwidth]{test_image.docx.tmp/word/media/image8.emf}\includegraphics[width=1\textwidth]{embeddings/oleObject7.bin}\textnormal{.} }

I would like the equations to be converted correctly into TeX math instead. Could anyone give me some help? I am using the latest version of the code and have tried both the Linux and Windows versions. Thanks a lot. @gimsieke @mkraetke

gsfs commented 9 months ago

I'm not sure what this is in reply to.
