Closed pro-arch-user closed 1 month ago
Thanks for reporting this issue! I can reproduce the issue on my end. Looks like pandocomatic tries to recognize YAML metadata blocks in a non-markdown / non-plain text format, and fails.
I haven't used DOCX as an input format before, so I didn't run into this problem. It seems likely that the issue also occurs for other non-plain text input formats. Anyway, pandocomatic should support any input format, so I'll investigate the issue to come up with a fix.
Yeah, thanks man. Maybe I should add some kinda "ignore metadata" param to my template?
Btw do you have a batch/bash script that recursively iterates over all files in a directory and performs a command on them? Because that way I could just do if (file_extension == .docx) pandoc file_name -f docx -t markdown -o file_name for each file.
Something like
find . -name '*.docx' -print0 | xargs -0 -I{} pandoc "{}" -f docx -t markdown -o "{}.md"
might work?
But be careful, if you forget the ".md" in "{}.md", your original files might get overwritten. Maybe apply this command on a copy of your files instead of the originals. Just to be safe.
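If the one-liner feels too fragile, the same idea can be spelled out as a small script. This is only a rough sketch, not something that ships with pandocomatic: the helper names md_target and convert_tree are made up for illustration, and it assumes pandoc is on your PATH and that no file name contains a newline. It writes "notes.md" next to "notes.docx" and refuses to overwrite a file that already exists.

```shell
#!/bin/sh
# Sketch: convert every .docx below a directory to markdown, next to the original.
# md_target and convert_tree are hypothetical helper names, not part of pandoc
# or pandocomatic.

# Map an input path such as "notes/a.docx" to the output path "notes/a.md".
md_target() {
    printf '%s\n' "${1%.docx}.md"
}

# Walk the directory given as $1 (default: current directory) and run pandoc
# on each .docx file found. Assumes file names do not contain newlines.
convert_tree() {
    find "${1:-.}" -type f -name '*.docx' | while IFS= read -r src; do
        out=$(md_target "$src")
        # Never overwrite an existing file; report it and move on.
        if [ -e "$out" ]; then
            printf 'skipping %s: %s already exists\n' "$src" "$out" >&2
            continue
        fi
        # Same pandoc invocation as in the one-liner above.
        pandoc "$src" -f docx -t markdown -o "$out" < /dev/null
    done
}
```

Save it to a file, source it, and call `convert_tree path/to/your/notes`. Because the output name is derived from the input name and existing files are skipped, the original .docx files are never written to.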
Maybe I should add some kinda "ignore metadata" param to my template?
That doesn't exist at the moment. It might be a nice feature to have: a template setting to skip looking inside files for detailed pandoc and pandocomatic configuration.
It would be cool if you add this
Something like
find . -name '*.docx' -print0 | xargs -0 -I{} pandoc "{}" -f docx -t markdown -o "{}.md"
might work?
I am too dumb for that shit haha
But be careful, if you forget the ".md" in "{}.md", your original files might get overwritten. Maybe apply this command on a copy of your files instead of the originals. Just to be safe.
Yeah I have backups including a copy on a usb stick so it should be good
I ended up making my own script https://github.com/pro-arch-user/Pandoc-Directory-Convert
I've been looking into the issue and discovered that more things go awry when using DOCX, or any non-plain text input format, with pandocomatic. I seem to have built pandocomatic around the implicit assumption that we convert only plain text source files.
I will look into this further, but expect a solution to take a while.
Docx is ass. I used my script and finally switched my notes to obsidian. So much better now.
Fixed issue. Will be in next version of pandocomatic (1.2.0), but release will wait until I've made more changes.
If you want to test before the release is published, check out the master branch and use "test/pandocomatic.rb" as the pandocomatic program. I.e., to run the scenario reported in this ticket, run:
/path/you/cloned/pandocomatic/repo/test/pandocomatic.rb -c .\config.yaml -o output_dir -i test
I've fixed the issue by only extracting pandoc metadata YAML blocks from markdown files. If pandocomatic doesn't yet know a file's source format, it uses pandoc's default mapping from file extension to source format. In either case, this means that DOCX files will not be mined for pandoc YAML metadata blocks.
In case you use an uncommon file extension for your markdown files, you can use the setting extract-metadata-from in your pandocomatic configuration files to tell pandocomatic to also extract pandoc metadata YAML blocks from these files. For example, if you call your markdown files "my_document.pandoc", you can configure:
settings:
  # ...
  extract-metadata-from: ['*.pandoc']
  # ...
Note that this configuration does not override or disable extracting pandoc metadata YAML blocks from markdown files recognized as such by pandocomatic or pandoc. I.e., all files named "*.md" will still be mined for metadata.
I'm trying to use the command "pandocomatic -c .\config.yaml -o output_dir -i test". The directory "test" has a few .docx files. Here is my "config.yaml":
settings:
  recursive: true
  follow-symlinks: false
  skip: ['.*', 'pandocomatic.yaml']
  match-files: 'first'
templates:
  templatew:
    glob: ['*.docx']
When I run this it throws an error: [UNEXPECTED ERROR] An unexpected error has occurred. You can report this bug via https://github.com/htdebeer/pandocomatic/issues/new. C:/Ruby33-x64/lib/ruby/gems/3.3.0/gems/pandocomatic-1.1.3/lib/pandocomatic/pandoc_metadata.rb:238:in `scan': invalid byte sequence in UTF-8 (ArgumentError)
Please help