jgm / pandocfilters

A python module for writing pandoc filters, with a collection of examples
BSD 3-Clause "New" or "Revised" License
508 stars 115 forks source link

Create temporary directory in a more safe manner. #88

Closed jerri closed 3 years ago

jerri commented 3 years ago

Also make sure the temporary directory gets removed at the end of the script, unless prohibited.

Fixes #87

The code should still be compatible with python2 and python3.

ickc commented 3 years ago

Default to be backward incompatible might not be a good idea. Although temp dir should be temporary and therefore should not be a problem when removed, some could rely this side effect as a cache. Removing this by default could means unnecessary work each time it is invoked.

This is not spoken from experience, if you believe people using this expected the temp to be cleared, then it’d be fine.

jgm commented 3 years ago

That's a good point: I believe many people may want the temp dir to persist. Suppose you're converting tikz diagrams to images, for example. If the temp dir disappears after every pandoc run, then every time you run pandoc the filter will need to regenerate every diagram. With the persisting temp directory, only diagrams that are changed (or new) need be regenerated.

jerri commented 3 years ago

I understand. Problematic situation. I don't have too much experience with a lot of filters. Too be honest I just have experience with a filter for mermaid. The directory created for the conversion of the files caught me by surprise and I immediately thought there was a bug in the filter I used. Right now I added an additional script in all my automations to remove those files after use. What is odd, in my opinion, is that the file is generated in the directory where the script is executed. In the worst case, you might not even have permissions to create the directory and would just get a strange error message, that the directory couldn't be created. If it is expected functionality, that the directory gets created and is used, I guess, this PR doesn't make a lot of sense. On the other hand I wonder, is there really a caching effect? Who is checking if the file already exists and prevents the filter to recreate the file? Is this part of pandocfilter library? Haven't checked right now, I have to admit.

jerri commented 3 years ago

Just checked. So tikz.py itself is doing the caching. Mhm... That would mean one would have to add a boolean to allow for the old behaviour of creating the file in the "predictable" location. Mhm... I understand the reason, but still am somehow unsure if I like the approach. The original name of the directory is not very special and could easily clash with an existing directory, which would be filled up by strange files. And worse, if you are using several filters, you get several directories filled with files that might not be necessary in some cases. I wonder if it would make more sense to add an environment variable that could enable the new behaviour with the real temp directories and could be set by people like me to fully avoid the generation of those directories. In that case it even would be fully transparent for the filters. What do you think?

ickc commented 3 years ago

The world of pandoc filters is fragmented. The easiest way probably is to have this function defaulted to the old behavior, then submit a PR To the filter that has buggy behavior, in this case a mermaid filter and hope they merge it, if abandoned, fork it.

Also, if you have pandoc filter in a location that it has no power to create a directory, it’s probably not correct. Eg you might have accidentally install it using sudo pip. It is often even recommended, but make sure you only sudo install something you trust very much. A better approach is pip install --user or have virtual environments like conda.

jerri commented 3 years ago

Hello ickc: Maybe a misunderstanding regarding the permission topic. The pandocfilter-librarys code creates the directory in the current directory you are in, when you start the filter. So, if you try to use pandoc while you are e.g. in /usr/bin, it will not work, as users usually don't have permission to create files or directories in that environment. This doesn't have anything to do with where the library has been installed. In any case, I will think about the idea with the environment variable, because this probably fixes all the problems while still being completely backwards compatible.

ickc commented 3 years ago

A way to control the behavior is definitely needed, env var is easiest among all options.

if you believe the upstream filter should have cleared the temp, a PR will be useful to benefit other people. But if it is subjective (Eg some want them be persistent as a cache) then of course nothing can be done there.

jerri commented 3 years ago

I implemented the environment variable idea with the env variable PANDOCFILTER_CLEANUP, that a user could e.g. set in his .bashrc. This variable will prevent the creation of the images-directories and will therefore also disable any possible caching.

jerri commented 3 years ago

Sorry, I forgot to add documentation to the README.

ickc commented 3 years ago

There's a minor problem that the error is raised instead of passed. I think it follows the design of pandoc that garbage in, garbage out, but not "error out". So it tries to finish performing the job even if there's an error, and in that case it would be leaving as is. I can change that behavior after merging since that the PR is already some time ago.