Open aslehigh opened 5 years ago
Yes, indeed, this is confusing.
I tried adding the quotes from within the code, but this doesn't seem to work :/
Curious how other libs manage this.
For the simplest possible solution, I would suggest replacing `command` in the `check_output` function call in the `run_command` function (currently line 43 in pypdftk.py) with `map(shlex.quote, command)`... and of course add the `shlex` import at the top.
This would accomplish my suggestion of quoting everything (and address a whole class of potentially security-related bugs in code using this library). It would be a breaking change, though, for anyone already quoting the arguments in their own code. But I'm guessing not too many people have run into this, or you would have heard about it before now...?
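To make the suggestion above concrete, here is a minimal sketch. It assumes `command` is a list of tokens and that the result is handed to the shell as a single string, which is why the quoted tokens are joined with spaces. `quote_command` is a hypothetical helper name, not something in pypdftk, and `shlex.split` is only used to approximate how a POSIX shell would parse the result.

```python
import shlex


def quote_command(command):
    """Quote every token and join them into one shell-safe command line."""
    return " ".join(map(shlex.quote, command))


# A filename with spaces survives as a single, quoted argument.
line = quote_command(["pdftk", "my form.pdf", "dump_data_fields"])
print(line)

# The breaking change: an argument the caller has already wrapped in quotes
# keeps those quote characters as literal text once it is quoted again.
legacy = quote_command(["pdftk", '"my form.pdf"', "dump_data_fields"])
print(shlex.split(legacy))
```

In the second case the shell would look for a file whose name literally begins and ends with a double quote, so existing code that adds its own quotes would have to drop them.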
Possibly using f-strings, as follows: `filename = f'"{os.path.join(path, "filename with spaces.pdf")}"'`
Note, however, that if your f-string starts (and ends) with single quotes, then any strings (incl. key names) inside the curly braces must be in double quotes, and vice versa. Otherwise, you'll raise a `SyntaxError` (on Python versions before 3.12, which lifted this restriction).
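As a rough comparison, the sketch below wraps a path in double quotes with an f-string and checks how a POSIX shell would read it back, using `shlex.split` as a stand-in for the shell's parsing. The directory and the second filename are made up for illustration. Hand-quoting appears to hold up for plain spaces but not for a name that itself contains double quotes, whereas `shlex.quote` handles both.

```python
import os
import shlex

path = "/tmp/docs"  # hypothetical directory, for illustration only

# Hand-quoting with an f-string, as suggested above.
simple = os.path.join(path, "filename with spaces.pdf")
hand_quoted = f'"{simple}"'

# The same wrapping applied to a name that contains double quotes itself.
tricky = os.path.join(path, 'my "final" draft.pdf')
hand_quoted_tricky = f'"{tricky}"'

# Compare what the shell would see with the filename we meant to pass.
print(shlex.split(hand_quoted) == [simple])
print(shlex.split(hand_quoted_tricky) == [tricky])
print(shlex.split(shlex.quote(tricky)) == [tricky])
```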
If my filename contains one or more spaces (or, I suppose, any character the shell considers special), the program throws an error:
From the last line there I could see what the problem was: the filename is plopped into the command line unquoted. So I can work around the problem by including quote characters in the argument:
I think this really should not be necessary. This behaviour would be confusing to new users, and makes the library more difficult to use.
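For anyone who hasn't hit this yet, here is a small sketch of why the unquoted filename breaks and why the workaround helps. The command string is a hypothetical stand-in, not the exact one the library builds, and `shlex.split` is used to approximate how a POSIX shell splits a command line into arguments.

```python
import shlex

filename = "my form.pdf"

# Filename dropped into the command line unquoted: the shell sees two words.
unquoted = f"pdftk {filename} dump_data_fields"

# Workaround: the caller includes quote characters in the argument.
workaround = f'pdftk "{filename}" dump_data_fields'

print(shlex.split(unquoted))
print(shlex.split(workaround))
```

In the unquoted case pdftk would receive `my` and `form.pdf` as two separate arguments, neither of which names the intended file.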
After skimming through the source code, it looks like a minimal solution could be to simply add quote characters around the user-supplied filename arguments everywhere they are used. Quite likely there are other sorts of arguments that should be quoted on the command line also.
But I have another suggestion: why not just quote everything that goes on the command line? The `run_command()` function could take a list of items or tokens, and simply join them with a space between after adding the quote characters. Furthermore, it appears to me that the `run_command()` function is called with the same first argument every time. If this is true, why not include it in the function instead of requiring the caller to pass it? Obviously this would involve a lot more modification to the code, but I think it would simplify it on the whole, and potentially eliminate a whole class of possible bugs/user errors. I guess that would be a breaking change though for existing code that already adds quotes.
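A rough sketch of what I have in mind, under some assumptions: the fixed first argument is the path to the pdftk executable (called `PDFTK_PATH` here, with an assumed default), the command is run through the shell, and the function returns the output split into lines. The `executable` parameter is hypothetical and exists only so the example can be tried without pdftk installed, using the Python interpreter as a stand-in. This is POSIX-only, since `shlex.quote` targets POSIX shells.

```python
import shlex
import subprocess
import sys

PDFTK_PATH = "pdftk"  # assumed default; the real value may differ


def run_command(args, executable=PDFTK_PATH):
    """Run `executable` with `args`, quoting every token for the shell."""
    tokens = [executable, *map(str, args)]
    command_line = " ".join(shlex.quote(token) for token in tokens)
    output = subprocess.check_output(command_line, shell=True)
    return output.splitlines()


# Stand-in demonstration: echo back one argument that contains spaces.
echo_arg = "import sys; print(sys.argv[1])"
lines = run_command(["-c", echo_arg, "file with spaces.pdf"],
                    executable=sys.executable)
print(lines)
```

Callers would then pass only the operation and filenames, unquoted. Passing the token list to `check_output` without `shell=True` would bypass the shell, and with it the need for quoting, but that is a different technique from the one proposed here.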