Closed oganm closed 6 years ago
Yes. This would be a useful addition.
I was working on this while trying to preserve your style not to disrupt anything but in one case I wasn't sure how conservative I should be.
Here for instance you have a command that creates a single file yet you ask for the file path and file name separately instead of just accepting the file path as a single input. Do you want this pattern to remain for all file outputs?
If one needs to provide a specific name for the generated file (trimmed pdf file) 'output_filename' argument allows it to do so. If you have a better suggestion I am happy to change the format.
Personally I don't see much benefit in having a default filename with a file path that has to be provided (as a window will open asking for it if it is null). It seems more intuitive to me to directly input a filename. A reasonable default could be something like
output_filepath = paste0(tools::file_path_sans_ext(input_filepath), "_trimmed.pdf")
So by default it'll just save it next to the input without asking for anything. If you want to change name or the path, you just write the whole thing, or set it to NULL to open the save file dialog window
Thanks for the explanation and I also prefer that way.
But according to CRAN policies our functions should not write by default in the user's home filespace. That is not allow by CRAN policies. We can only write/save files if the user has specified a directory.
I like your suggestion to open a save file dialog window.
Got stuck while dealing with regular expressions due to some weird encoding issues of the FDF output. Was trying to deal with this without expanding the dependency tree but couldn't find a workaround using base R yet. Would you mind if I added stringi
or stringr
as a dependency?
Yes that's totally fine.
Thanks Oganm. This is a great addition to the package. I'll update the readme file soon
Do you also want to change how the output_directory
/output_filename
combinations work in your other functions?
Since the package is already published it might be problematic to change the way those work depending on how popular it already is but you can introduce a output_filepath
and throw a deprecation warning when people use output_directory/filename combination. Or keep output_filename
and just make the output_directory
an optional variable that does nothing if NULL.
If you choose to keep output_filename
and make output_directory
optional, the default value for output_filename
needs to be undefined to fit CRAN guidelines which will cause backwards incompatibility for people who have been using the default value. Also for consistency output_filepath
used in fill_pdf
should probably change. This keeps the old input structure intact but not fully backwards compatible as it makes output_filename a mandatory argument.
If you choose to deprecate, the default value for output_filename
should remain and output_directory
should still default to NULL. Both of these values should only be considered if output_filepath
is not provided and output_directory is not NULL. If output_filepath
is provided, these values are ignored. If output_filepath is NULL & output_directory is NULL, the window that opens should prompt a file name instead of a directory name. The advantage of this is it will be fully backwards compatible. Disadvantage is you'll now have a slightly more confusing documentation and deprecation warnings for old users.
If you choose to do nothing... well it's slightly awkward to use but not the end of the world.
If you want to go one way or another I can look into it on a spare time or leave it to you.
Hi Oganm thank you very much for this suggestion. One of the features that I would like to maintain is to allow the users to select the files and directories interactively. I would like to go with the second option. What to you recommend? I am planing to add few more functions pleas let me know your preference too. We can follow the same structure.
"Files and directories" or "files or directories"? Option 2 which would have been my choice as well does "files or directories" as it allows setting the directory with GUI with a default filename or setting the file without any defaults though I am unsure if it makes sense to preserve that structure for the new functions (as in a new function with both output_filepath
and output_directory
/output_filename
combo as the input, the existing functions will have this combination but new functions will be simpler if they only had output_filepath
)
In use cases I can think of, it'll be redundant to enable people to use the GUI for either file path or the file itself. I think picking the directory only makes sense if the input (or output) itself is a directory, as in staple_pdf
as a side note, it'll probably be helpful if staple_pdf
allowed individual filenames as an input since a directory does not allow control over how the files are ordered and it forces you to jump through some hoops if you want to merge files from arbitrary locations in a given order.
sorry for the unsolicited backlog. I'll try to make future recommendations in the form of branches ready to be merged. just didn't want to work on structural changes without consent
Thank you very much for the suggestion. Yes staple_pdf should allow individual filenames as input. I'll work on this. I edit functions giving output_filepath as the input. Need to add warning messages for output_directory/filename combination in the previous functions. I am happy to go with your suggestion of using output_filepath as the only input.
Fill pdf currently doesn't support "\"s added to text fields as they also act as regex escape characters. Need to escape them elegantly
Were you planning to wrap the form filling functions of pdftk (
generate_fdf
andfill_form
commands)? I could work on those on a spare time but was wondering if you had any ideas on how they should work.generate_fdf
command creates a plain text file with field names of a pdf form. Given an fdf file and a pdf,fill_form
command fills the pdf.A simple interface would be writing a
getFields
function that'll executegenerate_fdf
, parse that file and return a list of fields and their current vaules. Then asetFields
function will take that list as an input and write it back to the file.