tiernanmartin / drakepkg

Demonstrate A Drake Workflow Package

Cannot access files in inst/ #1

Open tiernanmartin opened 6 years ago

tiernanmartin commented 6 years ago

While working within the drakepkg directory itself, it would be nice if I could make() plans that include filepaths in the inst/ directory.

That won't work because, when a user installs the package, the subdirectories of inst/ are moved to the package root directory, which breaks the file paths included in the plan.
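
For illustration, here is a minimal sketch of one way to resolve such a path in both situations, assuming base R's system.file() is acceptable for the installed case. The helper name pkg_file() is hypothetical and is not part of drakepkg or drake:

```r
# Hypothetical helper (not part of drakepkg): find a file that lives under
# inst/ in the source tree but at the package root once installed.
pkg_file <- function(..., package = "drakepkg") {
  # In the development directory the file is still under inst/.
  dev_path <- file.path("inst", ...)
  if (file.exists(dev_path)) {
    return(dev_path)
  }
  # After installation, system.file() looks the file up in the installed
  # package; it returns "" when nothing is found.
  system.file(..., package = package)
}
```

Calling pkg_file("exdata", "other-iris.xlsx") would return the relative inst/ path inside the source tree and the installed location otherwise. Whether a computed path like this fits into a plan's file_in() or file_out() calls is a separate question that this sketch does not address.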

tiernanmartin commented 6 years ago

My current solution to this problem includes the following steps:

  1. Move all inst/ subdirectories to the package's root directory
  2. Add these directories to both .gitignore and .Rbuildignore
  3. Create directory junctions (I work on a Windows machine) for each of these directories which link to inst/

Now, I can run make(plan_example) from within the package and it can find all of the filepaths included in the plan.

I doubt this is considered a best practice, so I'll keep this issue open until I find a more elegant solution.

wlandau commented 6 years ago

What about a step in the plan that makes the files available?

plan <- drake_plan(
  get_files = target(
    command = {
      file_out("exdata/other-iris.xlsx")
      copy_pkg_files()
    },
    trigger = trigger(change = packageVersion("drakepkg"))
  )
)
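
copy_pkg_files() is not defined anywhere in this thread. One possible sketch, assuming the files sit in an exdata/ directory of the installed package and should be copied into the working directory (the argument names and defaults are illustrative assumptions):

```r
# Hypothetical implementation of copy_pkg_files(); names and defaults are
# assumptions made for this sketch.
copy_pkg_files <- function(dirs = "exdata", package = "drakepkg", to = ".") {
  for (d in dirs) {
    # Locate the directory inside the installed package.
    src <- system.file(d, package = package)
    if (!nzchar(src)) {
      stop("Directory '", d, "' not found in package '", package, "'")
    }
    # Copy the whole directory into `to`, refreshing files that already exist
    # so a new package version replaces stale copies.
    file.copy(src, to, recursive = TRUE, overwrite = TRUE)
  }
  invisible(file.path(to, dirs))
}
```

With the defaults, this would create exdata/ in the working directory, which lines up with the file_out("exdata/other-iris.xlsx") declaration in the plan above.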

wlandau commented 6 years ago

Or maybe you could just return the formally packaged versions of the datasets.

plan <- drake_plan(
  get_iris_data = target(
    command = {
      data(otheriris)
      otheriris
    },
    trigger = trigger(change = packageVersion("drakepkg"))
  )
)

tiernanmartin commented 6 years ago

I like the way trigger() is used in your suggestions. I definitely would not have thought to connect packageVersion() to a trigger!

tiernanmartin commented 6 years ago

I think it would be best for most use cases if a workflow's data are included as formal datasets in the package -- it's very intuitive. The USGS groundwater model package that I mentioned before is a good illustration of this approach.

Unfortunately, in my particular use case (GIS-type projects) I will run into file size limits with this approach. A GitHub-hosted package with a bunch of ~5 GB .rda files isn't going to work very well.

Instead, I would like to host the files outside the package (right now I'm experimenting with osf.io at Ben Marwick's suggestion) and write plans that allow the package user to download the files into their working directory. The directory junctions allow the same plan to work regardless of whether it's run in the directory where I'm developing drakepkg or in the working directory of someone who is using the package.
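
As a rough sketch of the download step, assuming the files are reachable by a plain URL and using only base R (the function name fetch_external_file() and the URL in the comment are placeholders, not real drakepkg or OSF endpoints):

```r
# Hypothetical helper: download an externally hosted file into the working
# directory unless a copy is already there.
fetch_external_file <- function(url, dest) {
  # Skip the download when the file already exists locally.
  if (file.exists(dest)) {
    return(invisible(dest))
  }
  # Create the destination directory if needed, then download in binary mode
  # so that files such as .xlsx or .rda are not corrupted on Windows.
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  download.file(url, dest, mode = "wb")
  invisible(dest)
}

# Example call with a placeholder URL:
# fetch_external_file("https://example.org/other-iris.xlsx",
#                     "exdata/other-iris.xlsx")
```

Because the destination is a path relative to the working directory, the same call would behave the same way in the development directory and in a package user's project.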

Since my use case is likely to be rarer (and not good for a minimal example), I'll revise the package to include otheriris as a formal dataset, and I can make a note somewhere about my external data workaround.

wlandau commented 6 years ago

osf.io and osfr sound great! There is a lot to explore here, and I think OSF deserves attention in fully-implemented examples. Would you be open to having a plan that uses osfr::download_files() to get data and triggers based on osfr::get_files_info()?