Creating a package of extra-large cache files

tanho63 commented 3 years ago

Hi Neal,

In #11 you suggested possibly creating a data package of test files that is downloaded during testing. Do you have any suggestions as to best-practices for doing this?

I'm trying to mentally model what that might be and would love thoughts on the following process:

uploading test files to a separately tracked/version-controlled git repository that doesn't go to CRAN (since CRAN data packages also have a 5MB limit)
doing a "skip_on_cran" or "skip_if_unavailable" check for the github hosted data
in a helper/setup file, downloading the data into a tempdir and then setting .mockPaths to the tempdir
running all the tests
on teardown, cleaning up by deleting the tempdir

Is there anything I'm missing from doing it with this approach? (Maybe @maelle has some thoughts as well?)

nealrichardson commented 3 years ago

Hi @tanho63, I'll preface this by saying that I don't think this is a good idea. In my opinion, your test files/fixtures shouldn't be big, and you should edit them down to be a size that's reasonable to include in your package. For the kinds of things you want to be testing with httptest, you don't need a lot of data. If your (real) API request returns a big JSON with 500 records, edit it down to only have 3. If you've recorded a real request that returns a 10mb .png, delete the file and replace it with a single pixel. The goal of your R package tests is to make sure that (1) you can make the right API requests given user input, and (2) your code can handle the API's responses, both 200 OK and 500 not ok. You want the smallest possible test fixtures that let you test your R code; you don't need to test that the API does the right thing with your request, that's not your responsibility.

That said :) if I were do this, I might keep the fixture files in the same git repo as the code but outside the package directory (or otherwise in .Rbuildignore). Then you would have some process that, at build/release time, you zip up the fixture directory and insert it into inst/ or tests/. Then your test code would look for that tarball and decompress it and use its contents if found. This would work assuming text files (which compress well) up to a certain size. I wouldn't go so far as having a separate resource you download to run the tests unless you want to skip all of your tests on CRAN/if there is no network connection, which defeats much of the purpose of writing tests with HTTP mocks.

tanho63 commented 3 years ago

Hi Neal - thanks so much for your thoughts on this!

Philosophically, I think the bulk of the R code from my package is aiming to clean and process a wide variety of response data from multiple/different APIs (moreso than it is about trying to make the correct API requests and/or testing if the API returns 200/500) and I think there are a lot of edge cases nested deeply in bigger JSON files. The API response structure does change a little bit over time, enough to make re-building test files a regular maintenance task that I don't want to become super time-consuming.

I dabbled and ended up structuring it like this https://github.com/dynastyprocess/ffscrapr/blob/dev/tests/testthat/setup.R, referring to an external test repository that is only downloaded when running tests or building vignettes. I think that's similar to what you outlined, but I'd be curious if you notice any concerns that I haven't covered by doing it this way.

nealrichardson commented 3 years ago

Seems reasonable. The downside to having fixtures be in a separate repo vs. in the same repo as the code is that you have to worry about keeping the two repos in sync. But it does save you the trouble of having to zip up the tarball yourself since you get that from GitHub.

Another alternative: keep the mocks in your repo and .Rbuildignore the directory, and have the same helper that skips all the tests if the mocks aren't found. When running from the git checkout locally and on GitHub Actions, you have the directory there and run the tests; on CRAN, they just all skip. That sounds a lot simpler, with the downside that tests are all skipped on CRAN (though some might not view that as a downside).

tanho63 commented 3 years ago

Yeah, I think having CRAN check it is a benefit (since if dependent R code changes it'll re-test more robustly/via-the-cran-ecosystem), rather than relying on my CI/CD to do it.

I think managing dev branches of test files on the same repo (and making it download the test files from the correct branch of the repo for dev vs main) is more daunting than just keeping test files separated - but I may need to think about that one a little more.

Thanks again for your suggestions on this - really appreciate it!

nealrichardson / httptest

Creating a package of extra-large cache files #61