jpshanno / ingestr

ingestr: An R package for reading environmental data from raw formats into dataframes.
https://jpshanno.github.io/ingestr/
MIT License
19 stars 3 forks source link

ingest_directory has several issues #38

Closed twhiteaker closed 4 years ago

twhiteaker commented 4 years ago

I'm an R noob, so I may not have some of this right!

1. Pass in arguments, not strings

Arguments to all_character and all_logical should have quotes removed. Otherwise, you're just passing in strings. Revised:

  all_character(c(check.duplicates, directory, pattern))

  all_logical(c(collapse, use.parallel, recursive))

2. Use full paths for files

You may want to return full paths to files when getting the file list. Otherwise, you may get file-not-found errors when trying to open the files. Revised:

  file_list <- list.files(directory,
                          pattern = pattern,
                          full.names = TRUE,
                          recursive = recursive)

WARNING: Ingestr uses filenames as unique keys for each dataset, so the solution may not be as simple as returning full paths.

3. Check for empty lists

Check that the list isn't empty before grepping, as in

  if(length(list(...)) > 0 && grepl("header.info.name", names(list(...)))){

4. Create directory just for ingest_directory unit tests

The test-ingest_directory.R script is supplying the example_data directory. However, there are a lot of files in there, and they are unorganized. I think you should create a subdirectory with files to test specifically for the ingest_directory function. This prevents a purposefully quirky file (e.g., a file with an error-causing part to it, for testing some other error function) from messing up unit tests.

I was aiming to make these edits and submit a pull request. However, I don't know if the edits jive with the Ingestr philosophy (e.g., switching to full paths for filenames), and even after these edits, the unit tests were still failing and I need to step away from this now. Hopefully what I've posted here is useful.

jpshanno commented 4 years ago

I think you nailed all of the problems I ended up finding. Should have read this issue first! It was still laid out to use the original return-two-dataframes approach. Rachael and I took a look at it this afternoon. These should all have been fixed with my last commit. Rachael's writing unit tests now, so we'll see if anything else comes up.