ncss-tech / soilReports

An R package that assists with the setup and operation of a collection of soil data summary, comparison, and evaluation reports. These reports are primarily used by USDA-NRCS soil scientists in both initial and update mapping.
15 stars 5 forks source link

writing output files throws and error when mu symbols contain special characters #75

Closed dylanbeaudette closed 6 years ago

dylanbeaudette commented 6 years ago

This can happen when applying the region2/mu-comparison report to other data such as LRU concepts that have more expressive labels.

TODO:

  1. fail gracefully when trying to write output files (cd599ceb4ad772c3d782c72c418b526af1c57821)
  2. generate safe filenames
  3. add an option for toggling various output files
brownag commented 6 years ago

I had to deal with this once before for a script that generates grouped profile plots for all MUs in the survey. Incidentally, people were using illegal characters e.g. "1031?" in MUSYM field.

Something like this would do the trick. I wanted to make it clear that illegal characters were being used, and also distinguish those from things actually called e.g. "1031", so I made the replacement something other than "". For a file name foo:

gsub(foo, pattern='([/\\|<>:\\*?\"])', replacement="BadChar")

brownag commented 6 years ago

RE: cd599ce

When you wrap the writeOGR/CSV calls in try, but don't catch exceptions, you may not get intended effect. While I've never had the problem raised here in #75 for MUSum report, I do know that I've had issues related to concurrent modification.

If you have the shapefile open in an Arc session, will the script notify you that it was unable to write to file? (did not test this at the time, just a question)

EDIT: quick test. is try() having the intended effect here?

> badname <- "yourfilename?.csv"
> goodname <- "yourfilename.csv"
> yourdata <- c(2,2,2)

Filename with illegal character returns error upon writing:

> write.csv(yourdata,file=badname)
Error in file(file, ifelse(append, "a", "w")) : 
  cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
  cannot open file 'yourfilename?.csv': Invalid argument

Without illegal character, it works:

> write.csv(yourdata,file=goodname)

Also works wrapped with try():

> try(write.csv(yourdata,file=goodname))

Filename with illegal character DOESN'T work, even when wrapped with try.

> try(write.csv(yourdata,file=badname))
Error in file(file, ifelse(append, "a", "w")) : 
  cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
  cannot open file 'yourfilename?.csv': Invalid argument

Now, for concurrent modification:

> getwd()
[1] "S:/NRCS/Archive_Andrew_Brown/Scripts"

[OPENED FILE S:/NRCS/Archive_Andrew_Brown/Scripts/yourfilename.csv in EXCEL]

When file is open in another program, error occurs even without illegal character. Even when statement is wrapped in try().

> try(write.csv(yourdata,file=goodname))
Error in file(file, ifelse(append, "a", "w")) : 
  cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
  cannot open file 'yourfilename.csv': Permission denied

Looking at some of my code from the same case I mentioned in the previous comment, I did something like this.

 if(typeof(try( { expression } )) == "try-error")
      #say something informative
dylanbeaudette commented 6 years ago

Thanks for the testing and examples. I think that the solution might be a combination of documentation and sanity checks that prevent the report from stopping before completion. Over the last couple of years, that has been one of the major sources of confusion: "why didn't the report finish?".

Like this:

brownag commented 6 years ago

My thoughts on that:

  1. even if the knitting completes, if report can't successfully write the output files... does it truly "finish"? without a warning saying that expected output wasn't created, I feel like that is still incomplete. EDIT: I suppose using the chunk option would display the error! and I suppose if it failed only at output step, the contents of the report from the knit session would be complete
  2. A pain for debugging perhaps (toggle). Also, some errors should rightly prevent the report from finishing (read/write in particular). Properly trapping and giving informative error message in these cases where script interacts with users filesystem probably would stave off a lot of problems people have with working directories, file naming etc.
  3. yes. also, as you said, allowing users to specify which outputs they want (amount of files generated assoc with the SHP, plus CSVs, plus report itself... is overwhelming. plus autogenerated filenames are complex). Perhaps it would be better to just require user input for how they want the file named. If they want their group labels in the name, then the onus is on them for creating an appropriate filename. Config file comment for that var could include list of prohibited characters.
brownag commented 6 years ago

also, perhaps a check at the head of the report to ensure that the file names and paths are all valid (before trying to sample rasters etc etc.) might be worthwhile. it's sad to wait for a report to run, only to have I/O fail right as it finishes all the hard work.

dylanbeaudette commented 6 years ago

I like that idea: check at the beginning for things that will stop the report from running 80% of the way through.

brownag commented 6 years ago

d98f41d