knausb / vcfR

Tools to work with variant call format files

read.vcfR fails on offline files on Windows #109

Closed NikNakk closed 6 years ago

NikNakk commented 6 years ago

The call to file.access within read.vcfR returns -1 when checking if the VCF file can be read when on Windows and the VCF file is cached as an offline file. Removing this check from read.vcfR allows the read to go ahead. In view of the recommendation within ?file.access 'Please note that it is not a good idea to use this function to test before trying to open a file,' albeit for different reasons, is it necessary to keep this check for readability there? I've not seen other R functions that read files do this.

Note the file existence check (file.access with mode = 0) is fine, though still not recommended in ?file.access

knausb commented 6 years ago

Hi NikNakk, thanks for reporting this. If I remember correctly, these tests were added after a user requested more 'user friendly' behaviour, so I feel that having some sort of check is appropriate. The man page for file.access() does advise against using it, but its recommendation to use try would not give a straightforward test for a file that exists but is not readable. Do you have a suggestion on how to handle this?

I do not fully understand what you mean by 'cached as an offline file.' If you have the file on a local filesystem it should read in fine. Do you have it as some sort of temporary file? Please clarify. Better yet, please provide a minimal reproducible example as I've attempted to explain here.

Thanks! Brian

NikNakk commented 6 years ago

Hi Brian,

I've tracked down the issue to the implementation of file.access on Windows. I've suggested a fix to r-devel, but not sure if they'll go with it since it's probably a relatively uncommon issue.

This is how to reproduce the file.access issue - hope this makes it clearer as to where the problem actually is:

Steps to reproduce:

  1. Ensure Offline Files is turned on within Windows.
  2. Using Windows Explorer, browse to a folder shared on a network using a UNC path, e.g. \\mypc\myshare\
  3. Create a test file, e.g. test.txt
  4. Within R, try the following:

    file.access("//mypc/myshare/test.txt", 0)
    # Returns 0
    file.access("//mypc/myshare/test.txt", 4)
    # Returns -1 if the share is on a non-Windows host, 0 if it is on a Windows host

  5. Right click on the file within Windows Explorer and ensure 'always available offline' is checked.
  6. Wait for the sync to take place.
  7. Disconnect from the network.
  8. Within R, try the same commands again:

    file.access("//mypc/myshare/test.txt", 0)
    # Returns 0
    file.access("//mypc/myshare/test.txt", 4)
    # Returns -1 regardless of whether the original host was Windows or non-Windows
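
For anyone repeating the steps above, a small helper can put the two file.access results next to the outcome of actually opening the file. This is only a sketch: `probe_file` is a hypothetical name, not part of vcfR or base R, and it assumes that opening a read-only binary connection is a fair stand-in for what a reader would do.

```r
# Hypothetical helper (not part of vcfR): compare file.access() with a real
# attempt to open the file for reading.
probe_file <- function(path) {
  can_open <- tryCatch(
    suppressWarnings({
      con <- file(path, open = "rb")  # raises an error if the file cannot be opened
      close(con)
      TRUE
    }),
    error = function(e) FALSE
  )
  c(
    exists   = unname(file.access(path, mode = 0)),  # 0 = success, -1 = failure
    readable = unname(file.access(path, mode = 4)),  # 0 = success, -1 = failure
    opened   = as.integer(can_open)                  # 1 = opened, 0 = failed
  )
}

# Usage with a local temporary file; replace the path with the UNC path
# from the steps above to check the offline-file case.
tmp <- tempfile(fileext = ".txt")
writeLines("test", tmp)
print(probe_file(tmp))
```

If `readable` is -1 while `opened` is 1, the file.access check is rejecting a file that can in fact be read, which is the situation described in this issue.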

In my email discussion with the developer, they again recommended against file.access.

The main issue here in read.vcfR seems to be that the code used to read the VCF didn't throw an error if gzopen failed, but instead printed an error and then returned an empty stats object. Have a look at my fork. I've replaced the printing of an error and the return of empty stats with a call to Rcpp::stop. For a file that has no read permissions, the error (on Windows at least) is 'Permission denied'. For a non-existent file it is 'No such file or directory'. Both seem ok, though if you wanted you could wrap the call with try or tryCatch.
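
As a rough illustration of the tryCatch option, the sketch below wraps a reader and re-raises any failure with the file path prepended. `read_with_message` is a hypothetical name, and `readLines` is used only as a stand-in so the example runs without vcfR installed; the assumption is that a reader such as read.vcfR could be passed as `reader` once it signals a proper R error.

```r
# Hypothetical wrapper (not part of vcfR): call a reader and, if it fails,
# re-raise the error with the file path included in the message.
read_with_message <- function(path, reader = readLines) {
  tryCatch(
    reader(path),
    error = function(e) {
      stop("Could not read '", path, "': ", conditionMessage(e), call. = FALSE)
    }
  )
}

# Usage: a readable file is returned as usual.
tmp <- tempfile(fileext = ".txt")
writeLines(c("line one", "line two"), tmp)
lines <- read_with_message(tmp)
```

This keeps the 'user friendly' message without a separate file.access check, since the error comes from the read attempt itself.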

Anyway, hopefully the underlying bug in file.access will be fixed in a future release of R.

knausb commented 6 years ago

Ah, I did not realize that 'offline file' was a concept. To try to add a little clarity to this thread, an 'offline file' is a Windows concept and appears to be a method to make files that are typically housed on a network drive available when the computer is removed from the network.

And I was not aware of Rcpp::stop() either. Calling it raises an R error, which halts execution at that point.

@NikNakk my threshold for someone becoming a contributor is whether they've taken the time to write some code and make a PR. You've written some code and introduced me to some new things. If you would like to make a PR, you can add your name to the DESCRIPTION file as well. Thanks!