jromanowska / HaplinMethyl

Add-on package to manage environmental data and use it with Haplin
https://jromanowska.github.io/HaplinMethyl/
MIT License

"'cont' variable not found" warning always appears when using envDataLoad #11

Closed ellisifnygaard closed 1 year ago

ellisifnygaard commented 1 year ago

The following warning always appears when I use envDataLoad:

Warning message:
In envDataLoad(ex_out_file) :
  Problem with the loaded data: 'cont' variable not found,
                 assuming that the data is continuous

It does not seem to matter whether the fileset being loaded was just exported using envDataRead(), envDataReadFromObj() or envDataSubset() using the newest version of the package. Even though the data appears to be classified as 'cont', the warning still shows up when loading the data:

devtools::load_all(".")

# Read exemplary data file:
ex_path <- system.file("extdata", package = "HaplinMethyl")
ex_file <- "env_data_test.dat"
ex_out_file <- "dnam_ex"

dnam_ex <- envDataRead(
  file.in = ex_file,
  dir.in = ex_path,
  file.out = ex_out_file,
  sep = " ", # the exemplary file is a space-delimited file
  header = TRUE, # make sure to check this!
  rownames = TRUE, # make sure to check this!
  overwrite = TRUE
)

# dnam_ex appears to be classified as continuous/'cont':
summary(dnam_ex)
class(dnam_ex)
# > summary(dnam_ex)
# List of 5
# $ class   : chr [1:2] "env.cont" "env.data"
# $ nrow    : int 200
# $ ncol    : int 400
# $ rownames: chr [1:200] "id1" "id2" "id3" "id4" ...
# $ colnames: chr [1:400] "cg1" "cg2" "cg3" "cg4" ...
# > class(dnam_ex)
# [1] "env.cont" "env.data"

# Let's load the saved data that was exported during envDataRead:

loaded_dnam_ex <- envDataLoad(ex_out_file)
# > loaded_dnam_ex <- envDataLoad(ex_out_file)
# Warning message:
#   In envDataLoad(ex_out_file) :
#   Problem with the loaded data: 'cont' variable not found,
# assuming that the data is continuous

class(loaded_dnam_ex)
# > class(loaded_dnam_ex)
# [1] "env.cont" "env.data"

I think the reason is that the cont variable in envDataLoad.R is set to NULL on line 26, and ff::ffload() on line 27 then does not overwrite cont with the value stored in the data being loaded.
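The mechanism can be mimicked in base R without ff. The sketch below is only an analogy: restore_objects() is a hypothetical stand-in, not the ff::ffload() implementation, and it assumes that the loader skips any name that already exists in the target environment unless overwrite = TRUE, as the "did not overwrite object" warning in this issue suggests.

```r
# Hypothetical stand-in for a loader that, like the ffload() behaviour
# reported in this issue, refuses to replace existing objects by default.
restore_objects <- function(saved, envir = parent.frame(), overwrite = FALSE) {
  for (name in names(saved)) {
    if (!overwrite && exists(name, envir = envir, inherits = FALSE)) {
      warning("did not overwrite object '", name, "'", call. = FALSE)
    } else {
      assign(name, saved[[name]], envir = envir)
    }
  }
  invisible(names(saved))
}

# Pretend this list is what was stored on disk when the data was exported.
saved <- list(cont = TRUE)

# Same pattern as in envDataLoad.R: 'cont' is pre-set to NULL before loading.
demo_env <- new.env()
assign("cont", NULL, envir = demo_env)

# Default call: 'cont' already exists (as NULL), so it is left untouched.
suppressWarnings(restore_objects(saved, envir = demo_env))
cont_default <- get("cont", envir = demo_env)

# With overwrite = TRUE the stored value replaces the placeholder.
restore_objects(saved, envir = demo_env, overwrite = TRUE)
cont_overwritten <- get("cont", envir = demo_env)
```

Here cont_default stays NULL and cont_overwritten becomes TRUE. Because the real call in envDataLoad.R is wrapped in suppressWarnings(), the "did not overwrite" message never reaches the user; only the later check on cont does.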

Possible fix: add the overwrite = TRUE argument to ff::ffload().

Illustration (running code from envDataLoad.R):


# Now, let's test the code from the envDataLoad function:

# Define necessary variables:
filename <- ex_out_file
dir.in <- getwd()

# Actual code from envDataLoad.R:
file.in.ff <- paste( dir.in, "/", filename, "_env.ffData", sep = "" )
if( !file.exists( file.in.ff ) ){
  stop( "The file(s) doesn't seem to exist!", call. = FALSE )
}

env.cols.name <- get( ".env.cols.name", envir = .haplinMethEnv )
file.in.base <- paste( dir.in, "/", filename, "_env", sep = "" )
cont <- NULL

# Run ff::ffload without suppressWarnings():
# suppressWarnings( ff::ffload( file.in.base
#                               , rootpath = getOption( "fftempdir" )
#                               ) )
ff::ffload( file.in.base, rootpath = getOption( "fftempdir" ) )
# > ff::ffload( file.in.base, rootpath = getOption( "fftempdir" ) )
# character(0)
# Warning messages:
#   1: In FUN(X[[i]], ...) : did not overwrite object 'cont'
# 2: In FUN(X[[i]], ...) :
#   NOTE: did not overwrite file '***/***/AppData/Local/Temp/RtmpmQ5vF5/ff/ff7484723a42e0.ff'

# (I added the "***" in the path for security reasons ☺️  )

# The 'cont' variable is still NULL after using ffload!
cont
# > cont
# NULL

# Let's try ffload again, but add overwrite = TRUE:
ff::ffload( file.in.base, rootpath = getOption( "fftempdir" ), overwrite = TRUE )
# > ff::ffload( file.in.base, rootpath = getOption( "fftempdir" ), overwrite = TRUE )
# [1] "Users/edj079/AppData/Local/Temp/RtmpmQ5vF5/ff/ff7484723a42e0.ff"

# The 'cont' variable has been updated:
cont
# > cont
# [1] TRUE

P.S.

ff::ffload() without overwrite = TRUE is also used in genDataLoad.R in the current version of Haplin: https://github.com/cran/Haplin/blob/93c3e77ef3bb74b688c0322d3974c46862a2b5e4/R/genDataLoad.R#L31

I don't know whether there might be similar bugs when using genDataLoad, but it could be worth looking into 🕵️‍♀️
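One way to look into this is to collect every warning a call emits instead of letting it print. The helper below is a base-R sketch: collect_warnings() and noisy_loader() are hypothetical names used only for illustration, and noisy_loader() merely stands in for whatever load call is being inspected.

```r
# Evaluate an expression, returning its value together with the messages of
# all warnings that reached this level (the warnings themselves are muffled).
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

# Hypothetical stand-in for a load call that emits the warning seen above.
noisy_loader <- function() {
  warning("did not overwrite object 'cont'")
  "loaded"
}

res <- collect_warnings(noisy_loader())

# Flag the warnings that point to a skipped overwrite.
overwrite_hits <- grepl("did not overwrite", res$warnings, fixed = TRUE)
```

This only sees warnings that actually reach the caller. If the function under inspection wraps its ffload() call in suppressWarnings(), as envDataLoad.R does, the call has to be run unwrapped, as in the illustration above.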

jromanowska commented 1 year ago

Oh, great that you tried that! I was also wondering why I get this error... I haven't had this one when using genDataLoad... Will test to be sure.

ellisifnygaard commented 1 year ago

> Oh, great that you tried that! I was also wondering why I get this error... I haven't had this one when using genDataLoad... Will test to be sure.

I can create a new branch and add overwrite = TRUE to ffload() in envDataLoad.R 😃 Unless you think that overwrite = TRUE might cause trouble in some cases?

jromanowska commented 1 year ago

I think it's fine - this overwriting is what we want when running envDataLoad. :+1:

ellisifnygaard commented 1 year ago

Result of devtools::test() on the bug-fix-issue-11 branch:

══ Results ════════════════════════════════════════════════════════════════════════════════════════════════════════
Duration: 13.8 s

[ FAIL 0 | WARN 3 | SKIP 0 | PASS 66 ]

Note that these tests do not include the changes made to test_2_envDataSubset.R on the bug-fix-issue-10 branch, but that should not be an issue.

@jromanowska I will create a pull request, but feel free to do some improvised ad hoc and testthat tests of the changes before you review the pull request 😎