weecology / MATSS-LDATS

Macroecological LDA analysis of time series
MIT License

Unable to run successfully #20

Closed ethanwhite closed 5 years ago

ethanwhite commented 5 years ago

I installed both MATSS and MATSS-LDATS using devtools. The data installation steps currently appear to be commented out (they are wrapped in an if (FALSE) statement), so I did the installs manually into the analysis/data directory. The current state of that directory is:

ethan@oneesk:~/Dropbox/Research/MATSS-LDATS (master)$ ls analysis/data/
breed-bird-survey  DO-NOT-EDIT-ANY-FILES-IN-HERE-BY-HAND  mapped-plant-quads-mt  veg-plots-sdl

The appropriate CSV files installed by the retriever are in each of the dataset-named folders.
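For anyone unfamiliar with the idiom, here is a minimal sketch of what wrapping setup code in if (FALSE) does. The download_dataset() helper is a hypothetical stand-in, not a MATSS function. It only records which datasets were requested, to show that the wrapped block never runs.

```r
# Hypothetical stand-in for a real download step; it only records the request.
requested <- character(0)
download_dataset <- function(name) {
  requested <<- c(requested, name)
  invisible(name)
}

# Code wrapped in if (FALSE) is parsed but never executed.
if (FALSE) {
  download_dataset("veg-plots-sdl")
  download_dataset("mapped-plant-quads-mt")
}

# Nothing was requested, so the data folders stay empty unless filled by hand.
n_requested <- length(requested)
```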

When I try to run the pipeline.R script, this is what I get:

ethan@oneesk:~/Dropbox/Research/MATSS-LDATS (master)$ Rscript analysis/pipeline.R 
Please look at our data formats by running `vignette("data-formats")`

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Attaching package: ‘matssldats’

The following object is masked from ‘package:MATSS’:

    run_LDA

target maizuru_data
target jornada_data
target sgs_data
target portal_data
Target portal_data messages:
  Loading in data version 1.97.0
target bbs_data
target sdl_data
Warning: target sdl_data warnings:
  Didn't find any downloaded data in ~/veg-plots-sdl.
Did you run get_retriever_data() first?
fail sdl_data
Error: Target `sdl_data` failed. Call `diagnose(sdl_data)` for details. Error message:
  no applicable method for 'select_' applied to an object of class "NULL"
Execution halted
Warning message:
system call failed: Cannot allocate memory 
ethanwhite commented 5 years ago

If I remove the SQLite cache file and rerun, I get the same end point with a little extra output first:

target maizuru_data
target jornada_data
target sgs_data
target portal_data
Target portal_data messages:
  Loading in data version 1.97.0
target bbs_data
target sdl_data
Warning: target sdl_data warnings:
  Didn't find any downloaded data in ~/veg-plots-sdl.
Did you run get_retriever_data() first?
fail sdl_data
Error: Target `sdl_data` failed. Call `diagnose(sdl_data)` for details. Error message:
  no applicable method for 'select_' applied to an object of class "NULL"
Execution halted
Warning message:
system call failed: Cannot allocate memory 
ha0ye commented 5 years ago

We use an environment variable in MATSS to control where downloaded datasets go, and the same variable is used when reading them in. It defaults to ~ if it isn't set.

I've noted this issue in weecology/MATSS#106. After getting that functionality set up, we should only need to add a config file here to point the pipeline at analysis/data.
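As a rough illustration of that lookup, here is a minimal sketch in base R. The variable name EXAMPLE_DATA_PATH and the get_data_path() helper are assumptions made for the example; this thread does not name the variable MATSS actually reads.

```r
# Hypothetical variable name; the real one used by MATSS is not given here.
data_path_var <- "EXAMPLE_DATA_PATH"

# Return the configured data directory, falling back to "~" when unset.
get_data_path <- function(var = data_path_var) {
  Sys.getenv(var, unset = "~")
}

# With the variable unset, the lookup falls back to the home directory.
Sys.unsetenv(data_path_var)
default_path <- get_data_path()

# Once the variable is set, writers and readers agree on the same location.
Sys.setenv(EXAMPLE_DATA_PATH = "analysis/data")
configured_path <- get_data_path()
dataset_dir <- file.path(configured_path, "veg-plots-sdl")
```

This would explain the warning above: with nothing set, the reader looks in ~/veg-plots-sdl rather than analysis/data/veg-plots-sdl.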

ethanwhite commented 5 years ago

Thanks. So if the install_retriever_data calls were actually being executed and folder_path was being set, then everything would have ended up in the right place and worked?

For now it sounds like I need to move the data directories into ~, so I'll go ahead and try that.
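Before rerunning, it may help to confirm that the dataset folders are where the pipeline will look. Here is a small sketch using the folder names from the listing above. The find_dataset_dirs() helper is hypothetical, and the demonstration uses a temporary directory rather than a real home directory.

```r
# Dataset folder names taken from the analysis/data listing in this thread.
datasets <- c("breed-bird-survey", "mapped-plant-quads-mt", "veg-plots-sdl")

# Hypothetical helper: report which dataset folders exist under base_dir.
find_dataset_dirs <- function(base_dir, datasets) {
  paths <- file.path(path.expand(base_dir), datasets)
  stats::setNames(dir.exists(paths), datasets)
}

# Demonstration in a temporary directory holding only one of the folders.
demo_base <- file.path(tempdir(), "dataset-check-demo")
dir.create(file.path(demo_base, "breed-bird-survey"),
           recursive = TRUE, showWarnings = FALSE)

found <- find_dataset_dirs(demo_base, datasets)
missing_dirs <- names(found)[!found]
```

Calling find_dataset_dirs("~", datasets) would run the same check against the default location.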

ethanwhite commented 5 years ago

OK, so now everything is working locally, but still not working on HiPerGator. Here's what I get on HiPerGator with the data in ~:

(r-reticulate) [ethanwhite@dev1 MATSS-LDATS]$ Rscript analysis/pipeline.R 
...
Error: Failed to make a grid of grouping variables for map().
Grouping variables in map() must have suitable lengths for coercion to a data frame.
Possibly uneven groupings detected in map(fun = list(ts), data = list(maizuru_data, jornada_data, sgs_data, 
    portal_data, bbs_data, sdl_data, mtquad_data), lda = list(
    analysis_lda_maizuru_data, analysis_lda_jornada_data, analysis_lda_sgs_data, 
    analysis_lda_portal_data, analysis_lda_bbs_data, analysis_lda_sdl_data)):
  ts
  c("maizuru_data", "jornada_data", "sgs_data", "portal_data", "bbs_data", "sdl_data", "mtquad_data")
  c("analysis_lda_maizuru_data", "analysis_lda_jornada_data", "analysis_lda_sgs_data", "analysis_lda_portal_data", "analysis_lda_bbs_data", "analysis_lda_sdl_data")
Execution halted

The error happens here:

https://github.com/weecology/MATSS-LDATS/blob/master/analysis/pipeline.R#L36

It looks like, for some reason, mtquad_data isn't included in lda_targets by the time it reaches build_ts_analysis_plan, which causes the map to fail? But that dataset is in lda_targets while in pipeline.R, so I'm confused and hoping this makes sense to someone with a little more knowledge of the codebase.
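To make the mismatch concrete, here is a sketch that compares the two name vectors copied from the error message. It assumes each dataset target should pair with an LDA target named with an analysis_lda_ prefix, which is what the six listed names suggest. It is a diagnostic only, not how the pipeline builds its plan.

```r
# Target names copied from the error message above.
data_targets <- c("maizuru_data", "jornada_data", "sgs_data", "portal_data",
                  "bbs_data", "sdl_data", "mtquad_data")
lda_targets <- c("analysis_lda_maizuru_data", "analysis_lda_jornada_data",
                 "analysis_lda_sgs_data", "analysis_lda_portal_data",
                 "analysis_lda_bbs_data", "analysis_lda_sdl_data")

# Assumed naming rule: one LDA target per dataset, prefixed "analysis_lda_".
expected_lda <- paste0("analysis_lda_", data_targets)

# Any expected LDA target that is absent leaves the two groupings uneven.
unmatched <- setdiff(expected_lda, lda_targets)
same_length <- length(data_targets) == length(lda_targets)
```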

ha0ye commented 5 years ago

Ah, I updated the package code and the pipeline script a few days ago, so I think you're running the slightly older version of the package with the newer version of the pipeline script.

Can you try re-installing MATSS-LDATS and running again?
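One quick way to see what is installed before and after re-installing is sketched below. The installed_version() helper is hypothetical. The package name matssldats comes from the startup messages earlier in this thread. Note that a development package may change without its version number being bumped, so matching versions do not guarantee matching code.

```r
# Hypothetical helper: installed version of a package as a string,
# or NA if the package is not installed.
installed_version <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NA_character_)
  }
  as.character(utils::packageVersion(pkg))
}

# "stats" ships with R, so it always reports a version.
stats_version <- installed_version("stats")

# On a machine with the package installed, the same call applies:
# installed_version("matssldats")

# Re-installing from GitHub with devtools (assumed to be the install route):
# devtools::install_github("weecology/MATSS-LDATS")
```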

ethanwhite commented 5 years ago

That fixed it. Thanks!

We are now successfully running on HiPerGator. Next step is to do a scheduled build.