pbs-assess / gfsynopsis

:fish: A reproducible data synopsis report for over 100 species of British Columbia groundfish

Can't build document #31

Closed andrew-edwards closed 6 years ago

andrew-edwards commented 6 years ago

May as well use this to keep track of stumbling blocks.

Running the

system.time({
  for (i in seq_along(spp$species_common_name)) {
    fig_check <- paste0(file.path("report", "figure-pages"), "/",
      gfsynopsis:::clean_name(spp$species_common_name[i]))
    fig_check1 <- paste0(fig_check, "-1.png")
    fig_check2 <- paste0(fig_check, "-2.png")
    .....

loop of make.R, I get the following:

Building figure pages for north pacific spiny dogfish 
Determining qualified fleet for area 3[CD]+|5[ABCDE]+.
Fitting standardization model for area 3[CD]+|5[ABCDE]+.
Fitting CPUE model ...
Getting sdreport ...
Error in gzfile(file, mode) : cannot open the connection
In addition: Warning messages:
1: No data available. 
2: No data available. 
3: In if (!is.na(sc[[1]])) sc <- sc %>% filter(year >= 2003) :
  the condition has length > 1 and only the first element will be used
4: In if (!is.na(sb)) { :
  the condition has length > 1 and only the first element will be used
5: In gzfile(file, mode) :
  cannot open compressed file 'report/cpue-cache/north-pacific-spiny-dogfish-3[CD]+|5[ABCDE]+-model.rds', probable reason 'Invalid argument'
Timing stopped at: 305.7 9.53 316.1
> 

My report\cpue-cache\ folder is there but is empty (related to the fifth warning).
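Warnings 3 and 4 in the log above are what R (before 4.2.0, where it became an error) emits when `if ()` is handed a condition with more than one element: `is.na()` on a data frame or vector returns one value per cell. A minimal sketch of a scalar-safe test, assuming the intent is "skip when the object is a single NA placeholder"; `is_missing_scalar()` and the toy `sc` object are hypothetical, not taken from gfsynopsis:

```r
# Hypothetical helper: TRUE only for a single atomic NA value.
is_missing_scalar <- function(x) {
  is.atomic(x) && length(x) == 1L && is.na(x)
}

# Toy stand-in for the real object (an assumption about its shape).
sc <- list(data.frame(year = 2001:2005))

# Before: if (!is.na(sc[[1]])) ...  (the condition has one element per cell)
if (!is_missing_scalar(sc[[1]])) {
  sc[[1]] <- subset(sc[[1]], year >= 2003)
}
nrow(sc[[1]])
```

The `is.atomic()` check comes first so that data frames and lists never reach `is.na()` inside the `&&` chain.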

seananderson commented 6 years ago

Maybe that is not a valid file name on Windows?

Can you do the following?

d <- data.frame(a = 1)
saveRDS(d, file = "north-pacific-spiny-dogfish-3[CD]+|5[ABCDE]+-model.rds")

andrew-edwards commented 6 years ago

I think you're correct:

> saveRDS(d, file = "north-pacific-spiny-dogfish-3[CD]+|5[ABCDE]+-model.rds")
Error in gzfile(file, mode) : cannot open the connection
In addition: Warning message:
In gzfile(file, mode) :
  cannot open compressed file 'north-pacific-spiny-dogfish-3[CD]+|5[ABCDE]+-model.rds', probable reason 'Invalid argument'
> 

Just from emacs I can save a filename with [ ] or + or - in it, but not the | , e.g. dummy|3.txt doesn't work.
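For reference, Windows rejects `< > : " / \ | ? *` in file names, so the `|` from the area regex is the culprit. A minimal sketch of one way around it, assuming the area string is pasted straight into the cache file name; `safe_file_name()` is a hypothetical helper, not a function from gfsynopsis or gfplot, and the actual fix may look different:

```r
# Hypothetical helper: replace characters that Windows rejects in file names.
# Apply it to the file name only, not a full path, or the separators go too.
safe_file_name <- function(x, replacement = "-") {
  gsub("[<>:\"/\\\\|?*]", replacement, x, perl = TRUE)
}

area <- "3[CD]+|5[ABCDE]+"
f <- paste0("north-pacific-spiny-dogfish-", safe_file_name(area), "-model.rds")

# Written to tempdir() so the sketch does not touch report/cpue-cache.
saveRDS(data.frame(a = 1), file = file.path(tempdir(), f))
```

The brackets and `+` are left alone, since those already save fine on Windows.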

seananderson commented 6 years ago

Should work now then

andrew-edwards commented 6 years ago

Gets further, but can't find tidy_survey_sets(), which I can't see in either repository.

Building figure pages for north pacific spiny dogfish 
Determining qualified fleet for area 3[CD]+|5[ABCDE]+.
Fitting standardization model for area 3[CD]+|5[ABCDE]+.
Fitting CPUE model ...
Getting sdreport ...
Determining qualified fleet for area 5[CDE]+.
Fitting standardization model for area 5[CDE]+.
Fitting CPUE model ...
Getting sdreport ...
Determining qualified fleet for area 5[AB]+.
Fitting standardization model for area 5[AB]+.
Fitting CPUE model ...
Getting sdreport ...
Determining qualified fleet for area 3[CD]+.
Fitting standardization model for area 3[CD]+.
Fitting CPUE model ...
Getting sdreport ...
Fitting model for the survey SYN QCS
Error in tidy_survey_sets(.dat, surv, years = years, density_column = "density_kgpm2") (from maps.R#59) : 
  could not find function "tidy_survey_sets"

seananderson commented 6 years ago

It was an internal function in gfplot. I just exported it: https://github.com/pbs-assess/gfplot/commit/a04be68de04c5faed226043f67a103ade3f9c711

I wouldn't have noticed that because of the load_all("../gfplot") at the top of make.R, which makes all functions, including internal ones, available.
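A quick way to tell the two cases apart is to ask the namespace what it exports. This is a sketch only; `is_exported()` and `is_internal()` are hypothetical helper names, and the runnable lines use the `stats` package so that gfplot does not need to be installed:

```r
# TRUE if `fn` is in the package's export list (reachable via library()).
is_exported <- function(fn, pkg) {
  fn %in% getNamespaceExports(pkg)
}

# TRUE if `fn` lives in the namespace but is not exported,
# i.e. visible after load_all() or via pkg:::fn, but not after library(pkg).
is_internal <- function(fn, pkg) {
  exists(fn, envir = asNamespace(pkg), inherits = FALSE) &&
    !is_exported(fn, pkg)
}

is_exported("median", "stats")
# With gfplot installed: is_exported("tidy_survey_sets", "gfplot")
```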

andrew-edwards commented 6 years ago

Thanks. The load_all hadn't worked for me:

devtools::load_all("../gfplot/")
Error: Can't find 'c:\andy18\github\gfplot\'.

so I just did library(gfplot), forgetting that that caused an issue.

Okay, so

devtools::load_all("../gfplot/.")

works (with the "." at the end). Will that be okay on a Mac??
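The failure is probably the trailing slash: on Windows a directory path ending in a separator can be reported as not existing. A small sketch that sidesteps the question by stripping the separator before the path is used; `strip_trailing_slash()` is a hypothetical helper, not part of devtools:

```r
# Hypothetical helper: drop any trailing "/" or "\" from a path.
# Note that a bare "/" would become "", so do not use it on a root path.
strip_trailing_slash <- function(path) {
  sub("[/\\\\]+$", "", path, perl = TRUE)
}

pkg_dir <- strip_trailing_slash("../gfplot/")
pkg_dir
# devtools::load_all(pkg_dir)  # needs devtools and a local gfplot checkout
```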

seananderson commented 6 years ago

devtools::load_all("../gfplot/.") or devtools::load_all("../gfplot") also will work on a Mac.

Regardless, it will have to work with properly exported functions and a library() call longterm anyways.

andrew-edwards commented 6 years ago

Thanks. devtools::load_all("../gfplot") works so I'll change the makefile to that.

It now gives error:

Building figure pages for north pacific spiny dogfish 
Fitting model for the survey SYN QCS
Interpolating depth to fill in missing data if needed...
Preloading interpolated depth for prediction grid...
Predicting density onto grid...
INLA max_edge = c(20, 100)
Error in inla.models() : could not find function "inla.models"

But in gfsynopsis/NAMESPACE you have

importFrom(INLA,inla.models)

We're getting there, incrementally...

seananderson commented 6 years ago

INLA does some weird stuff because I never actually use that function but I remember having to import it. INLA also isn't on CRAN. INLA should only need to be imported in gfplot. I'll remove that gfsynopsis one.

  1. I assume you have INLA installed? (I'm not 100% sure the DESCRIPTION file for gfplot is set up correctly to install the non-CRAN packages) http://www.r-inla.org/download

  2. If you do have INLA, then try library(INLA) first. I don't have to do that, but I don't understand why inla.models has to be imported at all and this might have different behaviour on Windows. If that works, then I can dig into the NAMESPACE later. Might have to import the entire name space.

  3. If none of that works, perhaps I have a development version of INLA installed. I can't remember but this will be good to figure out if needed. I have 18.04.16.
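Points 1 and 3 can be checked in one go. A minimal sketch, assuming only base R; `pkg_version_or_na()` is a hypothetical helper, and 18.04.16 is simply the version quoted above, not a known minimum requirement:

```r
# Hypothetical helper: installed version as a string, or NA if not installed.
# system.file() is used so the package is not loaded just to check for it.
pkg_version_or_na <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

v <- pkg_version_or_na("INLA")
if (is.na(v)) {
  message("INLA is not installed")
} else {
  # Compare against the version quoted in point 3.
  message("INLA ", v, "; at least 18.04.16: ",
          package_version(v) >= package_version("18.04.16"))
}
```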

seananderson commented 6 years ago

Longterm the other option is to revert back to glmmfields now that it's on CRAN. The code for that to work is already baked into gfplot. glmmfields can be quite a bit slower for the coast-wide models though... but at least I understand what's going on under the hood.

andrew-edwards commented 6 years ago

  1. Yes, I have INLA installed as a package (version INLA_17.06.20, built 2017-06-20), but installing gfsynopsis hadn't installed it.

It's got further along and has fitted the SYN QCS survey and moved onto the next ones...

andrew-edwards commented 6 years ago

It's built the figure pages for Spiny Dogfish!

Just re-opening this to make sure I properly incorporate the above fixes (currently have two branches so will wait), and I haven't actually got the document to build yet. Thanks for your help. I'm only around for a couple of days next week so might not get much further anyway.

seananderson commented 6 years ago

OK, great. If you got the figure pages to build for one stock then I imagine it will work for all of them.

I just fixed a whole bunch of little things to pass R CMD check. I also added a section on building the document:

https://github.com/pbs-assess/gfsynopsis/issues

Feel free to push to the master branch in the end.

seananderson commented 6 years ago

And I just edited the instructions to make sure INLA is installed. I also added library(INLA) to the top of the main script.

Obviously the link was wrong in the last comment

https://github.com/pbs-assess/gfsynopsis#building-the-document

andrew-edwards commented 6 years ago

Thanks. I also had to install the packages clisymbols and splancs. I can add those at some point.

Yes, I'm presuming the figures will build for all. Hopefully I can figure out any document-building issues from the knitr files. I could build up to the start of Section 2 the other day (I just added an \end{document} at the end of the created .tex to make the latex run okay).

seananderson commented 6 years ago

Ah, yes, when I wrap make.R in a function in the package it should import clisymbols and therefore install it when the package is installed. I have no idea what splancs is for, but INLA imports it, so I imagine installing INLA as they now suggest:

install.packages("INLA", repos = c(getOption("repos"), 
  INLA = "https://inla.r-inla-download.org/R/stable"), dep = TRUE)

i.e., with dependencies = TRUE should also install things like splancs.

andrew-edwards commented 6 years ago

Damn, it just stopped on Pacific Cod (the fourth species), having done some of the areas:

Determining qualified fleet for area 3[CD]+.
Fitting standardization model for area 3[CD]+.
Fitting CPUE model ...
Getting sdreport ...
In file included from c:/Pfiles/R/R-3.4.3/library/BH/include/boost/config.hpp:39:0,
                 from c:/Pfiles/R/R-3.4.3/library/BH/include/boost/math/tools/config.hpp:13,
                 from c:/Pfiles/R/R-3.4.3/library/StanHeaders/include/stan/math/rev/core/var.hpp:7,
                 from c:/Pfiles/R/R-3.4.3/library/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5,
                 from c:/Pfiles/R/R-3.4.3/library/StanHeaders/include/stan/math/rev/core.hpp:12,
                 from c:/Pfiles/R/R-3.4.3/library/StanHeaders/include/stan/math/rev/mat.hpp:4,
                 from c:/Pfiles/R/R-3.4.3/library/StanHeaders/include/stan/math.hpp:4,
                 from c:/Pfiles/R/R-3.4.3/library/StanHeaders/include/src/stan/model/model_header.hpp:4,
                 from file2980313c648f.cpp:8:
c:/Pfiles/R/R-3.4.3/library/BH/include/boost/config/compiler/gcc.hpp:186:0: warning: "BOOST_NO_CXX11_RVALUE_REFERENCES" redefined
 #  define BOOST_NO_CXX11_RVALUE_REFERENCES
 ^
<command-line>:0:0: note: this is the location of the previous definition
Error in cpp_object_initializer(.self, .refClassDef, ...) : 
  could not find function "cpp_object_initializer"
In addition: There were 37 warnings (use warnings() to see them)
Error in if (m$return_code != 0L) { (from growth.R#145) : argument is of length zero
Timing stopped at: 2213 164.2 3417
> 

I'm off shortly. But can (next week) try and build the document with the species that it built figures for. Presumably that should work....

seananderson commented 6 years ago

Ah, run library(Rcpp) or library(rstan) first (either should work). Another esoteric namespace thing I hope to fix eventually.

seananderson commented 6 years ago

Link to the issue: https://github.com/stan-dev/rstan/issues/353.

We need to depend on Rcpp, unfortunately, in gfplot.

andrew-edwards commented 6 years ago

Gets to POP then "inla.exe has stopped working" window pops up..... (repeatedly, because it keeps trying again).

andrew-edwards commented 6 years ago

Have reproduced the error for POP. It's on

Fitting model for the survey SYN HS
Interpolating depth to fill in missing data if needed...

To fix this, we can try reducing the number of knots.

seananderson commented 6 years ago

@andrew-edwards , when you get a chance, could you try changing:

max_edge = c(20, 100)

to

max_edge = c(30, 100)

on the line: https://github.com/pbs-assess/gfsynopsis/blob/b271148cb00c587483cc23c555f1ba6c6cc8271a/R/make-pages.R#L444

and re-running POP? If 30 fails, you could try a bigger number again.
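The "try 30, then something bigger" idea could also be automated. This is only a sketch under assumptions: `fit_with_coarser_mesh()` and `toy_fit()` are hypothetical, the real fitting call in make-pages.R is not reproduced here, and a hard crash of inla.exe may not surface as an R error that `tryCatch()` can see:

```r
# Hypothetical wrapper: retry a fit with a progressively coarser inner mesh.
# `fit_fun` is any function that accepts a `max_edge` argument.
fit_with_coarser_mesh <- function(fit_fun, first_edges = c(20, 30, 40, 50)) {
  for (edge in first_edges) {
    max_edge <- c(edge, 100)
    fit <- tryCatch(fit_fun(max_edge = max_edge), error = function(e) NULL)
    if (!is.null(fit)) {
      return(list(fit = fit, max_edge = max_edge))
    }
  }
  stop("All mesh settings failed", call. = FALSE)
}

# Toy stand-in for the real model fit: fails whenever the mesh is too fine.
toy_fit <- function(max_edge) {
  if (max_edge[1] < 30) stop("mesh too fine")
  list(max_edge = max_edge)
}

res <- fit_with_coarser_mesh(toy_fit)
res$max_edge
```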

andrew-edwards commented 6 years ago

Looks good. It's got through POP (all surveys) and is now onto Redbanded....

andrew-edwards commented 6 years ago

Got to Silvergray Rockfish:

...
Determining qualified fleet for area 5[AB]+.
Fitting standardization model for area 5[AB]+.
Fitting CPUE model ...
Getting sdreport ...
Determining qualified fleet for area 3[CD]+.
Fitting standardization model for area 3[CD]+.
Fitting CPUE model ...
Getting sdreport ...
Rejecting initial value:
  Error evaluating the log probability at the initial value.
Exception: normal_lpdf: Random variable is nan, but must not be nan!  (in 'model2980281267c9_vb' at line 17)

Error in sampler$call_sampler(c(args, dotlist)) : Initialization failed.
In addition: There were 46 warnings (use warnings() to see them)
Timing stopped at: 1816 125.7 2508

So an initial value doesn't get properly specified.

andrew-edwards commented 6 years ago

Didn't get much further:

x Building figure pages for silvergray rockfish 
Error in .f(.x[[i]], ...) : object 'sex' not found

seananderson commented 6 years ago

Fixed now. Was caused by some editing yesterday of the maturity .csv file: https://github.com/pbs-assess/gfplot/commit/8bb92423ddd7d4d85869e4781225ea3debfcc524 We'll figure out what's going on there. The 'sex' column was removed.

I promise, silvergray should finish now!

andrew-edwards commented 6 years ago

Yes, Silvergray works. It does give a message (not Warning/Error):

geom_path: Each group consists of only one observation. Do you need to adjust
the group aesthetic?

which I think is what opened up an (empty) R graphics window (forget if that happened before, so maybe nothing to worry about). It's onto Copper...

andrew-edwards commented 6 years ago

It did Copper and the next ten (!), crashing at Longspine:

Fitting standardization model for area 3[CD]+.
Fitting CPUE model ...
Getting sdreport ...
Initial log joint probability = 224.007
Optimization terminated normally: 
  Convergence detected: relative gradient magnitude is below tolerance
Initial log joint probability = 253.162
Optimization terminated normally: 
  Convergence detected: relative gradient magnitude is below tolerance
Error in filter(dat$cpue_spatial_ll, year >= 2008) %>% plot_cpue_spatial(bin_width = 7,  : 
  non-numeric argument to binary operator

Looks like the argument is empty:

>  dat$cpue_spatial_ll
# A tibble: 0 x 15
# ... with 15 variables: year <int>, fishery_sector <chr>,
#   vessel_registration_number <int>, gear <chr>, trip_id <int>,
#   fishing_event_id <int>, lat <dbl>, lon <dbl>, species_code <chr>,
#   species_scientific_name <chr>, species_common_name <chr>,
#   landed_round_kg <dbl>, cpue <int>, total_released_pcs <int>,
#   major_stat_area_code <chr>

I likely won't look at again until Monday...
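Until that is fixed properly, a guard on empty input would let the loop carry on to the next species. A minimal sketch; `plot_if_data()` is a hypothetical wrapper and not how gfsynopsis or gfplot actually handle this:

```r
# Hypothetical guard: only call the plotting function when there are rows.
plot_if_data <- function(d, plot_fun, ...) {
  if (is.null(d) || nrow(d) == 0L) {
    message("No rows to plot; skipping.")
    return(invisible(NULL))
  }
  plot_fun(d, ...)
}

# A zero-row data frame, like dat$cpue_spatial_ll above.
empty <- data.frame(year = integer(0), cpue = numeric(0))
plot_if_data(empty, function(d) stop("should not be reached"))
```

In real use `plot_fun` would be something like `plot_cpue_spatial`, with `bin_width = 7` passed through the dots.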

andrew-edwards commented 6 years ago

Have updated and rebuilt gfplot and gfsynopsis and upgraded ggplot2 to latest version (else couldn't find function ggplot). Get to error:

v Figure pages for shortspine thornyhead already exist
x Building figure pages for longspine thornyhead 
Error in `$<-.data.frame`(`*tmp*`, area, value = character(0)) : 
  replacement has 0 rows, data has 23743

Again, the following is empty (see above comment):

dat$cpue_spatial_ll
# A tibble: 0 x 15
# ... with 15 variables: year <int>, fishery_sector <chr>,
# ...

Tried updating the Matrix package to 1.2.14, as TMB uses that (and gives a message asking to please upgrade), but I still get the above error.

Also get these warnings:

In addition: Warning messages:
1: No data available. 
2: No data available. 
3: In if (!is.na(sc[[1]])) sc <- sc %>% filter(year >= 2003) :
  the condition has length > 1 and only the first element will be used
4: In if (!is.na(sb)) { :
  the condition has length > 1 and only the first element will be used
5: In if (!is.na(cpue_index[[1]])) { :
  the condition has length > 1 and only the first element will be used
6: Unknown or uninitialised column: 'major_stat_area_name'. 

andrew-edwards commented 6 years ago

Commit c685eb794 fixed the above problem and longspine worked. Now it's onto Sablefish, spewing out lots of numbers on the screen such as:

outer mgc:  2.889603 
outer mgc:  4.28579 
outer mgc:  6.035571 

I don't recall those for other species, but it's been a few weeks.

seananderson commented 6 years ago

I stopped suppressing the TMB verbose output ("outer mgc" is the outer maximum gradient component) temporarily, that's all.

You'll be fitting the delta-Gamma CPUE GLMs right now, which seem to have some funny performance.

andrew-edwards commented 6 years ago

The calculations for all the species finished last night. Woo-hoo. The rest of the make.R runs okay also.

Have run the knitr code, knit("pbs-gf-synopsis.Rnw"). It gets to 32% and gives the following (to the console):

  |.....................                                            |  32%
label: survey-index (with options) 
List of 6
 $ message  : logi FALSE
 $ warning  : logi FALSE
 $ fig.asp  : num 0.9
 $ fig.width: num 6
 $ out.width: chr "5in"
 $ fig.cap  : language paste0("Example relative biomass index trends from trawl and longline surveys for ",      sp, ". Dots represent m| __truncated__ ...

geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?
geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?

% A fake square to use to get the aspect ratio of the maps correct:

In this section we provide complete figure captions for each of the
visualizations that form the species-by-species synopsis report in
Section~\ref{sec:plot-pages} We use Petrale Sole as an example species.
...

When make.R was run the above geom_path questions were output to the console, and it looks like they've become part of the figure caption. I know you said the build might not work anyway, so no worries if it's not fixable soon. I can probably get on with the IPHC stuff, and should be able to figure out how to put it into your structure.

seananderson commented 6 years ago

You can safely ignore those geom_path warnings.

Does it give some error after that output? The output you pasted above just has normal output as far as I can tell.

You should be able to add the IPHC stuff in gfplot independently of building the synopsis report. If you can write a function that outputs a data frame formatted similarly to what you get out of gfplot::tidy_survey_index() or gfplot::get_survey_index() (whichever is easier) I can help you integrate it fairly easily.

andrew-edwards commented 6 years ago

The geom_path questions go to the console when make.R is run (which is fine), but it looks like they kind of get absorbed into the figure captions (and mess up the captions)?

Yes, there are more errors. The full output after what I pasted above (and removing lots of whitespace) is below. Actually, I think it's really just the one error - I have no pbs-survey-index.rds file in data-cache -- just realised that's likely related to the first error above, which is labelling the chunk/figure as survey-index. So fixing pbs-survey-index.rds may do it -- I have a file called cpue-index-dat.rds which is the only one without a species name.

% ---------------------------------------------------------------------

\subsection{RELATIVE BIOMASS INDEX TRENDS FROM SURVEYS}

\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}

{\ttfamily\noindent\color{warningcolor}{\#> Warning in gzfile(file, "{}rb"{}): cannot open compressed file '../../data-cache/pbs-survey-index.rds', probable reason 'No such file or directory'}}

{\ttfamily\noindent\bfseries\color{errorcolor}{\#> Error in gzfile(file, "{}rb"{}): cannot open the connection}}

{\ttfamily\noindent\bfseries\color{errorcolor}{\#> Error in semi\_join(dat\_survey\_index, survs, by = "{}survey\_abbrev"{}): object 'dat\_survey\_index' not found}}

{\ttfamily\noindent\bfseries\color{errorcolor}{\#> Error in dots\_values(...): object 'survey\_descriptions' not found}}

{\ttfamily\noindent\bfseries\color{errorcolor}{\#> Error in inner\_join(survey\_descriptions, survs, by = "{}survey\_abbrev"{}): object 'survey\_descriptions' not found}}

{\ttfamily\noindent\bfseries\color{errorcolor}{\#> Error in gsub("{} \$"{}, "{}"{}, x\$survey\_series\_desc): object 'x' not found}}

{\ttfamily\noindent\bfseries\color{errorcolor}{\#> Error in paste0(x\$survey\_abbrev, "{} = "{}, x\$survey\_series\_desc, collapse = "{}, "{}): object 'x' not found}}\end{kframe}
\end{knitrout}

Quitting from lines 57-58 (./doc/plot-descriptions.Rnw) 
Quitting from lines 25-32 (./doc/plot-descriptions.Rnw) 
Error in paste0("Example relative biomass index trends from trawl and longline surveys for ",  : 
  object 'surv_abbrev_text' not found
In addition: Warning messages:
1: Removed 2 rows containing missing values (geom_point). 
2: Removed 4 rows containing missing values (geom_rect). 
>

seananderson commented 6 years ago

Ah, that's a throwback to the old data format in one of the example plots that I haven't fixed. I'll fix it. Yes, sometimes errors can break the LaTeX and end up injecting things in weird places. The root of the problem is that there used to be a single file for each of the datatypes across species. Now there is a single file for each species with all the data types in that list and I haven't trashed all my files and tried compiling again.
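To make the new layout concrete, here is a rough sketch of reading one data type out of a per-species file. The file name `petrale-sole.rds`, the element name `survey_index`, and the helper `read_species_data()` are all assumptions for illustration (only `cpue_spatial_ll` appears earlier in this thread), and the toy cache is written to `tempdir()`:

```r
# Build a toy per-species cache file: one .rds per species holding a named list.
cache_dir <- file.path(tempdir(), "data-cache-sketch")
dir.create(cache_dir, showWarnings = FALSE)
dat <- list(
  survey_index = data.frame(year = 2003:2005),  # element name is an assumption
  cpue_spatial_ll = data.frame()
)
saveRDS(dat, file.path(cache_dir, "petrale-sole.rds"))

# Hypothetical reader: fail loudly if the requested data type is missing,
# instead of leaving later code to trip over an undefined object.
read_species_data <- function(species_slug, type, dir) {
  x <- readRDS(file.path(dir, paste0(species_slug, ".rds")))
  if (!type %in% names(x)) {
    stop("No '", type, "' element for ", species_slug, call. = FALSE)
  }
  x[[type]]
}

survey_index <- read_species_data("petrale-sole", "survey_index", cache_dir)
```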

seananderson commented 6 years ago

I think this commit (https://github.com/pbs-assess/gfsynopsis/commit/5f02ce11e2bc6fb73a3fb4cef4f3bc0a399ed911) should have fixed the issues with the document compiling.

andrew-edwards commented 6 years ago

Document builds! Just a few minor errors that show up:

A. page 2:

#> Error: ’hbll_n_grid’ is not an exported object from ’namespace:gfplot’
#> Error: ’hbll_s_grid’ is not an exported object from ’namespace:gfplot’
#> Error in data.frame(hbll_n, survey = "Outside Hard Bottom Long Line (N)", :
object ’hbll_n’ not found
#> Error in fortify(data): object ’hbll’ not found

B. 4. Ageing precision section just has:

#> Error in gzfile(file, "rb"): cannot open the connection
#> Error in grouped_df_impl(data, unname(vars), drop): Column
‘species_common_name‘ is unknown

C. 6.3 CPUE index standardization section plots the figure but also says:

#> Warning in gzfile(file, "rb"): cannot open compressed file
’../../cpue-cache/petrale-sole-3[CD]+-model.rds’, probable reason ’No such file or
directory’
#> Error in gzfile(file, "rb"): cannot open the connection
#> Error in cbind(object$par.fixed, sqrt(diag(object$cov.fixed))): object
’cpue_model’ not found

Oh, I haven't got the cpue-cache repo yet, so that seems to make sense(?).

Awesome. I can visually compare the .pdf files (built on your machine and mine) at some point.

seananderson commented 6 years ago

I fixed the aging precision error — it was just an old path.

To fix the CPUE issue after pulling the latest commits, trash one of the Petrale Sole pages in report/figure-pages, trash the relevant cpue .rds file in cpue-cache, and run make.R.
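The cache-clearing step could be scripted along these lines. It is a sketch run against a throwaway directory in `tempdir()`; `clear_species_cache()` is a hypothetical helper and the file names only mimic the pattern seen in the errors above. `file.remove()` is used rather than `unlink()` because `unlink()` expands wildcards by default and, on some platforms, may treat the `[CD]` in these names as a pattern:

```r
# Hypothetical helper: delete every cached file whose name starts with the slug.
clear_species_cache <- function(dir, species_slug) {
  files <- list.files(dir, full.names = TRUE)
  hit <- files[startsWith(basename(files), species_slug)]
  file.remove(hit)  # no wildcard expansion, so "[CD]" is taken literally
  invisible(hit)
}

# Toy cache directory with two fake model files.
demo_dir <- file.path(tempdir(), "cpue-cache-sketch")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("petrale-sole-3[CD]+-model.rds",
                                  "pacific-cod-3[CD]+-model.rds")))

removed <- clear_species_cache(demo_dir, "petrale-sole")
list.files(demo_dir)
```

Pointing the same helper at report/figure-pages would cover the other half of the step, after which make.R rebuilds whatever is missing.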

The

#> Error: ’hbll_n_grid’ is not an exported object from ’namespace:gfplot’
#> Error: ’hbll_s_grid’ is not an exported object from ’namespace:gfplot’

error seems strange because those should be exported from the latest gfplot.

Try this:

gfplot::hbll_n_grid

It's possible you have an older version that didn't have the North and South grids separated.