hmsc-r / HMSC

GNU General Public License v3.0

Error while converting Hmsc model object to JSON: `Error in rcpp_to_json(x, unbox, digits, numeric_dates, factors_as_string, : negative length vectors are not allowed` #190

Open elgabbas opened 4 months ago

elgabbas commented 4 months ago

I am preparing data for Hmsc-HPC. The model implements a Gaussian predictive process (GPP) at the European scale (52K sampling units) for 142 species and 9 covariates:

```r
M_Init
# Hmsc object with 52729 sampling units, 142 species, 9 covariates, 1 traits and 1 random levels
# No posterior samples
```

Running `sampleMcmc()` with `engine = "HPC"` works without any problem:

```r
Model <- Hmsc::sampleMcmc(hM = M_Init, samples = M_samples, thin = M_thin,
  transient = M_transient, nChains = nChains, verbose = verbose, engine = "HPC")
```

However, I get the following error when I convert the model object to JSON:

```r
Model <- jsonify::to_json(Model)
# Error in rcpp_to_json(x, unbox, digits, numeric_dates, factors_as_string,  : 
#  negative length vectors are not allowed
```

I have used the same approach for a subset of the data (smaller study area and fewer species) without any problem. This error could be due to the large size of the object [but please see my next comment below].

```r
pryr::object_size(Model)
# 1.06 GB
```

```r
library(magrittr)  # provides %>% and divide_by()

# In-memory size of each top-level element, in MB
Model %>% sapply(pryr::object_size) %>% divide_by(1024*1024) %>% sort() %>% round(2)
#    initPar     samples   transient        thin     nChains 
#      0.00        0.00        0.00        0.00        0.00 
#   verbose   nParallel   useSocket     adaptNf   alignPost 
#       0.00        0.00        0.00        0.00        0.00 
#   Rupdater initParList          X1          hM dataParList 
#       0.00        3.29        6.84      130.59      879.97 
```
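To see whether a single component (such as `dataParList`, which dominates the in-memory size above) also dominates the JSON text, each top-level element can be converted on its own. Below is a minimal sketch. `json_bytes_by_element()` is a hypothetical helper, not part of Hmsc or jsonify. The converter is passed in as an argument so that `jsonify::to_json` or any alternative can be plugged in, and a failed conversion is reported as `NA` instead of stopping the loop.

```r
# Hypothetical helper: convert each top-level element of a list separately
# and return the size of the resulting JSON text in bytes (NA on error).
json_bytes_by_element <- function(x, to_json_fun) {
  vapply(x, function(el) {
    out <- tryCatch(to_json_fun(el), error = function(e) NULL)
    if (is.null(out)) NA_real_ else as.numeric(sum(nchar(out, type = "bytes")))
  }, numeric(1))
}

# Usage on the model object (na.last = TRUE keeps failed elements visible,
# because sort() drops NAs by default):
# sort(json_bytes_by_element(Model, jsonify::to_json), na.last = TRUE)
```

The same function also works on a named list of whole models, for example `list(Model_20 = Model_20, Model_30 = Model_30)`.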

```r
pryr::object_size(Model$dataParList[[1]][[1]]$distMat12)
# 887.53 MB
```
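The size hypothesis above can be checked with back-of-the-envelope arithmetic. The sizes reported by `pryr::object_size()` are in-memory sizes, not JSON text sizes. `jsonify::to_json()` returns the whole document as a single R character string, and R limits one string to 2^31 - 1 bytes (see `?"Memory-limits"`). The sketch below estimates the text size of a numeric matrix such as `distMat12`. `estimate_json_bytes()` and `exceeds_r_string_limit()` are hypothetical helpers. The characters-per-value figure is an assumption, not a measurement, and jsonify may lay out matrices with extra brackets per row.

```r
# Rough, hypothetical estimate of the JSON text size (in bytes) of
# `n_values` numbers written as one flat array: each number takes
# `chars_per_value` characters (an assumption; it depends on the digits
# written), numbers are comma-separated, and the array is wrapped in [ ].
estimate_json_bytes <- function(n_values, chars_per_value) {
  n <- as.numeric(n_values)  # avoid integer overflow for large matrices
  n * chars_per_value + pmax(n - 1, 0) + 2
}

# R stores a single character string in at most 2^31 - 1 bytes
r_string_limit <- .Machine$integer.max

exceeds_r_string_limit <- function(n_values, chars_per_value) {
  estimate_json_bytes(n_values, chars_per_value) > r_string_limit
}

# Usage (assumes distMat12 is a plain numeric matrix; 17 characters per
# value is a guess):
# dm <- Model$dataParList[[1]][[1]]$distMat12
# estimate_json_bytes(length(dm), chars_per_value = 17)
# exceeds_r_string_limit(length(dm), chars_per_value = 17)
```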

Is there a solution for this? Would Hmsc-HPC still work if I converted the model object to JSON with a function other than `jsonify::to_json()`?

Thanks

elgabbas commented 4 months ago

Update:

I have multiple model variants for the same locations and species. The conversion to JSON worked for some of them and failed for others. The variants differ only in the GPP knots used (their locations and the distances between them) and in the samples/thin/transient values.

I don't think the problem is file size or the samples/thin/transient combination: I can export similar models that use knot distances of 20 and 40 km, but those with distances of 30, 50, and 60 km fail.

It is unclear why the conversion fails only for particular sets of GPP knot locations. Note that I ensured the GPP knots do not exactly overlap with the sampling units: if any knot happened to coincide with a sampling unit, I added a small spatial noise (up to 100 m). See this issue.

I can share an example model object if this would help.

gtikhonov commented 4 months ago

Your hypothesis that the size of the Hmsc model object being converted to JSON is the core source of the problem seems the most plausible one. We have observed somewhat similar issues with overflowing JSON ourselves. There is definitely no issue with samples/thin/transient at the stage of R->HPC export, since these values have no effect on the exported object size. However, from your description of the "knot distance" effect, it is not clear to me whether the jsonify conversion was less stable with a smaller or a larger number of knots. Could you please report how many knots you had in the variants you tried?

elgabbas commented 4 months ago

Thanks @gtikhonov for your reply,

Earlier, I tried the following knot distances: 20 and 40 km worked (15K and 4K knots), while 30, 50, and 60 km failed (7K, 2.8K, and 2K knots).

| Knot distance | Number of knots | JSON conversion |
|---|---|---|
| 20 km | 15,046 | Worked |
| 30 km | 7,129 | Failed |
| 40 km | 4,290 | Worked |
| 50 km | 2,877 | Failed |
| 60 km | 2,096 | Failed |

It seems this issue is not directly related to the number of knots or to the object size: the model using 20 km knots is 8.82 GB and converts fine, while smaller models fail (30 km: 4.08 GB; 60 km: 1.58 GB).
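One possible reading of this non-monotonic pattern is offered here only as a hypothesis that this thread does not establish. Suppose some step of the conversion stored the output length in a 32-bit signed integer. The length would then wrap around instead of growing. A larger JSON text could end up with a positive length, so no error would be raised, while a smaller one could become negative and trigger "negative length vectors are not allowed". The hypothetical `as_int32_wrapped()` below only demonstrates that arithmetic.

```r
# Hypothetical illustration: reinterpret a byte count as a signed 32-bit
# integer, i.e. what happens if a length is stored in a 32-bit `int`.
# Whether the jsonify/R conversion path does this is an assumption.
as_int32_wrapped <- function(n_bytes) {
  m <- n_bytes %% 2^32               # keep the low 32 bits
  ifelse(m >= 2^31, m - 2^32, m)     # values >= 2^31 become negative
}

# A 2.5 GB text wraps to a negative length, a 5 GB text to a positive one
as_int32_wrapped(c(2.5e9, 5e9))  # -1794967296 and 705032704
```

If this hypothesis holds, a conversion that finishes without error could still produce truncated text. It may therefore be worth validating the "working" outputs as well.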

I uploaded the unfitted models to this link.

```r
load("Model_unfitted.RData")

nrow(Model_20$rL$sample$sKnot) # 15046
nrow(Model_30$rL$sample$sKnot) # 7129
nrow(Model_40$rL$sample$sKnot) # 4290
nrow(Model_50$rL$sample$sKnot) # 2877
nrow(Model_60$rL$sample$sKnot) # 2096
```

The following worked:

```r
pryr::object_size(Model_20)    # 665.08 MB
Model_20 <- Hmsc::sampleMcmc(
  hM = Model_20, samples = 2000, thin = 5, transient = 1500, nChains = 4, verbose = 1000, engine = "HPC")
pryr::object_size(Model_20)    # 8.82 GB
Model_20_JSON <- jsonify::to_json(Model_20)
pryr::object_size(Model_20_JSON)

pryr::object_size(Model_40)    # 664.72 MB
Model_40 <- Hmsc::sampleMcmc(
  hM = Model_40, samples = 2000, thin = 5, transient = 1500, nChains = 4, verbose = 1000, engine = "HPC")
pryr::object_size(Model_40)    # 2.62 GB
Model_40_JSON <- jsonify::to_json(Model_40)
```

The following failed, each with this error:

```
Error in rcpp_to_json(x, unbox, digits, numeric_dates, factors_as_string, : negative length vectors are not allowed
```

```r
pryr::object_size(Model_30)    # 664.82 MB
Model_30 <- Hmsc::sampleMcmc(
  hM = Model_30, samples = 2000, thin = 5, transient = 1500, nChains = 4, verbose = 1000, engine = "HPC")
pryr::object_size(Model_30)    # 4.08 GB
Model_30_JSON <- jsonify::to_json(Model_30)

pryr::object_size(Model_50)    # 664.67 MB
Model_50 <- Hmsc::sampleMcmc(
  hM = Model_50, samples = 2000, thin = 5, transient = 1500, nChains = 4, verbose = 1000, engine = "HPC")
pryr::object_size(Model_50)    # 1.95 GB
Model_50_JSON <- jsonify::to_json(Model_50)

pryr::object_size(Model_60)    # 664.64 MB
Model_60 <- Hmsc::sampleMcmc(
  hM = Model_60, samples = 2000, thin = 5, transient = 1500, nChains = 4, verbose = 1000, engine = "HPC")
pryr::object_size(Model_60)    # 1.58 GB
Model_60_JSON <- jsonify::to_json(Model_60)
```