pbs-assess / sdmTMB

:earth_americas: An R package for spatial and spatiotemporal GLMMs with TMB
https://pbs-assess.github.io/sdmTMB/

Approximate memory usage? #253

Closed ColCarroll closed 1 year ago

ColCarroll commented 1 year ago

Hi! Thanks for the nice library!

I'm attempting to run train and predict on the wind dataset (http://r-spatial.github.io/gstat/reference/wind.html), but I'm running into memory problems. Can you provide any intuition (or numbers!) on the memory usage, or tips for running this successfully?

More specifically:

  • I'm on a machine with ~64GB of RAM
  • the training dataset has 769,179 rows
  • there are 12 locations with readings

The code I am running looks like

library(sdmTMB)

train <- read.csv('data/wind.train.csv')
train <- na.omit(train)
train$wind <- pmax(train$wind, 1e-4)  # lognormal needs strictly positive responses
train$datetime <- as.POSIXct(train$datetime)
train <- add_utm_columns(train, utm_names = c("x", "y"))
train$location <- as.factor(train$location)
train$days <- as.integer(as.numeric(train$datetime - min(train$datetime), units = 'days'))

mesh <- make_mesh(train, c("x", "y"), cutoff = 1)

# This fails with OOM
m <- sdmTMB(
    data = train,
    formula = wind ~ location + 1,
    mesh = mesh,
    family = lognormal(),
    spatial = "off",
    time = "days",
    spatiotemporal = "RW"
)
ColCarroll commented 1 year ago

sorry -- converted to discussion

Lewis-Barnett-NOAA commented 1 year ago

64GB of RAM is usually enough to fit any of these models, so there is likely something up with the data or model structure.

ericward-noaa commented 1 year ago

Two thoughts about the dimensionality of this:

  1. How large is the mesh? If you type mesh$mesh$n, what is n?

  2. How many days do you have? This will be the number of unique spatiotemporal fields being estimated, which could be blowing things up.
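
Both checks can be run directly in R. A minimal sketch (the `mesh` and `train` objects are assumed to be the ones built in the original post; the `days` stand-in vector below is hypothetical so the snippet runs on its own):

```r
# With the objects from the original post, the two checks would be:
#   mesh$mesh$n                  # 1. number of mesh vertices
#   length(unique(train$days))   # 2. number of RW time steps
# Self-contained stand-in: a hypothetical month of daily time steps.
days <- 0:30
length(unique(days))  # each unique step gets its own spatiotemporal field slice
#> [1] 31
```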

ColCarroll commented 1 year ago

Ooh, thank you! The mesh is pretty reasonably sized -- mesh$mesh$n is 33, but there are 6574 days.
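
A back-of-the-envelope sketch (my arithmetic, not from the thread): with spatiotemporal = "RW", the model estimates one latent value per mesh vertex per time step, so even a small mesh implies a very long random-effect vector when there are thousands of time steps:

```r
# Size of the spatiotemporal random-effect vector implied by the
# numbers reported above: 33 mesh vertices x 6574 daily time steps.
n_knots <- 33    # mesh$mesh$n
n_steps <- 6574  # unique values of train$days
n_knots * n_steps
#> [1] 216942
```

So the daily RW field alone carries over 200,000 latent values; if daily resolution is not essential, coarsening the time variable (e.g. weekly or monthly bins) would be one way to shrink this.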