Closed: dschwilk closed this issue 7 years ago
If we subsample the landscape to 1/100 of the original resolution, this can run on my machine (and in seconds). But that reduces the data to 1 percent of the original. Other ideas?
Ooh, I bet I can do it in chunks: maybe decadal chunks on the temporal side, and tenths of the original landscape on the spatial side.
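The chunking idea can be sketched generically (in Python here, for illustration only; the project code is R, and `chunked_matmul` and the sizes are hypothetical): multiply one block of landscape rows at a time and hand each block off for writing, rather than materializing the full locations-by-dates result.

```python
import numpy as np

def chunked_matmul(loadings, scores, chunk_rows=2500):
    """Multiply (n_locations x k) loadings by (k x n_dates) scores one
    block of locations at a time, yielding each block instead of
    materializing the full n_locations x n_dates result in memory."""
    n = loadings.shape[0]
    for start in range(0, n, chunk_rows):
        # Each block is only chunk_rows x n_dates; caller writes it to disk.
        yield start, loadings[start:start + chunk_rows] @ scores

# Tiny demo with made-up sizes (real data: ~1.3M locations x 14360 dates).
rng = np.random.default_rng(0)
loadings = rng.random((10, 3))
scores = rng.random((3, 7))
full = loadings @ scores
approx = np.vstack([b for _, b in chunked_matmul(loadings, scores, chunk_rows=4)])
assert np.allclose(full, approx)  # blockwise result matches the full product
```

The same decomposition works on the temporal side (chunking columns of `scores` by decade) since matrix multiplication is independent across both output rows and output columns.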
OK, so if I load bigmemory (for as.big.matrix) AND the bigalgebra package, I can at least get this to try to run, but it still crashes. Perhaps this is a solution, however, if I move the calculations to the Linux cluster at TTU?
OK, I have access to a high-memory queue and will try to get this working. Info on job submission below:
I have given you access to the ivy-highmem queue. Each node has around 256GB of
memory and 20 cores. You will need to update your job submission script to
change the queue name and the requested number of cores. Here is an example
submission script:
#!/bin/sh
#$ -V
#$ -N Ivy-highmem-job
#$ -o $JOB_NAME.o$JOB_ID
#$ -e $JOB_NAME.e$JOB_ID
#$ -cwd
#$ -S /bin/bash
#$ -P hrothgar
#$ -pe fill 40
#$ -q ivy-highmem
Running (in chunks) as of ce8d45b, but storage needs are enormous. Currently running on a hrothgar high-memory node and about 1/4 of the way through CM tmin after three hours, so total run time is only several days; that is not terrible. But I am chunking the data into 2500-location chunks (i.e., portions of the landscape at a time). Each 2500-xy chunk is about 600 MB for the historical tmin predictions.
For now, I am simply splitting into landscape chunks. This works. I have successfully run tmin for the CM. This results in 765 RDS files (each a data frame with time series for 1500 landscape points). The total size of these RDS files is 171 GB, and DM and GM will be larger. @hpoulos: can we clip these landscapes first? Can you help with that? This would happen in predict-spatial.R.
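The clipping step might look something like the following (a hypothetical Python sketch only; the real work would happen in predict-spatial.R, and `clip_landscape`, the coordinates, and the bounding box are all made up for illustration): drop landscape points outside the study area before prediction, so every downstream chunk and output file shrinks proportionally.

```python
import numpy as np

def clip_landscape(xy, values, xmin, xmax, ymin, ymax):
    """Keep only landscape points inside a bounding box.
    xy: (n, 2) array of coordinates; values: (n, k) per-point loadings."""
    keep = ((xy[:, 0] >= xmin) & (xy[:, 0] <= xmax) &
            (xy[:, 1] >= ymin) & (xy[:, 1] <= ymax))
    return xy[keep], values[keep]

# Toy example: three points, of which one falls outside the box.
xy = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
vals = np.arange(6.0).reshape(3, 2)
cxy, cvals = clip_landscape(xy, vals, 0, 6, 0, 6)
# cxy and cvals now hold only the first two points
```

Clipping before prediction is cheaper than clipping the outputs: it reduces both the matrix multiplication work and the 171 GB (and growing) of RDS storage at the source.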
See also #37 and #36; solving those will allow more rapid computation.
Saving only annual summaries essentially solves this (see #36).
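The storage win from annual summaries is roughly a factor of 365: each year of daily values per location collapses to one summary value. A minimal sketch (in Python, for illustration; `annual_means` and the toy sizes are assumptions, and the real summaries in #36 may be means, minima, or other statistics):

```python
import numpy as np

def annual_means(daily, years):
    """Collapse daily values to one mean per year.
    daily: (n_locations, n_days); years: (n_days,) year label per column."""
    uniq = np.unique(years)
    return uniq, np.column_stack(
        [daily[:, years == y].mean(axis=1) for y in uniq])

rng = np.random.default_rng(0)
daily = rng.random((4, 730))            # 4 locations, 2 years of daily values
years = np.repeat([2000, 2001], 365)    # year label for each day
uniq, summ = annual_means(daily, years)
# summ is 4 x 2 instead of 4 x 730: a ~365x reduction in stored columns
```

Applied to the 14360-date historical runs, the same reduction would shrink the per-chunk outputs from hundreds of MB to roughly a MB each.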
Matrix multiplication for converting predicted PCA loadings and predicted PCA scores back to tmin/tmax across the landscape and time is memory intensive. For example: ts is the full predicted historical PCA scores (14360 dates) and tl is the full landscape tmin loadings for the DM (1290564 locations). So the resulting matrix would be 1290564 rows by 14360 columns, about 1.85 × 10^10 cells, which is more cells than a 32-bit integer can index (and roughly 148 GB as a dense matrix of doubles).
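To make the arithmetic concrete (a quick back-of-the-envelope in Python; the project itself is in R):

```python
# Why the full DM reconstruction matrix cannot be materialized at once.
n_locations = 1290564            # DM landscape points (rows of tl)
n_dates = 14360                  # historical daily dates (columns of ts)
cells = n_locations * n_dates
print(cells)                     # 18532499040 cells (~1.85e10)
print(cells > 2**31 - 1)         # True: exceeds 32-bit integer indexing
print(cells * 8 / 1e9)           # ~148.26 GB as dense 8-byte doubles
```

So even on a 256 GB ivy-highmem node, the full dense result barely fits once and leaves no headroom for copies, which is why chunking (or annual summaries) is needed.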
I'm ashamed to say I did not foresee this problem.