Open liuyanguu opened 2 years ago
Thanks very much. I haven't worked with the India DHS very much, but I'm not too surprised it runs into a memory issue.
If you still have this open / are able to reproduce again, can you call traceback() after the error to see exactly where it is occurring? It might be failing in the calls to the survey package. If so, you could try the jackknife standard error instead:
u5mr <- calc_nqx(dt1, strata = ~v022, varmethod = "jk1")
Thank you so much for your quick response! Here is what's returned from traceback()
Error: cannot allocate vector of size 14.5 Gb
> traceback()
4: double(osize)
3: pyears(formula, data, scale = scale, data.frame = TRUE, weights = weights)
2: demog_pyears(f, mf, period = period, agegr = agegr, tips = tips,
event = "(death)", tstart = "(dob)", tstop = "(tstop)", weights = "(weights)",
origin = origin, scale = scale)
1: calc_nqx(dt1, strata = ~v022, varmethod = "jk1")
Looks like just the first step in calc_nqx, which calls survival::pyears?
Indeed, the issue was raised by pyears:
> pyears(formula, data, scale=scale, data.frame=TRUE, weights=weights)
Error in rowSums(is.na(unclass(x))) :
'Calloc' could not allocate memory (1274250 of 16 bytes)
> traceback()
11: rowSums(is.na(unclass(x)))
10: as.vector(rowSums(is.na(unclass(x))) > 0)
9: is.na.Surv(x)
8: is.na(x)
7: na.omit.data.frame(structure(list(`Surv(`(tstop)` - `(dob)`, `(death)`)` = structure(c(106,
135, 173, 39, 112, 175, 190, 6, 6, 0.690000000000055, 0.650000000000091,
...
6: na.omit(structure(list(`Surv(`(tstop)` - `(dob)`, `(death)`)` = structure(c(106,
135, 173, 39, 112, 175, 190, 6, 6, 0.690000000000055, 0.650000000000091,
...
5: model.frame.default(formula = formula, data = data, weights = weights)
4: stats::model.frame(formula = formula, data = data, weights = weights)
3: eval(tform, parent.frame())
2: eval(tform, parent.frame())
1: pyears(formula, data, scale = scale, data.frame = TRUE, weights = weights)
Hi @liuyanguu,
Thanks for this, very helpful. In the branch issue-15 I have changed calc_nqx() to process the data through demog_pyears() in batches to avoid the memory allocation error (the default is batch_size = 100000).
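For reference, here is a rough sketch of the batching idea. This is not the branch code: it calls survival::pyears() directly rather than demog_pyears(), ignores the survey weights, and assumes the age-group and period factors (agegr, period) and the time/event columns (dob, tstop, death) have already been constructed.

library(survival)

# pyears() tabulations are additive over rows, so the data can be split into
# chunks, each chunk tabulated separately, and the cell totals summed.
pyears_batched <- function(dat, batch_size = 100000) {
  batches <- split(seq_len(nrow(dat)), ceiling(seq_len(nrow(dat)) / batch_size))
  pieces <- lapply(batches, function(i) {
    pyears(Surv(tstop - dob, death) ~ agegr + period,
           data = dat[i, ], scale = 1, data.frame = TRUE)$data
  })
  # sum person-years and events over matching agegr/period cells
  aggregate(cbind(pyears, event) ~ agegr + period,
            data = do.call(rbind, pieces), FUN = sum)
}

The actual change in the branch processes the batches through demog_pyears(), which also handles the survey weights and the period/age/tips time splits; batching just caps the size of each allocation.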
Could you try installing that branch and testing again with your India example?
devtools::install_github("mrc-ide/demogsurv@issue-15")
Do you need to do any of the other calculations (e.g. fertility) on the India data set? It might be an issue for those as well.
Thanks, Jeff
Wonderful! Thank you so much for such a prompt reply! It works. I have an extra question: I see the latest available period is 2021; what reference period does it refer to? Is it all the deaths that happened in calendar year 2021 (Jan.-Dec.)?
Yes, that refers to calendar year 2021. You can adjust the time splits using the period argument.
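For example (the break points below are just an illustration, assuming period takes calendar-year break points):

# single-year calendar periods 2017 through 2021, otherwise the same call as before
u5mr <- calc_nqx(dt1, strata = ~v022, varmethod = "jk1", period = 2017:2022)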
Thanks, Jeff
Many thanks for the great package; it runs very fast! I am working on calculating U5MR from the birth history in the latest India DHS 2019. As we know, the India DHS datasets are much larger than any other DHS. Just the input file, selecting only the columns we need, is over 100 MB.
Running calc_nqx easily triggers an error like "cannot allocate vector of size 14.5 Gb"; any experience with that would be much appreciated. This is not reproducible code, but you get the idea:
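(A minimal sketch of the kind of call involved; the file path and column selection are placeholders, not the original code.)

library(demogsurv)
library(haven)

# hypothetical path to the India 2019-21 births recode (.dta)
br <- read_dta("path/to/india_births_recode.dta")
# keep only the survey-design and birth-history columns needed (this selection is an assumption)
dt1 <- br[, c("v005", "v008", "v021", "v022", "v024", "v025", "b3", "b7")]
u5mr <- calc_nqx(dt1, strata = ~v022)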
The input data I used can be downloaded from Dropbox (100 MB): Dropbox link to the file