Moderate dimensionality (more than 128) leads to fatal error for nn_linear and nnf_linear

dkibalnikov commented 9 months ago

Moderate dimensionality (more than 128) leads to a fatal error (it crashes the R session) for both the linear function and the linear module. Examples:

nn_linear(20, 128)(torch_randn(128, 20))
nnf_linear(torch_rand(128, 128), torch_rand(128, 128))
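Because the failure takes down the whole R session, one way to probe it safely is to run the call in a separate R process and look at the exit status. The sketch below uses only base R. `run_isolated` is a hypothetical helper name, and the sketch assumes that `Rscript` lives in `R.home("bin")` and that torch is installed in the child's library path.

```r
# Run R code (given as a string) in a fresh Rscript process.
# Returns the exit status: 0 on success, non-zero if the child
# raised an error or crashed. `run_isolated` is a hypothetical helper.
run_isolated <- function(code) {
  rscript <- file.path(R.home("bin"), "Rscript")
  system2(rscript, c("--vanilla", "-e", shQuote(code)),
          stdout = FALSE, stderr = FALSE)
}

# The second example from the report, executed outside the current session.
status <- run_isolated(
  "library(torch); nnf_linear(torch_rand(128, 128), torch_rand(128, 128))"
)
if (status == 0) {
  message("child process finished cleanly")
} else {
  message("child process failed with status ", status)
}
```

A non-zero status only says that the child did not finish normally; it does not distinguish a crash from an ordinary R error such as a missing package.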

R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.5.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] torch_0.11.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.11 lattice_0.21-8 ps_1.7.5 zoo_1.8-12 digest_0.6.31
[6] utf8_1.2.3 mime_0.12 R6_2.5.1 quanteda_3.3.1 httr_1.4.6
[11] ggplot2_3.4.2 pillar_1.9.0 rlang_1.1.1 curl_5.0.1 rstudioapi_0.14
[16] data.table_1.14.9 miniUI_0.1.1.1 callr_3.7.3 TTR_0.24.3 Matrix_1.5-4.1
[21] sentopics_0.7.2 stringr_1.5.0 htmlwidgets_1.6.2 bit_4.0.5 munsell_0.5.0
[26] shiny_1.7.4 compiler_4.2.2 httpuv_1.6.11 pkgconfig_2.0.3 htmltools_0.5.5
[31] tidyselect_1.2.0 tibble_3.2.1 codetools_0.2-19 fansi_1.0.4 dplyr_1.1.2
[36] later_1.3.1 grid_4.2.2 xtable_1.8-4 gtable_0.3.3 lifecycle_1.0.3
[41] magrittr_2.0.3 coro_1.0.3 scales_1.2.1 RcppParallel_5.1.7 writexl_1.4.2
[46] bench_1.1.3 viewxl_0.1.4 quantmod_0.4.23 cli_3.6.1 stringi_1.7.12
[51] promises_1.2.0.1 xml2_1.3.4 ellipsis_0.3.2 stopwords_2.3 xts_0.13.1
[56] generics_0.1.3 vctrs_0.6.3 fastmatch_1.1-3 ompr.roi_1.0.1 tools_4.2.2
[61] bit64_4.0.5 glue_1.6.2 purrr_1.0.1 processx_3.8.2 fastmap_1.1.1
[66] colorspace_2.1-0 rvest_1.0.3 profvis_0.3.8 emphatic_0.1.4

cregouby commented 9 months ago

Hello @dkibalnikov,

I cannot reproduce it, even with 10x the size:

library(torch) 
nn_linear(20, 1280)(torch_randn(1280, 20))
#> torch_tensor
#> Columns 1 to 6-7.0898e-01  4.1157e-01 -1.1364e+00  8.4133e-01  1.0676e-01  9.1039e-02
#>  1.3702e+00 -3.6942e-01 -1.1948e-01  8.0703e-02  1.0708e+00  9.5189e-01
#>  2.4707e-01 -3.9200e-01 -1.8124e-01  1.9765e-02  6.1204e-01 -5.2249e-02
#> -1.7791e-01  2.2077e-01 -7.7681e-01  7.3838e-01  4.8283e-01  1.3482e+00
#>  4.1828e-01  2.1301e-01 -8.6139e-02  4.5348e-01  6.6507e-01  1.8943e-02
#> -7.4575e-01  1.6341e-01 -1.2503e+00  9.8950e-01  4.6170e-01  3.0663e-02
#> -2.4904e-01  7.6111e-02 -5.9435e-01  4.4329e-01 -2.3553e-01 -1.0150e+00
#>  6.7522e-01  1.2541e-01  7.9450e-02 -2.0349e-01  9.2089e-01 -2.4564e-01
#>  4.5052e-01  1.5246e+00  5.8488e-01 -1.3509e-01  2.0966e-01 -1.1299e+00
#>  8.6869e-01  4.6408e-01 -1.9164e-02  2.1840e-01  1.0361e-01 -4.5342e-01
#>  6.1484e-01  2.3663e-01  1.0167e-01  2.1550e-01  4.0945e-01 -1.0551e-01
#> -5.5385e-01  8.4911e-01 -1.6550e-01 -6.4211e-02 -3.0238e-01  5.2292e-02
#> -3.6670e-01 -8.0450e-01 -2.2089e-02  3.4483e-02 -5.6338e-01  8.6086e-01
#> -3.8352e-01 -4.6327e-01 -7.5900e-02  8.0230e-01 -9.5715e-02 -1.3269e+00
#> -3.1438e-01  2.6760e-01  8.9636e-02 -1.0697e+00 -1.6974e+00 -3.5554e-01
#>  7.0102e-02  3.9760e-01  8.5948e-01 -5.1368e-01 -1.9069e-01 -2.0346e-02
#>  6.3078e-01  1.5253e-01  1.4166e-01  1.6667e-01  4.0800e-01  8.3231e-01
#> -8.7949e-02 -1.9206e-01 -7.5689e-01  1.1478e+00  2.1870e-01  4.4476e-01
#>  1.6508e-01  2.1932e-01 -2.2830e-01  2.5127e-01  4.4608e-01  8.3165e-01
#> -8.1501e-02  7.9883e-01 -2.3249e-01  1.6306e-01  1.6357e-01  1.2276e-01
#> -1.4001e-01  5.3287e-01 -7.0814e-01  8.9028e-01  8.2789e-01  1.1098e-01
#>  8.9009e-01 -4.1135e-01  4.8019e-01  6.0969e-01  9.0903e-02 -4.5155e-02
#>  1.2557e+00 -4.8705e-01  4.8003e-01  5.7697e-01  7.4863e-01  6.3192e-01
#> -7.0335e-01  5.2271e-01 -9.1448e-01 -3.5012e-02  3.9500e-01  4.8847e-01
#>  1.2214e+00  9.8051e-02  7.6372e-01 -4.7775e-01  7.2770e-01  1.0678e+00
#>  6.0582e-01  4.4896e-01  7.5375e-01  4.2252e-01  4.1088e-02 -7.4432e-01
#> -6.1031e-02  2.1138e-02 -2.7871e-01  7.9596e-01  6.1431e-01  7.7275e-01
#>  8.0580e-02  1.9898e-01  1.9314e-01  1.2023e+00  2.5634e-01 -5.1424e-01
#> -8.4975e-01 -6.4094e-01 -1.3000e+00  6.7161e-01 -7.0074e-01  2.3980e-01
#>  5.5043e-02 -6.0623e-01 -1.9979e-01  1.1136e+00  5.5584e-01  8.8245e-01
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{1280,1280} ][ grad_fn = <AddmmBackward0> ]
nnf_linear(torch_rand(128, 1280), torch_rand(128, 1280))
#> torch_tensor
#> Columns 1 to 8 314.5612  334.5342  324.1125  326.3635  323.0993  312.6442  323.0829  320.6299
#>  327.3670  338.8271  332.6741  328.6799  331.7452  321.7097  319.3501  329.3346
#>  305.4786  321.5266  312.7684  317.3124  317.7830  304.5445  310.2312  309.1551
#>  311.2249  329.8019  316.9099  318.7080  323.9138  319.5287  315.1199  316.6309
#>  314.1612  335.6437  328.4605  332.8960  330.3456  310.6036  320.1300  321.2331
#>  311.8161  330.1847  312.6392  315.1529  323.1219  307.4629  317.7228  315.0923
#>  318.2993  333.3435  322.6922  321.0600  325.2026  315.3970  318.6345  323.4004
#>  307.8209  320.0415  314.7405  314.2267  319.9475  308.1620  307.2529  311.5896
#>  307.4026  318.3667  306.9087  305.1812  324.1178  309.9085  304.8435  307.5104
#>  310.8539  323.4612  310.7436  313.0002  319.1089  302.0583  310.4747  311.0898
#>  319.2216  327.7248  323.8221  321.8202  326.8770  315.4436  320.6340  327.8873
#>  317.3966  332.0334  327.7425  324.1236  328.6198  313.6400  320.9924  320.2305
#>  318.8514  331.8930  319.2690  326.7739  336.6279  313.5105  323.1542  318.9379
#>  322.3104  332.9274  318.9480  324.0622  330.4696  315.2674  319.5331  314.2921
#>  318.8631  335.2570  325.9142  322.9870  327.8271  313.1253  317.1335  318.6035
#>  319.4509  336.7789  327.8356  328.9484  329.4269  318.0941  319.2609  322.1582
#>  311.4863  327.9283  321.1350  316.9962  324.0696  315.0074  311.9630  313.4155
#>  314.7720  336.1359  329.2325  330.1570  331.6602  314.9024  318.6237  322.9298
#>  310.7311  320.6415  317.7906  313.3005  318.3006  305.2761  308.1657  316.2002
#>  320.1846  332.4483  326.2688  329.7654  328.1264  312.9500  320.8858  320.7798
#>  319.3460  328.9277  324.0204  320.0679  325.3782  310.1682  320.8093  325.2496
#>  318.9281  327.8293  321.0062  322.4516  326.2178  308.5112  316.5901  318.3667
#>  305.0467  318.0155  314.9675  312.6392  314.1836  308.3572  307.9530  313.1931
#>  319.5083  337.1085  327.9420  332.6487  335.2065  320.2056  324.3575  325.1026
#>  313.2246  328.4168  319.8751  316.7077  322.1813  313.8553  314.3896  319.2457
#>  317.0706  340.8828  328.8917  331.1060  335.5088  317.4178  317.8134  325.7239
#>  315.8531  323.3284  323.6129  331.0298  325.7739  308.7908  322.6524  317.4842
#>  321.7771  334.7800  332.5665  320.5312  333.2360  319.7244  321.7923  320.1363
#>  313.0137  324.0544  315.8846  315.8236  325.8558  306.1601  309.1360  309.1877
#>  313.7323  327.9250  316.6913  313.6987  322.5557  309.9584  315.3717  305.8556
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{128,128} ]

Created on 2023-09-26 with reprex v2.0.2
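The shapes in the two outputs above follow from the documented behaviour of the linear function, y = x Aᵀ + b: an input of shape (n, in_features) and a weight of shape (out_features, in_features) give an output of shape (n, out_features). As a minimal sketch in base R, with `linear_ref` as a hypothetical stand-in that is not part of torch:

```r
# Base-R reference for a linear layer: y = x %*% t(weight) + bias.
# `linear_ref` is a hypothetical helper used only to check shapes.
linear_ref <- function(x, weight, bias = NULL) {
  stopifnot(ncol(x) == ncol(weight))  # in_features must agree
  y <- x %*% t(weight)
  if (!is.null(bias)) y <- sweep(y, 2, bias, `+`)  # add bias per output column
  y
}

# Same shapes as nnf_linear(torch_rand(128, 1280), torch_rand(128, 1280)).
x <- matrix(runif(128 * 1280), nrow = 128)
w <- matrix(runif(128 * 1280), nrow = 128)
y <- linear_ref(x, w)
dim(y)   # 128 128
mean(y)  # close to 1280 * 0.25 = 320 for uniform(0, 1) inputs
```

Each output entry is a sum of 1280 products of independent uniform(0, 1) draws, so values near 320 are expected, which is consistent with the second output above. The shapes in the original report are valid in the same way, so a shape mismatch is an unlikely explanation for the crash.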

Any clue as to what could be specific to your setup? (Lack of RAM, maybe?)

dkibalnikov commented 9 months ago

Hi @cregouby

Thank you for the response. I rechecked, and my examples now work fine. The issue was likely related to the system environment: the only thing I have changed since opening the issue is updating the Command Line Tools.

Command Line Tools for Xcode:
Version: 15.0
Source: Apple
Install Date: 24.09.2023, 17:48

I hope this finding helps someone else as well.
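For anyone comparing environments, the installed Command Line Tools version can be read from R on macOS. This is a sketch under assumptions: it relies on the macOS `pkgutil` utility and on the package receipt being registered as `com.apple.pkg.CLTools_Executables`, and `clt_version` is a hypothetical helper name.

```r
# Report the installed Command Line Tools version on macOS.
# Returns NA on other platforms or when no matching receipt is found.
# Assumes the receipt id com.apple.pkg.CLTools_Executables.
clt_version <- function() {
  if (Sys.info()[["sysname"]] != "Darwin") return(NA_character_)
  out <- suppressWarnings(
    system2("pkgutil", "--pkg-info=com.apple.pkg.CLTools_Executables",
            stdout = TRUE, stderr = FALSE)
  )
  ver <- grep("^version:", out, value = TRUE)
  if (length(ver) == 0) NA_character_ else sub("^version:\\s*", "", ver[1])
}

clt_version()
```

Including this value alongside `sessionInfo()` in a report makes it easier to spot toolchain differences between a machine that crashes and one that does not.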

cregouby commented 9 months ago

You're welcome, @dkibalnikov. Please close the issue if that is fine with you.

mlverse / torch

Moderate dimensionality (more than 128) leads to fatal error for nn_linear and nnf_linear #1105