rouyang2017 / SISSO

A data-driven method combining symbolic regression and compressed sensing for accurate & interpretable models.
Apache License 2.0

subs_sis >= 1000 --> segmentation fault #5

Closed by vonruti 5 years ago

vonruti commented 5 years ago

I am facing another problem. As soon as I set the variable subs_sis >= 1000, I get a segmentation fault:

Final sure independence screening ...
Size of the SIS-selected subspace from phi02: 1000
FC done!

DI starts ...
forrtl: severe (174): SIGSEGV, segmentation fault occurred

This happens before the first iteration finishes. Do you have any idea why I get this error?

rouyang2017 commented 5 years ago

Thank you for your question. I don't know what causes that, as I have never seen that error before. Could you please show me the parameter settings in your SISSO.in file?

vonruti commented 5 years ago

Thank you for your response. Here are my parameter settings. With these particular settings, the calculation starts to crash when I use subs_sis >= 645.

task='FCDI'          !'FCDI': iterative FC and DI; 'FC': one-off FC; 'DI': one-off DI
restart=.false.      ! for 'FCDI' and when the file 'CONTINUE' is present
desc_dim=8           ! dimension of the descriptor (<=2 for 'quali')
ptype='quanti'       ! property type: 'quanti'(quantitative property),'quali'(qualitative property)
nprop=1              ! number of properties (multi-task learning if nprop >1)
nsample=202          ! number of samples for each property (separate them by comma for nprop >1)
width=0.001          ! boundary tolerance (width) for classification (count in outside points very close to the domain)

!>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
! parameters for feature construction
! implemented operators:(+)(-)(*)(/)(exp)(exp-)(^-1)(^2)(^3)(sqrt)(cbrt)(log)(|-|)(scd)(^6)(sin)(cos)
! scd: standard Cauchy distribution
!>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
nsf= 19               ! number of scalar features (one number for each material)
rung=2           ! rung (0,1,2,3) of the feature space to be constructed
maxcomplexity=10     ! max feature complexity (number of operators in a feature)
opset='(+)(-)(*)(/)(exp)(log)(^-1)(^2)(^3)(sqrt)(cbrt)(|-|)'  ! ONE operator set for feature transformation
ndimtype=8           ! number of dimension type (or unit) of the primary features
dimclass=(1:9)(10:11)(12:12)(13:13)(14:14)(15:15)(16:18)(19:19)    ! Group features
maxfval_lb=1e-3      ! features having the max. abs. data value <maxfval_lb will not be selected
maxfval_ub=1e5       ! features having the max. abs. data value >maxfval_ub will not be selected
subs_sis=645         ! SIS-selected (single) subspace size

!>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
! parameters for descriptor identification
!>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
method='L0'          ! sparsification operator: 'L1L0' or 'L0'
fs_size_DI=          ! to be set by SISSO automatically if task='FCDI': feature-space size
fs_size_L0=          ! to be set by SISSO automatically if (task='FCDI' and method ='L0'): size for L0
metric='LS_RMSE'     ! for ptype='quanti'; metric for model selection: LS_RMSE,LS_MaxAE
nm_output=100        ! number of the best models to output

rouyang2017 commented 5 years ago

At which iteration (dimension) does the error appear? Is it > 3? Your SISSO.in file is fine: I tested it, and SISSO ran without problems up to the 3rd iteration, at which point I cancelled the job. However, desc_dim=8 is a high dimension for SISSO, and you are combining it with a large subs_sis=645. At the 8th iteration, L0 would have to evaluate about (645*8)^8/8! ≈ 10^25 candidate models. That is far too many (it would take millions of years on current HPC systems) and may also cause an out-of-memory problem. Please check whether the error happens at a high iteration (e.g. > 3); if so, either the subspace size "subs_sis" or the descriptor dimension "desc_dim" needs to be reduced.
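
To make the estimate above easier to reproduce, here is a minimal Python sketch of the arithmetic. It is not part of SISSO. It assumes, following the formula in the comment above, that at iteration n the L0 step picks n features out of a pool of n*subs_sis features. The function names l0_candidate_models and l0_estimate are hypothetical and only used for illustration.

```python
from math import comb, factorial


def l0_candidate_models(subs_sis: int, dim: int) -> int:
    """Exact number of dim-feature combinations drawn from the pooled subspaces.

    Assumption (taken from the estimate in the comment above): by iteration
    `dim` the pool holds dim * subs_sis features.
    """
    pool = subs_sis * dim
    return comb(pool, dim)


def l0_estimate(subs_sis: int, dim: int) -> float:
    """Rough estimate (subs_sis * dim)^dim / dim!, as used in the comment above."""
    return (subs_sis * dim) ** dim / factorial(dim)


if __name__ == "__main__":
    subs_sis = 645
    # Show how fast the model count grows with the descriptor dimension.
    for dim in range(1, 9):
        exact = l0_candidate_models(subs_sis, dim)
        rough = l0_estimate(subs_sis, dim)
        print(f"dim={dim}: exact={exact:.3e}  estimate={rough:.3e}")
```

For subs_sis=645 the count at dimension 8 is on the order of 10^25, in line with the estimate above, while dimensions 1 and 2 stay below 10^6. Under the same assumption, this growth alone would not explain a crash that already occurs in iteration 1.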

vonruti commented 5 years ago

It already crashes in iteration 1. When I reduce subs_sis only a bit, it does not crash until iteration 4. I cannot say anything about dimensions higher than 4, as I am still testing things and have not run the program for that long.

rouyang2017 commented 5 years ago

OK, would you mind sending me your train.dat file via email (ouyang@fhi-berlin.mpg.de)? I can then answer your question here once I have figured out the cause.