Inconsistencies when fitting ACE and SNAP on same data/input

megmcca commented 12 months ago

LOCAL SOLVER FOR RIDGE: In transpose_trick_new_test_err.py, the local solver gives very bad errors for ACE compared to scikit-learn with the setup below:

Using scikit learn RIDGE solver
Time to perform fit: 0.07 s
group, row_type, nconfigs, ncount, mae, rmse
ALL E 30 2160 [0.041707, 0.055821, 0.0]
ALL F 30 6480 [0.142111, 0.199452, 0.0]
dft_loT_Mo E 15 1080 [0.043204, 0.054093, 0.0]
dft_loT_Mo F 15 3240 [0.146462, 0.206264, 0.0]
dft_loT_Nb E 15 1080 [0.04021, 0.057507, 0.0]
dft_loT_Nb F 15 3240 [0.13776, 0.192398, 0.0]
... change local solver, run again...

Using local RIDGE solver
Time to perform fit: 2.75 s
group, row_type, nconfigs, ncount, mae, rmse
ALL E 30 2160 [10705945.644087, 14919544.374425, 0.0]
ALL F 30 6480 [3.58186, 5.022889, 0.0]
dft_loT_Mo E 15 1080 [13708253.899167, 18631936.52951, 0.0]
dft_loT_Mo F 15 3240 [3.912373, 5.472976, 0.0]
dft_loT_Nb E 15 1080 [7703637.389007, 9901340.810274, 0.0]
dft_loT_Nb F 15 3240 [3.251348, 4.528286, 0.0]
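One way to narrow this down outside of FitSNAP is to solve the same ridge problem two ways on a dumped design matrix and compare. The sketch below is not FitSNAP's code: it assumes the local solver forms the normal equations (A^T A + alpha I) x = A^T b, and uses an augmented least-squares solve as the reference. The function names and the random demo matrix are hypothetical.

```python
import numpy as np

def ridge_normal_equations(A, b, alpha):
    """Solve (A^T A + alpha I) x = A^T b directly (assumed 'local' approach)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ b)

def ridge_augmented_lstsq(A, b, alpha):
    """Same ridge problem, solved as least squares on [A; sqrt(alpha) I]."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(alpha) * np.eye(n)])
    b_aug = np.concatenate([b, np.zeros(n)])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x

def compare_ridge_solvers(A, b, alpha=1e-4):
    """Report coefficient and residual differences between the two solves."""
    x_ne = ridge_normal_equations(A, b, alpha)
    x_ls = ridge_augmented_lstsq(A, b, alpha)
    return {
        "max_coeff_diff": float(np.max(np.abs(x_ne - x_ls))),
        "mae_normal_eq": float(np.mean(np.abs(A @ x_ne - b))),
        "mae_lstsq": float(np.mean(np.abs(A @ x_ls - b))),
        "cond_A": float(np.linalg.cond(A)),
    }

# Demo on a small, well-conditioned random problem (placeholder data).
rng = np.random.default_rng(0)
A_demo = rng.normal(size=(200, 10))
b_demo = A_demo @ rng.normal(size=10) + 0.01 * rng.normal(size=200)
report = compare_ridge_solvers(A_demo, b_demo)
```

On the real ACE descriptor matrix, a large max_coeff_diff together with a large cond_A would point at conditioning; matching coefficients would instead suggest looking at how the rows are weighted or assembled before the solve.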

LOCAL SOLVER DEFAULT: If local_solver is commented out, it is automatically enabled. Given the errors above, that default is a problem. We probably need to set it to 0 for now, or figure out why it is toggled on.
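A minimal sketch of the "default to 0" behaviour, using Python's standard configparser rather than FitSNAP's own input handling. The helper name read_local_solver is hypothetical; the two snippets mirror the [RIDGE] sections of the input files in this issue.

```python
import configparser

def read_local_solver(config_text, default=False):
    """Return the local_solver flag, using `default` when the key is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # A line starting with '#' is a comment, so the key counts as missing.
    return parser.getboolean("RIDGE", "local_solver", fallback=default)

commented_out = """
[RIDGE]
# local_solver = 0
alpha = 1.E-4
"""

explicit_off = """
[RIDGE]
local_solver = 0
alpha = 1.E-4
"""

flag_when_commented = read_local_solver(commented_out)            # default off
flag_when_explicit = read_local_solver(explicit_off, default=True)
```

With an explicit fallback of False, commenting the key out and setting it to 0 behave the same, which is the behaviour proposed above.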

ERROR CALCS: After copying the traditional error calculations into the transpose_trick error code, SNAP errors are identical between the standard fit and the re-calculated fit, but ACE errors are not.

Example (see input files below):
SNAP, MAE all unweighted training E: trad 7.07261, new 7.07261
SNAP, MAE all unweighted training F: trad 0.161032, new 0.161032
SNAP, RMSE all unweighted training E: trad 7.07593, new 7.07593
SNAP, RMSE all unweighted training F: trad 0.232894, new 0.232895
(same pattern with bzeroflag = 0)

ACE, MAE all unweighted training E: trad 0.108551, new 0.041707
ACE, MAE all unweighted training F: trad 0.138716, new 0.142111
ACE, RMSE all unweighted training E: trad 0.134556, new 0.055821
ACE, RMSE all unweighted training F: trad 0.195827, new 0.199452
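The trad/new comparison can be automated with a small check like the one below. It is only a sketch: mae and rmse are the usual unweighted definitions (assumed to match what both code paths intend), the helper names are hypothetical, and the numbers are the SNAP values and the ACE energy MAE quoted above.

```python
import math

def mae(pred, truth):
    """Unweighted mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

def rmse(pred, truth):
    """Unweighted root-mean-square error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))

def mismatches(trad, new, abs_tol=1e-5):
    """Names of metrics whose trad and new values differ by more than abs_tol."""
    return [k for k in trad
            if not math.isclose(trad[k], new[k], rel_tol=0.0, abs_tol=abs_tol)]

snap_trad = {"mae_E": 7.07261, "mae_F": 0.161032, "rmse_E": 7.07593, "rmse_F": 0.232894}
snap_new = {"mae_E": 7.07261, "mae_F": 0.161032, "rmse_E": 7.07593, "rmse_F": 0.232895}
ace_trad = {"mae_E": 0.108551}
ace_new = {"mae_E": 0.041707}

snap_bad = mismatches(snap_trad, snap_new)  # rounding-level difference only
ace_bad = mismatches(ace_trad, ace_new)     # energy MAE disagrees
```

The tolerance of 1e-5 is chosen so that the last-digit difference in the SNAP force RMSE counts as agreement.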

megmcca commented 12 months ago

ace_test.in

[ACE]
numTypes = 4
rcutfac = 6.026  6.026  6.026  5.922  6.026  6.026  6.026  5.922  6.026  6.026  6.026  5.922  5.922  5.922  5.922  5.818
lambda = 1.808  1.808  1.808  1.776  1.808  1.808  1.808  1.776  1.808  1.808  1.808  1.776  1.776  1.776  1.776  1.745
rcinner = 0.580  0.580  0.580  0.570  0.580  0.580  0.580  0.570  0.580  0.580  0.580  0.570  0.570  0.570  0.570  0.560
drcinner = 0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01
ranks = 1 2
lmax =  0 1 
nmax =  6 2 
mumax = 4
lmin = 0 1
nmaxbase = 6
type = Mo Nb Ta Ti
bzeroflag = 0

[CALCULATOR]
calculator = LAMMPSPACE
energy = 1
force = 1
stress = 0

[ESHIFT]
Mo = 4.0984
Nb = 2.521626
Ta = 3.762075
Ti = 2.912304

[SOLVER]
solver = RIDGE
#compute_testerrs = 1
#detailed_errors = 1

[RIDGE]
local_solver = 0
alpha = 1.E-4

[SCRAPER]
scraper = JSON

[MEMORY]
override = 1

[PATH]
#dataPath = /ascldap/users/megmcca/training_data/JSON_MoNbTaTi_fs3lib_csvs/sym_MoNbTaTi_bcc-bundled_ngroups6_nconfigs15489_nevery20
# FOR TESTING
dataPath = /ascldap/users/megmcca/training_data/JSON_MoNbTaTi_fs3lib_csvs/sym_MoNbTaTi_ngroups10_nconfigs1395_nevery1000

[OUTFILE]
output_style = PACE
metrics = metrics.md
potential = pot_HEA

[REFERENCE]
units = metal
# with no ZBL
#atom_style = atomic
#pair_style = zero 6.5
#pair_coeff = * *
pair_style = hybrid/overlay zero 6.5 zbl 0.580000 2.320000 zbl 0.580000 2.320000 zbl 0.580000 2.320000 zbl 0.570000 2.280000 zbl 0.580000 2.320000 zbl 0.580000 2.320000 zbl 0.570000 2.280000 zbl 0.580000 2.320000 zbl 0.570000 2.280000 zbl 0.560000 2.240000
pair_coeff1 = * * zero
pair_coeff2 = 1 1 zbl 1    42 42
pair_coeff3 = 1 2 zbl 2    42 41
pair_coeff4 = 1 3 zbl 3    42 73
pair_coeff5 = 1 4 zbl 4    42 22
pair_coeff6 = 2 2 zbl 5    41 41
pair_coeff7 = 2 3 zbl 6    41 73
pair_coeff8 = 2 4 zbl 7    41 22
pair_coeff9 = 3 3 zbl 8    73 73
pair_coeff10 = 3 4 zbl 9    73 22
pair_coeff11 = 4 4 zbl 10    22 22

[GROUPS]
group_sections = name training_size testing_size eweight fweight
group_types = str float float float float
smartweights = 0
random_sampling = 0

# Pures and concentrated elements extracted 
#dft_loT_eq_ternary      = 1.0   1   10
dft_loT_Nb              = 0.13 0.0   1   10
dft_loT_Mo              = 0.12 0.0 1   10
#dft_loT_Ta              = 1.0   1   10
#dft_loT_Ti              = 1.0   1   10
#aimd_eq_ternary         = 1.0   1   10
#aimd_Nb                 = 1.0   1   10
#aimd_Mo                 = 1.0   1   10
#aimd_Ta                 = 1.0   1   10
#aimd_Ti                 = 1.0   1   10

# BCC-bundled
#dft_loT_eq_ternary      = 1.0   1   10
#dft_loT_Ti_bcc_hcp      = 1.0   1   10
#dft_loT_MoNbTa_bcc      = 1.0   1   10
#aimd_eq_ternary         = 1.0   1   10
#aimd_Ti_bcc_hcp         = 1.0   1   10
#aimd_MoNbTa_bcc         = 1.0   1   10

[EXTRAS]
dump_descriptors = 0
dump_truth = 0
dump_weights = 0
dump_dataframe = 0
multinode_testing = 1

megmcca commented 12 months ago

snap_test.in

[BISPECTRUM]
numTypes = 4
twojmax = 4 4 4 4 
rcutfac = 6.0
rfac0 = 0.99363
rmin0 = 0.0
wj = 0.9879004646 0.7628703537 0.7141464964 0.6913809132 
radelem = 0.5769951767 0.6069745289 0.5308817313 0.4942388052
type = Mo Nb Ta Ti
wselfallflag = 0
chemflag = 0
bzeroflag = 0
quadraticflag = 0

[ESHIFT]
Mo = 4.0984
Nb = 2.521626
Ta = 3.762075
Ti = 2.912304

[SCRAPER]
scraper = JSON

[CALCULATOR]
calculator = LAMMPSSNAP
energy = 1
force = 1
stress = 0

[SOLVER]
solver = RIDGE
compute_testerrs = 0
detailed_errors = 0

[RIDGE]
# local_solver = 0
alpha = 1.E-4

[PATH]
#dataPath = /ascldap/users/megmcca/training_data/JSON_MoNbTaTi_fs3lib_csvs/sym_MoNbTaTi_bcc-bundled_ngroups6_nconfigs15489_nevery20
# FOR TESTING
dataPath = /ascldap/users/megmcca/training_data/JSON_MoNbTaTi_fs3lib_csvs/sym_MoNbTaTi_ngroups10_nconfigs1395_nevery1000

[OUTFILE]
metrics = metrics-snap_bzero1.md
potential = pot_HEA-snap_bzero1

[REFERENCE]
units = metal
atom_style = atomic

pair_style = hybrid/overlay zero 10.0 zbl 1.0 2.15
pair_coeff1 = * * zero
pair_coeff2 = 1 1 zbl 42 42
pair_coeff3 = 2 2 zbl 41 41
pair_coeff4 = 3 3 zbl 73 73
pair_coeff5 = 4 4 zbl 22 22
pair_coeff6 = 1 2 zbl 42 41
pair_coeff7 = 1 3 zbl 42 73
pair_coeff8 = 1 4 zbl 42 22
pair_coeff9 = 2 3 zbl 41 73
pair_coeff10 = 2 4 zbl 41 22
pair_coeff11 = 3 4 zbl 73 22

[EXTRAS]
dump_descriptors = 0
dump_truth = 0
dump_weights = 0
multinode_testing = 1

[MEMORY]
override = 0

[GROUPS]
group_sections = name training_size testing_size eweight fweight 
group_types = str float float float float 
smartweights = 0
random_sampling = 0

# Pures and concentrated elements extracted 
#dft_loT_eq_ternary      = 1.0   1   10
#dft_loT_Nb              = 1.0   1   10
dft_loT_Nb              = 0.13 0.0   1   10
dft_loT_Mo              = 0.12 0.0 1   10
#dft_loT_Ta              = 1.0   1   10
#dft_loT_Ti              = 1.0   1   10
#aimd_eq_ternary         = 1.0   1   10
#aimd_Nb                 = 1.0   1   10
#aimd_Mo                 = 1.0   1   10
#aimd_Ta                 = 1.0   1   10
#aimd_Ti                 = 1.0   1   10

# BCC-bundled
#dft_loT_eq_ternary      = 1.0   1   10
#dft_loT_Ti_bcc_hcp      = 1.0   1   10
#dft_loT_MoNbTa_bcc      = 1.0   1   10
#aimd_eq_ternary         = 1.0   1   10
#aimd_Ti_bcc_hcp         = 1.0   1   10
#aimd_MoNbTa_bcc         = 1.0   1   10
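Both input files declare group_sections = name training_size testing_size eweight fweight, so each active group row carries four values after the name, while the commented-out rows carry only three. The sketch below checks that count before any of those rows are re-enabled. It is a plain-text check, not FitSNAP's parser; check_group_rows is a hypothetical helper, and the assumption is that every non-setting key in [GROUPS] is a group name.

```python
SETTING_KEYS = {"group_sections", "group_types", "smartweights", "random_sampling"}

def check_group_rows(groups_text):
    """Map each active group name to True if its value count matches group_sections."""
    sections, rows = None, {}
    for raw in groups_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "group_sections":
            sections = value.split()
        elif key not in SETTING_KEYS:
            rows[key] = value.split()
    if sections is None:
        raise ValueError("group_sections not found")
    expected = len(sections) - 1  # every field except the name
    return {name: len(values) == expected for name, values in rows.items()}

groups_demo = """
group_sections = name training_size testing_size eweight fweight
group_types = str float float float float
smartweights = 0
random_sampling = 0
#dft_loT_Ta              = 1.0   1   10
dft_loT_Nb              = 0.13 0.0   1   10
dft_loT_Mo              = 0.12 0.0 1   10
"""

row_ok = check_group_rows(groups_demo)
```

Uncommenting a three-value row such as dft_loT_Ta without adding a testing_size would show up here as False for that group.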

megmcca / FitSNAP

Inconsistencies when fitting ACE and SNAP on same data/input #15