ttadano / alamode

Ab initio simulator for thermal transport and lattice anharmonicity
http://sourceforge.net/projects/alamode
MIT License

'std::bad_alloc' Error #50

Closed abhirup86 closed 1 year ago

abhirup86 commented 2 years ago

Hi, I am trying to fit the cubic (3rd-order) force constants after performing VASP single-point calculations for the 8485 structures generated from my "cubic.pattern_ANHARM3" file. However, I am getting the following error:

 OPTIMIZATION

  LMODEL = least-squares

  Training data file (DFSET) : DFSET_cubic

  NSTART = 1; NEND = 8484
  8484 entries will be used for training.

  Total Number of Parameters : 52693
  Total Number of Free Parameters : 44192

 terminate called after throwing an instance of 'std::bad_alloc'
   what(): std::bad_alloc
 Aborted

I tried allocating more memory (>120 GB), but I am still stuck with this error. Can anyone suggest how to solve this issue? I didn't have this problem when reproducing the Si example.

Thanks, Abhirup

ttadano commented 2 years ago

This likely occurs because of the large RAM requirement. Please see #47.
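To get a feel for the RAM requirement, here is a rough back-of-the-envelope sketch. It assumes the sensing matrix is stored densely in double precision, with one row per force component (3 × atoms × structures) and one column per free parameter. The function name and the 120-atom supercell size are hypothetical and chosen only for illustration; the structure and parameter counts are taken from the log above.

```python
def dense_matrix_gib(n_structures, n_atoms, n_free_params, bytes_per_value=8):
    """Estimate the size in GiB of a dense double-precision sensing matrix.

    Assumes one row per Cartesian force component of every atom in every
    training structure, and one column per free parameter.
    """
    n_rows = 3 * n_atoms * n_structures
    return n_rows * n_free_params * bytes_per_value / 1024**3


# 8484 structures and 44192 free parameters come from the log above.
# n_atoms = 120 is a HYPOTHETICAL supercell size, not taken from the thread.
estimate = dense_matrix_gib(n_structures=8484, n_atoms=120, n_free_params=44192)
print(f"Dense sensing matrix: about {estimate:.0f} GiB")
```

Under these assumed numbers the dense matrix alone would need on the order of 1 TiB, which is why a sparse representation matters for problems of this size. Substitute the actual number of atoms in your supercell to get a meaningful figure.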

abhirup86 commented 2 years ago

I tried SPARSE = 1 and also tried running on larger-memory nodes (512 GB), but it still gives me the same error. It seems that the program allocates more than just the sensing matrix. Any suggestions on how to solve this?

ttadano commented 2 years ago

I have not encountered this issue before. Perhaps the error occurs only when the input arrays are very large. Could you provide more detailed information, including the input files for ALM? They are necessary for identifying where the error occurs. (If you are reluctant to upload the files, please send them to me directly via email.)

abhirup86 commented 2 years ago

Thanks. I just sent an email to the Gmail address listed on your website. If you don't see my message, please check your spam folder.

Thank you so much once again.

Abhirup


Abhirup Patra Research Scientist II Delaware Energy Institute University of Delaware, Newark, DE




ttadano commented 2 years ago

Thank you for the input files.

I found that you are using SPARSE = 1 together with LMODEL = enet, but SPARSE = 1 takes effect only for ordinary least squares (LMODEL = ols). Please set the options like this:

 &optimize
     LMODEL = ols
     SPARSE = 1
     ...

The calculation of the 3rd-order force constants has not finished yet, but the bad_alloc error has not appeared so far.
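A small pre-flight check can catch this combination before a long job is submitted. The following is only a sketch: the helper names are hypothetical, the parser handles just simple `KEY = value` entries inside an `&optimize ... /` block, and the set of spellings treated as ordinary least squares is an assumption that may need extending.

```python
# Spellings treated as ordinary least squares (an ASSUMPTION; extend as needed).
OLS_NAMES = {"ols", "least-squares"}


def parse_optimize(text):
    """Collect KEY -> value strings from the &optimize block of an input file."""
    settings = {}
    in_block = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line.lower().startswith("&optimize"):
            in_block = True
            line = line[len("&optimize"):]
        elif line.startswith("/") or line.startswith("&"):
            in_block = False  # end of block or start of another field
        if not in_block:
            continue
        for entry in line.split(";"):  # allow several entries on one line
            if "=" in entry:
                key, value = entry.split("=", 1)
                settings[key.strip().upper()] = value.strip()
    return settings


def sparse_setting_warning(settings):
    """Return a warning string if SPARSE = 1 is paired with a non-OLS LMODEL."""
    lmodel = settings.get("LMODEL")
    if lmodel is None or settings.get("SPARSE") != "1":
        return None  # nothing to check
    if lmodel.lower() not in OLS_NAMES:
        return f"SPARSE = 1 has no effect with LMODEL = {lmodel}; use LMODEL = ols"
    return None


example = """
&optimize
  LMODEL = enet
  SPARSE = 1
/
"""
print(sparse_setting_warning(parse_optimize(example)))
```

Running it on the example prints a warning, while the same block with LMODEL = ols returns None.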

abhirup86 commented 2 years ago

Thanks. I will set LMODEL = ols and give it a try in a bigger memory node.




abhirup86 commented 2 years ago

Hi, I tried SPARSE = 1 with LMODEL = ols. This time I did not get the error message, but it has been more than 24 hours and I don't see any progress in the calculation.

ttadano commented 2 years ago

With the same inputs, the calculation finished in about 11.5 hours on a Xeon node, as shown below:

OPTIMIZATION
 ============

  LMODEL = least-squares

  Training data file (DFSET) : DFSET_cubic

  NSTART = 1; NEND = 8484
  8484 entries will be used for training.

  Total Number of Parameters : 56005
  Total Number of Free Parameters : 44192

  Now, start fitting ...
  Solve least-squares problem by Eigen SimplicialLDLT.
  Residual sum of squares for the solution: 0.534217
  Fitting error (%) : 11.5154

  Time Elapsed: 41419.6 sec.

 -------------------------------------------------------------------

 The following files are created:

 Force constants in a human-readable format : al2o3_cubic.fcs
 Input data for the phonon code ANPHON      : al2o3_cubic.xml

 Job finished at Thu Nov 25 23:33:02 2021

You may need to wait longer depending on the performance of your CPU. The long execution time could probably be shortened by using another sparse solver, such as Pardiso, instead of Eigen, but I have not implemented that yet.

abhirup86 commented 2 years ago

Thanks. Would it be possible for you to share those two files here? I do not have access to Xeon nodes right now. I will try some other large-memory nodes.

Thanks, Abhirup

ttadano commented 2 years ago

I've sent the files to you directly via email.

abhirup86 commented 2 years ago

Thank you so much.