m-a-d-n-e-s-s / madness

Multiresolution Adaptive Numerical Environment for Scientific Simulation
GNU General Public License v2.0

hungq for nacl input case #114

Closed naromero77 closed 2 years ago

naromero77 commented 9 years ago

I get a hungq with 1 MPI task and 8 threads on BG/Q. I am using TBB, tcmalloc, and Elemental. It looks like the Nov. 14 version of moldft worked, but later versions had issues. I am trying to narrow down the space of git versions that had this issue.

dft
  xc lda
  aobasis sto-3g
  save false
  canon
  maxiter 1
end

geometry
  Na 1.31092303931025942E+01 1.31092303931026493E+01 1.31092303931017025E+01
  Cl 1.38689078546959657E+01 1.38689078546958307E+01 1.38689078546942923E+01
end

Could others please try to reproduce this? Thanks.

jeffhammond commented 9 years ago

git bisect may be useful here...
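
Because the failure here is a hang rather than a crash, each bisection step needs a timeout so that a commit can be classified automatically. The following is a minimal sketch of that idea, not the project's actual procedure. `run_hangs` and `first_bad` are hypothetical helper names, and `first_bad` simply reimplements the binary search that git bisect performs, assuming a commit list ordered oldest to newest whose first entry is known good and whose last entry is known bad.

```python
import subprocess


def run_hangs(cmd, timeout_s):
    """Return True if cmd fails to finish within timeout_s seconds.

    subprocess.run kills the child process when the timeout expires,
    so a hung run does not linger after the check.
    """
    try:
        subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return False
    except subprocess.TimeoutExpired:
        return True


def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that every
    commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

With git itself the same search is driven by `git bisect start`, `git bisect good <commit>` and `git bisect bad <commit>`; `git bisect run <script>` automates it with a script that exits 0 for a good commit and 1 for a bad one. On a batch system the build-and-submit step would have to be wrapped in such a script.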

robertjharrison commented 9 years ago

Are you using libxc?

Recall that the old lda functional is presently broken and generates the occasional nan.
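
One quick way to check whether a particular run is affected is to scan its captured output for NaN values. This is a minimal sketch under that assumption. `lines_with_nan` is a hypothetical helper, and nothing here reflects the actual format of moldft output.

```python
import re

# Match "nan" as a standalone token (any case, optional sign),
# but not as part of a longer word such as "nanoseconds".
NAN_TOKEN = re.compile(r"(?<![A-Za-z])[-+]?nan(?![A-Za-z])", re.IGNORECASE)


def lines_with_nan(text):
    """Return (line_number, line) pairs for every line containing a NaN token."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if NAN_TOKEN.search(line)
    ]
```

An empty result does not prove the functional is healthy, since a NaN may never be printed; it only rules out the visible cases.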

naromero77 commented 9 years ago

I am not using libxc.

Is the builtin MADNESS lda functional ok to use?

robertjharrison commented 9 years ago

The builtin library should be OK but is presently broken. I think the munging that fixed libxc was not propagated, or propagated incorrectly (presumably by me) into the builtin code.

naromero77 commented 9 years ago

Could someone please try to reproduce? I am starting to think it might have to do with my TBB version.

Laura, can you try to reproduce it with a recent version of MADNESS?

lratcliff commented 9 years ago

I've tried with 3 relatively recent versions of MADNESS including the latest, and I couldn't reproduce the problem. I'm assuming we're using the same TBB so I guess it's not that. Maybe Robert's right about LDA being the cause? I am using libxc...

I did see a couple of hung queues over Christmas, but when I just tried to reproduce it I no longer have the same problem. So either it's the same problem and it only appears sporadically, or it's different to Nick's problem and it's now been resolved.


naromero77 commented 9 years ago

Laura,

I am using a more recent version of TBB. You are probably using the one from 2013 with a patch. There is another version on Vesta that Jeff patched. It's called something like Tbb_jhammond.

Can you try that one?

lratcliff commented 9 years ago

I just tried with Jeff's version and still see no hang. Also, it turns out that using TBB is slower than not using it for this example: 322 s with TBB vs 273 s without. I don't remember seeing differences like this before, but maybe this is just a really unfortunate test case.
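
To put the two timings in relative terms, the arithmetic can be sketched as follows. `relative_slowdown` is a hypothetical helper; the two numbers are the wall times quoted above.

```python
def relative_slowdown(slow_s, fast_s):
    """Fractional extra time of the slower run relative to the faster one."""
    return (slow_s - fast_s) / fast_s


with_tbb_s = 322.0     # wall time with TBB, from the comment above
without_tbb_s = 273.0  # wall time without TBB

# (322 - 273) / 273 = 49 / 273, i.e. roughly 18% slower with TBB.
print(f"TBB run is {relative_slowdown(with_tbb_s, without_tbb_s):.1%} slower")
```

This is a single pair of runs, so the figure says nothing about run-to-run variation.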


naromero77 commented 9 years ago

Can we walk through the details?

Which version of the compiler are you using? (Please also include the output of mpicc -show.)

Version of Elemental?

Version of tcmalloc?

Lastly, I would like to look at the cobaltlog for that run.

lratcliff commented 9 years ago

I put the output files and my config script in /gpfs/mira-fs1/projects/DAPPX/hungq so you can take a look directly. The configure script is just an updated version of something you originally gave me.

mpicc -show gives: bgxlc_r -I/bgsys/drivers/V1R2M2/ppc64/comm/include -I/bgsys/drivers/V1R2M2/ppc64/comm/lib/xl -I/bgsys/drivers/V1R2M2/ppc64 -I/bgsys/drivers/V1R2M2/ppc64/comm/sys/include -I/bgsys/drivers/V1R2M2/ppc64/spi/include -I/bgsys/drivers/V1R2M2/ppc64/spi/include/kernel/cnk -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/lib64 -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/spi/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/sys/lib -L/bgsys/drivers/V1R2M2/ppc64/spi/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/sys/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/lib64 -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/spi/lib -I/bgsys/drivers/V1R2M2/ppc64/comm/include -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -lmpich-xl -lopa-xl -lmpl-xl -lpami-gcc -lSPI -lSPI_cnk -lrt -lpthread -lstdc++ -lpthread
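The wrapper line above repeats several `-L` and `-I` entries, which makes it hard to compare against another build by eye. As a rough sketch (the helper name `summarize_wrapper` is made up for illustration, and the `example` string is only a shortened excerpt of the output above), something like this could reduce the line to the underlying compiler plus the unique paths and libraries:

```python
import shlex


def summarize_wrapper(command: str) -> dict:
    """Split a compiler-wrapper command line into the compiler name,
    unique -I include dirs, unique -L library dirs, and unique -l libraries.

    Duplicates are dropped for readability only; the real link line may
    rely on the original ordering and repetition.
    """
    tokens = shlex.split(command)
    summary = {
        "compiler": tokens[0],
        "include_dirs": [],
        "library_dirs": [],
        "libraries": [],
    }
    buckets = {"-I": "include_dirs", "-L": "library_dirs", "-l": "libraries"}
    for token in tokens[1:]:
        key = buckets.get(token[:2])
        value = token[2:]
        if key is not None and value not in summary[key]:
            summary[key].append(value)
    return summary


# Shortened excerpt of the `mpicc -show` output quoted above (not the full line).
example = (
    "bgxlc_r -I/bgsys/drivers/V1R2M2/ppc64/comm/include "
    "-L/bgsys/drivers/V1R2M2/ppc64/comm/lib "
    "-L/bgsys/drivers/V1R2M2/ppc64/comm/lib "
    "-lmpich-xl -lpthread -lstdc++ -lpthread"
)

if __name__ == "__main__":
    for field, value in summarize_wrapper(example).items():
        print(field, value)
```

Running the same helper on the output from two different builds and comparing the resulting dictionaries would show at a glance whether the compiler or driver paths differ.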



naromero77 commented 9 years ago

I had a quick look just now. The only major difference is that you use libxc and I don't. There are other very minor differences, which I don't think could be the source of the issue.

I think the next step is for me to try libxc. I should also check the efix version on Vesta vs. Mira; there may be a difference there as well.


naromero77 commented 9 years ago

Laura,

While I am checking on some things, can you try another test? Instead of using the debug version of TBB, try the release version (change "_debug" to "_release" in the path name).
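For anyone scripting the switch, a minimal sketch of the path change might look like the following. The function name and the example path are hypothetical; the actual TBB install location is not given in this thread.

```python
def to_release_path(tbb_path: str) -> str:
    """Return the TBB path with "_debug" swapped for "_release".

    Raises ValueError if the path does not contain "_debug", so a wrong
    path is noticed instead of being silently passed through.
    """
    if "_debug" not in tbb_path:
        raise ValueError(f"no '_debug' component in path: {tbb_path}")
    return tbb_path.replace("_debug", "_release")


# Hypothetical path, for illustration only.
hypothetical_path = "/path/to/tbb/build/some_config_debug"

if __name__ == "__main__":
    print(to_release_path(hypothetical_path))
```

The resulting path would then be used in place of the debug one in the configure script before rebuilding.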

On Fri, Feb 13, 2015 at 1:19 PM, Nichols A. Romero naromero@gmail.com wrote:

I had a quick look just now, the only major difference is that you use libxc and I don't. There are other very minor differences which I don't think could be the source of the issue.

I think the next step is for me to try to use libxc first. I should also check the efix version on Vesta vs. Mira, there maybe a difference there as well.

On Friday, February 13, 2015, lratcliff notifications@github.com wrote:

I put the output files and my config script in /gpfs/mira-fs1/projects/DAPPX/hungq so you can take a look directly. The configure is just an updated version of something you originally gave me.

mpicc -show gives: bgxlc_r -I/bgsys/drivers/V1R2M2/ppc64/comm/include -I/bgsys/drivers/V1R2M2/ppc64/comm/lib/xl -I/bgsys/drivers/V1R2M2/ppc64 -I/bgsys/drivers/V1R2M2/ppc64/comm/sys/include -I/bgsys/drivers/V1R2M2/ppc64/spi/include -I/bgsys/drivers/V1R2M2/ppc64/spi/include/kernel/cnk -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/lib64 -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/spi/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/sys/lib -L/bgsys/drivers/V1R2M2/ppc64/spi/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/sys/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/lib64 -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/spi/lib -I/bgsys/drivers/V1R2M2/ppc64/comm/include -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -lmpich-xl -lopa-xl -lmpl-xl -lpami-gcc -lSPI -lSPI_cnk -lrt -lpthread -lstdc++ -lpthread


From: Nichols A. Romero [notifications@github.com] Sent: Friday, February 13, 2015 11:57 AM To: m-a-d-n-e-s-s/madness Cc: Ratcliff, Laura E. Subject: Re: [madness] hungq for nacl input case (#114)

Can we walk through the details?

What is your version of the compiler ? (Also output of mpicc -show)

Version of Elemental?

Version of tcmalloc?

Lastly, I would like to look at the cobaltlog for that run.

On Friday, February 13, 2015, lratcliff notifications@github.com wrote:

I just tried with Jeff's version and still no hang. Also, it turns out using TBB is slower than not for this example - 322s with vs 273s without. I don't remember seeing differences like this before, but maybe this is just a really unfortunate test case.


From: Nichols A. Romero [notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');] Sent: Friday, February 13, 2015 11:02 AM To: m-a-d-n-e-s-s/madness Cc: Ratcliff, Laura E. Subject: Re: [madness] hungq for nacl input case (#114)

Laura,

I am using a more recent version of TBB. You are probably using the one from 2013 with patch. There is another version in Vesta which Jeff patched. It's called something like: Tbb_jhammond

Can you try that one?

On Friday, February 13, 2015, lratcliff <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

I've tried with 3 relatively recent versions of MADNESS including the latest, and I couldn't reproduce the problem. I'm assuming we're using the same TBB so I guess it's not that. Maybe Robert's right about LDA being the cause? I am using libxc...

I did see a couple of hung queues over Christmas, but when I just tried to reproduce it I no longer have the same problem. So either it's the same problem and it only appears sporadically, or it's different to Nick's problem and it's now been resolved.


From: Nichols A. Romero [notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>] Sent: Friday, February 13, 2015 9:03 AM To: m-a-d-n-e-s-s/madness Subject: Re: [madness] hungq for nacl input case (#114)

Could someone please try to reproduce? I am starting to think it might have to do with my TBB version.

Laura, Can you try to reproduce with a recent version of MADNESS?

On Friday, February 13, 2015, robertjharrison < notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>> wrote:

The builtin library should be OK but is presently broken. I think the munging that fixed libxc was not propagated, or propagated incorrectly (presumably by me) into the builtin code.

On Fri, Feb 13, 2015 at 9:12 AM, Nichols A. Romero < notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');>

wrote:

I am not using libxc.

Is the builtin MADNESS lda functional ok to use?


lratcliff commented 9 years ago

With the release version of TBB I still don't get a hang, but it is faster: 304 s, so still slower than without TBB but not as bad.

It could be that there's something Vesta-specific. Last time I tried I couldn't even get MADNESS to compile on Vesta using my usual config script, although I can't remember what the issue was.
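For reference, the timings quoted in this thread (273 s without TBB, 322 s for the earlier TBB run, 304 s with the release TBB) can be turned into relative slowdowns with a small sketch like the one below. The labels and the helper name `slowdown_percent` are only illustrative and are not part of MADNESS.

```python
# Wall-clock times reported in this thread for the NaCl test case (seconds).
# The dictionary labels are illustrative, not official build names.
timings = {
    "no TBB": 273.0,
    "TBB, earlier run": 322.0,
    "TBB, release build": 304.0,
}


def slowdown_percent(t, baseline):
    """Relative slowdown of t versus baseline, in percent."""
    return 100.0 * (t - baseline) / baseline


baseline = timings["no TBB"]
for label, t in timings.items():
    print(f"{label}: {t:.0f} s ({slowdown_percent(t, baseline):+.1f}% vs no TBB)")
```

On these numbers the release build narrows the gap but does not close it, which matches the description above.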


From: Nichols A. Romero [notifications@github.com] Sent: Friday, February 13, 2015 1:29 PM To: m-a-d-n-e-s-s/madness Cc: Ratcliff, Laura E. Subject: Re: [madness] hungq for nacl input case (#114)

Laura,

While I am checking on some things, can you try another test? Instead of using the debug version of TBB, try using the release version of it (change "_debug" to "_release" in the path name).

On Fri, Feb 13, 2015 at 1:19 PM, Nichols A. Romero naromero@gmail.com wrote:

I had a quick look just now. The only major difference is that you use libxc and I don't. There are other very minor differences which I don't think could be the source of the issue.

I think the next step is for me to try using libxc. I should also check the efix version on Vesta vs. Mira; there may be a difference there as well.

On Friday, February 13, 2015, lratcliff notifications@github.com wrote:

I put the output files and my config script in /gpfs/mira-fs1/projects/DAPPX/hungq so you can take a look directly. The configure is just an updated version of something you originally gave me.

mpicc -show gives: bgxlc_r -I/bgsys/drivers/V1R2M2/ppc64/comm/include -I/bgsys/drivers/V1R2M2/ppc64/comm/lib/xl -I/bgsys/drivers/V1R2M2/ppc64 -I/bgsys/drivers/V1R2M2/ppc64/comm/sys/include -I/bgsys/drivers/V1R2M2/ppc64/spi/include -I/bgsys/drivers/V1R2M2/ppc64/spi/include/kernel/cnk -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/lib64 -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/spi/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/sys/lib -L/bgsys/drivers/V1R2M2/ppc64/spi/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/sys/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/lib64 -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/spi/lib -I/bgsys/drivers/V1R2M2/ppc64/comm/include -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -lmpich-xl -lopa-xl -lmpl-xl -lpami-gcc -lSPI -lSPI_cnk -lrt -lpthread -lstdc++ -lpthread


From: Nichols A. Romero [notifications@github.com] Sent: Friday, February 13, 2015 11:57 AM To: m-a-d-n-e-s-s/madness Cc: Ratcliff, Laura E. Subject: Re: [madness] hungq for nacl input case (#114)

Can we walk through the details?

What is your version of the compiler? (Also the output of mpicc -show.)

Version of Elemental?

Version of tcmalloc?

Lastly, I would like to look at the cobaltlog for that run.

On Friday, February 13, 2015, lratcliff notifications@github.com wrote:

I just tried with Jeff's version and still no hang. Also, it turns out using TBB is slower than not for this example - 322s with vs 273s without. I don't remember seeing differences like this before, but maybe this is just a really unfortunate test case.


From: Nichols A. Romero [notifications@github.com] Sent: Friday, February 13, 2015 11:02 AM To: m-a-d-n-e-s-s/madness Cc: Ratcliff, Laura E. Subject: Re: [madness] hungq for nacl input case (#114)

Laura,

I am using a more recent version of TBB. You are probably using the one from 2013 with a patch. There is another version on Vesta which Jeff patched. It's called something like: Tbb_jhammond

Can you try that one?

On Friday, February 13, 2015, lratcliff notifications@github.com wrote:

I've tried with 3 relatively recent versions of MADNESS including the latest, and I couldn't reproduce the problem. I'm assuming we're using the same TBB so I guess it's not that. Maybe Robert's right about LDA being the cause? I am using libxc...

I did see a couple of hung queues over Christmas, but when I just tried to reproduce it I no longer have the same problem. So either it's the same problem and it only appears sporadically, or it's different to Nick's problem and it's now been resolved.


From: Nichols A. Romero [notifications@github.com] Sent: Friday, February 13, 2015 9:03 AM To: m-a-d-n-e-s-s/madness Subject: Re: [madness] hungq for nacl input case (#114)

Could someone please try to reproduce? I am starting to think it might have to do with my TBB version.

Laura, Can you try to reproduce with a recent version of MADNESS?

On Friday, February 13, 2015, robertjharrison notifications@github.com wrote:

The builtin library should be OK but is presently broken. I think the munging that fixed libxc was not propagated, or propagated incorrectly (presumably by me) into the builtin code.


jeffhammond commented 9 years ago

No surprise here. The non-TBB build on BG/Q has lower overhead and knows, e.g., that yield() is a no-op.

Jeff


naromero77 commented 9 years ago

I think that I have figured out the problem and I am testing my idea.

I think the source of the issue is probably gperftools 2.2.1. By default, configure links in the gperftools profiler along with tcmalloc whenever the profiler exists. However, on BG/Q the profiler never gave correct results, and I guess it also causes weird hangs.

I am manually overriding this behavior by removing the profiler library from the gperftools directory.
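For anyone who wants to try the same workaround, here is a minimal sketch of it. It assumes the gperftools libraries sit in a single directory and that the profiler files match `libprofiler*`; the function name, the backup directory name, and the pattern are illustrative assumptions, not something taken from the MADNESS configure script. It moves the files aside instead of deleting them, so the change is easy to undo.

```python
import shutil
from pathlib import Path


def set_aside_profiler_libs(lib_dir, backup_name="profiler_disabled"):
    """Move files matching libprofiler* out of lib_dir into a backup subdirectory.

    Hypothetical helper: the pattern and the backup directory name are
    assumptions made for illustration. Returns the sorted names of the
    files that were moved.
    """
    lib_dir = Path(lib_dir)
    backup_dir = lib_dir / backup_name
    moved = []
    # Non-recursive glob, so files already in the backup directory are ignored.
    for path in sorted(lib_dir.glob("libprofiler*")):
        if path.is_file() or path.is_symlink():
            backup_dir.mkdir(exist_ok=True)
            shutil.move(str(path), str(backup_dir / path.name))
            moved.append(path.name)
    return moved
```

Calling `set_aside_profiler_libs("/path/to/gperftools/lib")` (a placeholder path) before rerunning configure should leave only the tcmalloc libraries visible in that directory, assuming the file naming above holds for the local install.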

I will retest shortly.

naromero77 commented 9 years ago

Unfortunately, this was not the problem. Trying with libxc next.

On Fri, Feb 13, 2015 at 2:25 PM, Nichols A. Romero naromero@gmail.com wrote:

I think that I have figured out the problem and I am testing my idea.

My source of the issue is probably gperftools 2.2.1. The default behavior of configure is that if the gperftools profiler exists, we link it in along with tcmalloc. However, on BG/Q, the profiler never gave correct results. I guess it also causes wird hangs.

I am manually overriding this behavior by removing the profiler library from the gperftools directory.

I will retest shortly.

On Fri, Feb 13, 2015 at 2:10 PM, Jeff Hammond notifications@github.com wrote:

No surprise here. No TBB on BGQ has lower overhead and knows eg that yield() is a no-op.

Jeff

On Friday, February 13, 2015, lratcliff notifications@github.com wrote:

With the release version I still don't get a hang, but it is faster - 304s, so still slower than without tbb but not as bad.

It could be that there's something Vesta specific - last time I tried I couldn't even get MADNESS to compile on Vesta using my usual config script, although I can't remember what the issue was.


From: Nichols A. Romero [notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');] Sent: Friday, February 13, 2015 1:29 PM To: m-a-d-n-e-s-s/madness Cc: Ratcliff, Laura E. Subject: Re: [madness] hungq for nacl input case (#114)

Laura,

While I am check on some things, can you try another test? Instead of using the debug version of TBB, try using the release version of it. (change "_debug" to "_release" in the path name).

On Fri, Feb 13, 2015 at 1:19 PM, Nichols A. Romero <naromero@gmail.com javascript:_e(%7B%7D,'cvml','naromero@gmail.com');> wrote:

I had a quick look just now, the only major difference is that you use libxc and I don't. There are other very minor differences which I don't think could be the source of the issue.

I think the next step is for me to try to use libxc first. I should also check the efix version on Vesta vs. Mira, there maybe a difference there as well.

On Friday, February 13, 2015, lratcliff <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

I put the output files and my config script in /gpfs/mira-fs1/projects/DAPPX/hungq so you can take a look directly. The configure is just an updated version of something you originally gave me.

mpicc -show gives: bgxlc_r -I/bgsys/drivers/V1R2M2/ppc64/comm/include -I/bgsys/drivers/V1R2M2/ppc64/comm/lib/xl -I/bgsys/drivers/V1R2M2/ppc64 -I/bgsys/drivers/V1R2M2/ppc64/comm/sys/include -I/bgsys/drivers/V1R2M2/ppc64/spi/include -I/bgsys/drivers/V1R2M2/ppc64/spi/include/kernel/cnk -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/lib64 -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/spi/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/sys/lib -L/bgsys/drivers/V1R2M2/ppc64/spi/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/sys/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/lib64 -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/spi/lib -I/bgsys/drivers/V1R2M2/ppc64/comm/include -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -lmpich-xl -lopa-xl -lmpl-xl -lpami-gcc -lSPI -lSPI_cnk -lrt -lpthread -lstdc++ -lpthread


From: Nichols A. Romero [notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');] Sent: Friday, February 13, 2015 11:57 AM To: m-a-d-n-e-s-s/madness Cc: Ratcliff, Laura E. Subject: Re: [madness] hungq for nacl input case (#114)

Can we walk through the details?

What is your version of the compiler ? (Also output of mpicc -show)

Version of Elemental?

Version of tcmalloc?

Lastly, I would like to look at the cobaltlog for that run.

On Friday, February 13, 2015, lratcliff <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

I just tried with Jeff's version and still no hang. Also, it turns out using TBB is slower than not for this example - 322s with vs 273s without. I don't remember seeing differences like this before, but maybe this is just a really unfortunate test case.


From: Nichols A. Romero [notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>] Sent: Friday, February 13, 2015 11:02 AM To: m-a-d-n-e-s-s/madness Cc: Ratcliff, Laura E. Subject: Re: [madness] hungq for nacl input case (#114)

Laura,

I am using a more recent version of TBB. You are probably using the one from 2013 with patch. There is another version in Vesta which Jeff patched. It's called something like: Tbb_jhammond

Can you try that one?

On Friday, February 13, 2015, lratcliff <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>> wrote:

I've tried with 3 relatively recent versions of MADNESS including the latest, and I couldn't reproduce the problem. I'm assuming we're using the same TBB so I guess it's not that. Maybe Robert's right about LDA being the cause? I am using libxc...

I did see a couple of hung queues over Christmas, but when I just tried to reproduce it I no longer have the same problem. So either it's the same problem and it only appears sporadically, or it's different to Nick's problem and it's now been resolved.


From: Nichols A. Romero [notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');>] Sent: Friday, February 13, 2015 9:03 AM To: m-a-d-n-e-s-s/madness Subject: Re: [madness] hungq for nacl input case (#114)

Could someone please try to reproduce? I am starting to think it might have to do with my TBB version.

Laura, Can you try to reproduce with a recent version of MADNESS?

On Friday, February 13, 2015, robertjharrison < notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');>> wrote:

The builtin library should be OK but is presently broken. I think the munging that fixed libxc was not propagated, or propagated incorrectly (presumably by me) into the builtin code.

On Fri, Feb 13, 2015 at 9:12 AM, Nichols A. Romero < notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');>');>

wrote:

I am not using libxc.

Is the builtin MADNESS lda functional ok to use?

On Friday, February 13, 2015, robertjharrison < notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');>');>> wrote:

Are you using libxc?

Recall that the old lda functional is presently broken and generates the occasional nan.

On Fri, Feb 13, 2015 at 1:14 AM, Jeff Hammond < notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');>');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); ');>');>');>');>> wrote:

git bisect may be useful here...

On Thursday, February 12, 2015, Nichols A. Romero < notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');>');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); ');>');>');>');>>

wrote:

I get a hungq with 1 MPI task and 8 threads on BG/Q. I am using TBB, tcmalloc, and Elemental. It looks like the Nov. 14 version of moldft worked, but later versions had issues. I am trying to narrow down the space of git versions that had this issue.

dft xc lda aobasis sto-3g save false canon maxiter 1 end

geometry Na 1.31092303931025942E+01 1.31092303931026493E+01 1.31092303931017025E+01 Cl 1.38689078546959657E+01 1.38689078546958307E+01 1.38689078546942923E+01 end

Could other's please try to reproduced? Thanks

naromero77 commented 9 years ago

Switching to libxc seemed to fix the problem. So the question is, did it really fix the problem or are we just masking the problem?

robertjharrison commented 9 years ago

It is well documented that the builtin LDA library is broken... I can easily reproduce it.

Rerun with libxc.

naromero77 commented 9 years ago

Sorry, I was unaware. A long time ago, the XL compiler gave a similar issue, but the NaNs were really random. Just breathing on the code would make them go away. I ended up giving the middle finger to the XL compiler at that point and never used XL again.

I think we should document this NaN issue with the builtin LDA on the external wiki. I can do some documentation clean-up on our wiki while I am on paternity leave.

naromero77 commented 9 years ago

I am going to close this issue unless I hear objections by the end of the day.

justusc commented 9 years ago

Perhaps open a new issue on madness lda?

jeffhammond commented 9 years ago

As I understand it, not using libxc causes numerical issues that cause madness to refine to infinite precision, which presumably exceeds the finite capacity of the task queue.

That we get a hungq error rather than a silent hang is good enough for me. But we might need slightly better error detection here plus other forms of mitigation.
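To make the "better error detection plus mitigation" idea concrete, here is a minimal C++ sketch. It is not MADNESS code: `count_nonfinite`, `require_finite`, and `refine` are hypothetical names, and the 1-D interval refinement is only a toy stand-in for adaptive refinement. It assumes the failure mode described above, where a NaN from the functional means the refinement criterion is never satisfied.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: count non-finite values (NaN or +/-Inf) in a buffer,
// e.g. the values returned by an exchange-correlation kernel.
std::size_t count_nonfinite(const std::vector<double>& values) {
    std::size_t bad = 0;
    for (double v : values) {
        if (!std::isfinite(v)) ++bad;
    }
    return bad;
}

// Hypothetical guard: fail fast with a clear message instead of passing
// NaNs on to adaptive refinement.
void require_finite(const std::vector<double>& values, const std::string& where) {
    if (count_nonfinite(values) != 0) {
        throw std::runtime_error("non-finite value in " + where);
    }
}

// Toy 1-D adaptive refinement with a hard depth cap. `error_estimate` stands
// in for whatever local error measure drives refinement. A NaN estimate never
// satisfies `err <= tol`, so without the explicit check and the cap the
// recursion would keep splitting. Returns the number of leaf intervals.
std::size_t refine(double lo, double hi, int level, int max_level, double tol,
                   const std::function<double(double, double)>& error_estimate) {
    assert(max_level >= 0);
    const double err = error_estimate(lo, hi);
    if (!std::isfinite(err)) {
        throw std::runtime_error("non-finite error estimate during refinement");
    }
    if (err <= tol || level >= max_level) return 1;  // converged or capped
    const double mid = 0.5 * (lo + hi);
    return refine(lo, mid, level + 1, max_level, tol, error_estimate) +
           refine(mid, hi, level + 1, max_level, tol, error_estimate);
}
```

With an error estimate equal to the interval width and `tol = 0.25`, `refine(0.0, 1.0, 0, 10, 0.25, ...)` stops at four leaves, while an estimate that returns NaN raises an exception on the first call instead of refining until the queue fills.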

Jeff

On Friday, February 13, 2015, Nichols A. Romero notifications@github.com wrote:

Switching to libxc seemed to fix the problem. So the question is, did it really fix the problem or are we just masking the problem?

On Fri, Feb 13, 2015 at 4:30 PM, Nichols A. Romero <naromero@gmail.com javascript:_e(%7B%7D,'cvml','naromero@gmail.com');> wrote:

Unfortunately, this was not the problem. Trying with libxc next.

On Fri, Feb 13, 2015 at 2:25 PM, Nichols A. Romero <naromero@gmail.com javascript:_e(%7B%7D,'cvml','naromero@gmail.com');> wrote:

I think that I have figured out the problem and I am testing my idea.

My source of the issue is probably gperftools 2.2.1. The default behavior of configure is that if the gperftools profiler exists, we link it in along with tcmalloc. However, on BG/Q, the profiler never gave correct results. I guess it also causes wird hangs.

I am manually overriding this behavior by removing the profiler library from the gperftools directory.

I will retest shortly.

On Fri, Feb 13, 2015 at 2:10 PM, Jeff Hammond <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

No surprise here. No TBB on BGQ has lower overhead and knows eg that yield() is a no-op.

Jeff

On Friday, February 13, 2015, lratcliff <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

With the release version I still don't get a hang, but it is faster - 304s, so still slower than without tbb but not as bad.

It could be that there's something Vesta specific - last time I tried I couldn't even get MADNESS to compile on Vesta using my usual config script, although I can't remember what the issue was.


From: Nichols A. Romero [notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>] Sent: Friday, February 13, 2015 1:29 PM To: m-a-d-n-e-s-s/madness Cc: Ratcliff, Laura E. Subject: Re: [madness] hungq for nacl input case (#114)

Laura,

While I am check on some things, can you try another test? Instead of using the debug version of TBB, try using the release version of it. (change "_debug" to "_release" in the path name).

On Fri, Feb 13, 2015 at 1:19 PM, Nichols A. Romero < naromero@gmail.com javascript:_e(%7B%7D,'cvml','naromero@gmail.com'); <javascript:_e(%7B%7D,'cvml','naromero@gmail.com javascript:_e(%7B%7D,'cvml','naromero@gmail.com');');>> wrote:

I had a quick look just now, the only major difference is that you use libxc and I don't. There are other very minor differences which I don't think could be the source of the issue.

I think the next step is for me to try to use libxc first. I should also check the efix version on Vesta vs. Mira, there maybe a difference there as well.

On Friday, February 13, 2015, lratcliff <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>> wrote:

I put the output files and my config script in /gpfs/mira-fs1/projects/DAPPX/hungq so you can take a look directly. The configure is just an updated version of something you originally gave me.

mpicc -show gives: bgxlc_r -I/bgsys/drivers/V1R2M2/ppc64/comm/include -I/bgsys/drivers/V1R2M2/ppc64/comm/lib/xl -I/bgsys/drivers/V1R2M2/ppc64 -I/bgsys/drivers/V1R2M2/ppc64/comm/sys/include -I/bgsys/drivers/V1R2M2/ppc64/spi/include -I/bgsys/drivers/V1R2M2/ppc64/spi/include/kernel/cnk -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/lib64 -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/spi/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/sys/lib -L/bgsys/drivers/V1R2M2/ppc64/spi/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/sys/lib -L/bgsys/drivers/V1R2M2/ppc64/comm/lib64 -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -L/bgsys/drivers/V1R2M2/ppc64/spi/lib -I/bgsys/drivers/V1R2M2/ppc64/comm/include -L/bgsys/drivers/V1R2M2/ppc64/comm/lib -lmpich-xl -lopa-xl -lmpl-xl -lpami-gcc -lSPI -lSPI_cnk -lrt -lpthread -lstdc++ -lpthread


From: Nichols A. Romero [notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>] Sent: Friday, February 13, 2015 11:57 AM To: m-a-d-n-e-s-s/madness Cc: Ratcliff, Laura E. Subject: Re: [madness] hungq for nacl input case (#114)

Can we walk through the details?

What is your version of the compiler ? (Also output of mpicc -show)

Version of Elemental?

Version of tcmalloc?

Lastly, I would like to look at the cobaltlog for that run.

On Friday, February 13, 2015, lratcliff <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>> wrote:

I just tried with Jeff's version and still no hang. Also, it turns out using TBB is slower than not for this example - 322s with vs 273s without. I don't remember seeing differences like this before, but maybe this is just a really unfortunate test case.


From: Nichols A. Romero [notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');>] Sent: Friday, February 13, 2015 11:02 AM To: m-a-d-n-e-s-s/madness Cc: Ratcliff, Laura E. Subject: Re: [madness] hungq for nacl input case (#114)

Laura,

I am using a more recent version of TBB. You are probably using the one from 2013 with patch. There is another version in Vesta which Jeff patched. It's called something like: Tbb_jhammond

Can you try that one?

On Friday, February 13, 2015, lratcliff < notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');>> wrote:

I've tried with 3 relatively recent versions of MADNESS including the latest, and I couldn't reproduce the problem. I'm assuming we're using the same TBB so I guess it's not that. Maybe Robert's right about LDA being the cause? I am using libxc...

I did see a couple of hung queues over Christmas, but when I just tried to reproduce it I no longer have the same problem. So either it's the same problem and it only appears sporadically, or it's different to Nick's problem and it's now been resolved.


From: Nichols A. Romero [notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');> <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com'); <javascript:_e(%7B%7D,'cvml','notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');');>');>');>] Sent: Friday, February 13, 2015 9:03 AM To: m-a-d-n-e-s-s/madness Subject: Re: [madness] hungq for nacl input case (#114)

Could someone please try to reproduce? I am starting to think it might have to do with my TBB version.

Laura, Can you try to reproduce with a recent version of MADNESS?

On Friday, February 13, 2015, robertjharrison <notifications@github.com> wrote:

The builtin library should be OK but is presently broken. I think the munging that fixed libxc was not propagated, or propagated incorrectly (presumably by me) into the builtin code.

On Fri, Feb 13, 2015 at 9:12 AM, Nichols A. Romero <notifications@github.com> wrote:

I am not using libxc.

Is the builtin MADNESS lda functional ok to use?

Jeff Hammond jeff.science@gmail.com http://jeffhammond.github.io/

naromero77 commented 9 years ago

This is the first time I have gotten a hungq. In the past, it would print out NaN for Exc and keep running, which on some level is equally weird. I have created two issues: one for the documentation, the other for the actual fix.

naromero77 commented 9 years ago

The problem is still present. I am now running with libxc; it works with MAD_NUM_THREADS=8, but I get a hungq with MAD_NUM_THREADS=2.

entering apply
!!MADNESS: Hung queue?
!!MADNESS: Hung queue?
!!MADNESS: Hung queue?
!!MADNESS: Hung queue?
!!MADNESS: Hung queue?
!!MADNESS ERROR: Exception thrown in WorldTaskQueue::fence() with 26800 pending task(s)
MadnessException : msg=ThreadPool::await() timeout : value=1 : line=1085 : function=await : filename='../../../../madness -git-naromero/src/madness/world/worldthread.h'

Laura,

Can you run with MAD_NUM_THREADS=2 and see if you get a hungq?

justusc commented 9 years ago

When running with TBB, you need at least 3 threads. This limitation is due to the way we implemented multithreaded tasks.

jeffhammond commented 9 years ago

Then we need to check this and abort if necessary rather than give users rope with which to hang themselves.

Jeff
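A minimal sketch of the kind of startup check suggested here, assuming the thread count comes from the MAD_NUM_THREADS environment variable and that the minimum of three threads with TBB stated above is correct. The names `kMinThreadsWithTbb`, `parse_thread_count`, `require_enough_threads`, and `check_threads_from_env` are hypothetical and are not part of the MADNESS API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical constant: the "at least 3 threads with TBB" limit quoted in
// this thread. It is not a MADNESS symbol.
constexpr int kMinThreadsWithTbb = 3;

// Parse a thread-count string such as the value of MAD_NUM_THREADS.
// Returns `fallback` when the variable is unset or empty; throws
// std::invalid_argument (or std::out_of_range) on malformed input.
int parse_thread_count(const char* text, int fallback) {
    if (text == nullptr || *text == '\0') return fallback;
    const std::string value(text);
    std::size_t pos = 0;
    const int n = std::stoi(value, &pos);
    if (pos != value.size()) {
        throw std::invalid_argument("trailing characters in thread count: " + value);
    }
    return n;
}

// Fail fast with a readable message instead of hanging later.
void require_enough_threads(int nthreads, bool using_tbb) {
    if (using_tbb && nthreads < kMinThreadsWithTbb) {
        throw std::runtime_error(
            "TBB build needs at least " + std::to_string(kMinThreadsWithTbb) +
            " threads, but " + std::to_string(nthreads) + " were requested");
    }
}

// Convenience wrapper: read the environment and validate in one step.
void check_threads_from_env(bool using_tbb, int fallback) {
    const int n = parse_thread_count(std::getenv("MAD_NUM_THREADS"), fallback);
    require_enough_threads(n, using_tbb);
}
```

Calling something like `check_threads_from_env(true, 8)` once during initialization would turn the hung queue into an immediate error. Where the check belongs (world initialization or the systolic code) is the open question discussed in the following comments.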


naromero77 commented 9 years ago

Somehow it works on my Linux box, but hangs on BG/Q. Let me try three threads on both next.

I agree with Jeff; we need to idiot-proof it.


justusc commented 9 years ago

@naromero77 and @jeffhammond

The issue with the number of threads has to do with the specific implementation of the systolic matrix task, which blocks all threads while waiting for communication and so prevents tasks spawned by incoming active messages from running.

Since this is not a fundamental restriction of multithreaded tasks in world, I think the check should go somewhere in the systolic matrix code. @robertjharrison would you agree?

robertjharrison commented 9 years ago

That's not my understanding of the issue.

I thought we needed three threads (with multiple MPI processes) because the main thread could not work while waiting for completion. Hence we need

The issue of multithreaded systolic task was just the lack of a working barrier ... which I think is just a software defect we can fix. Presently it is avoided by forcing the systolic code to use just one thread if TBB is enabled.
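For readers unfamiliar with the term, here is a generic sketch of a reusable thread barrier built on the standard library. It only illustrates what a "working barrier" has to guarantee (no thread leaves a phase until every thread has arrived); `SimpleBarrier` and `run_phases` are hypothetical names, and this is not the barrier used in MADNESS or in its systolic code.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Reusable barrier: a generation counter lets the same object be used for
// many consecutive phases without a late waiter missing its wake-up.
class SimpleBarrier {
public:
    explicit SimpleBarrier(std::size_t nthreads) : nthreads_(nthreads) {}

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t my_generation = generation_;
        if (++arrived_ == nthreads_) {
            // Last thread to arrive: reset the count and release everyone.
            arrived_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            // Sleep until the last thread advances the generation.
            cv_.wait(lock, [&] { return generation_ != my_generation; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t nthreads_;
    std::size_t arrived_ = 0;
    std::size_t generation_ = 0;
};

// Runs `rounds` phases on `nthreads` threads and returns true if, after each
// barrier, every thread had already registered its arrival for that phase.
bool run_phases(std::size_t nthreads, int rounds) {
    SimpleBarrier barrier(nthreads);
    std::atomic<long> arrivals{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nthreads; ++t) {
        workers.emplace_back([&] {
            for (int r = 0; r < rounds; ++r) {
                arrivals.fetch_add(1);
                barrier.wait();
                if (arrivals.load() < static_cast<long>(nthreads) * (r + 1)) {
                    ok = false;
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    return ok;
}
```

Note that a barrier like this blocks the calling thread while it waits. If the waiting threads are pool workers, they cannot run other tasks in the meantime, which is the kind of starvation described earlier in the thread.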


Robert J. Harrison tel: 865-272-9262