visit-dav / visit

VisIt - Visualization and Data Analysis for Mesh-based Scientific Data
https://visit.llnl.gov
BSD 3-Clause "New" or "Revised" License

VisIt cluster build, AMD and intel/2019 #12182

Closed: griffin28 closed this issue 3 years ago

griffin28 commented 4 years ago

We're trying to combine the visit-users email list with GitHub issues. When replying on visit-users, please reply only to emails whose Subject line includes [visit-dav/live-customer-response].


Greetings,

Our admins are having trouble installing VisIt on a new cluster, 'Betzy' (https://www.top500.org/system/179861/).

EasyBuild failed to build VisIt with the intel/2019 toolchain, giving this message:

libtool: compile: mpiifort -I. -I../../src -I../../fortran/src -O3 -I../../fortran/src -I../../fortran/src -O2 -xHost -ftz -fp-speculation=safe -fp-model source -fPIC -c tf_gen.F90 -fPIC -o .libs/tf_gen.o
tf_gen.F90(172): catastrophic error: Function return parameter requires SSE register while SSE is disabled.

To debug, our admins tried upgrading VisIt on 'Fram' (https://www.top500.org/system/179072/), where we have VisIt 2.13 installed via the intel/2017 toolchain. However, installation with the intel/2019 toolchain failed there as well.

Any suggestions for things we can try would be much appreciated.

Cheers,
Robert Marskar
SINTEF Energy Research

griffin28 commented 4 years ago

Hello Robert, this looks like an issue down at the kernel level. It looks like the Streaming SIMD Extensions (SSE) aren't supported, which I believe affects floating-point operations in the kernel. Have you tried adding or removing CFLAGS such as -mno-sse?
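A quick way to test the hypothesis that SSE is unavailable is to check the CPU feature flags that Linux reports. The sketch below is a hypothetical helper, `sse_flags`, assuming a Linux host where /proc/cpuinfo has a `flags` line (Linux reports SSE3 as `pni`, so it is left out of the list). If `sse` and `sse2` show up on the build node, the CPU does advertise SSE, which would suggest looking at the compiler flags in the failing command (for example -xHost) rather than at the hardware.

```shell
# List the SSE-family feature flags reported for the first CPU.
# Assumption: Linux, where /proc/cpuinfo has a "flags : ..." line per CPU.
# Hypothetical helper; pass a saved copy of another node's cpuinfo if needed.
sse_flags() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  # Take the first "flags" line, split it into one flag per line,
  # and keep only the SSE-related entries, in their original order.
  grep -m1 '^flags' "$cpuinfo" \
    | tr ' ' '\n' \
    | grep -E '^(sse|sse2|ssse3|sse4_1|sse4_2)$'
}
```

Call it with no argument to inspect the current machine.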

I'm not familiar with EasyBuild. Does it leverage the build_visit scripts? Do you have any build logs you can share?

rmrsk commented 4 years ago

Hi Kevin, I don't have those, since these things are handled by our admins, who know more about this than I do.

Is it OK if I put them in direct contact with you? Is this GitHub issue a good place for discussion, or would you rather communicate over email? If you want, we could also talk by phone, Zoom, or Skype.

griffin28 commented 4 years ago

@rmrsk, my email is griffin28@llnl.gov. Send me an email and I'll set up a time for tomorrow. I will mainly be looking to determine whether there's something particular to VisIt that's causing the issue; your admins will need to figure out any system-specific issues. The build_visit (https://github.com/visit-dav/visit/tree/develop/src/tools/dev/scripts) and bv*.sh (https://github.com/visit-dav/visit/tree/develop/src/tools/dev/scripts/bv_support) scripts may also be a good reference for your admins using EasyBuild.

griffin28 commented 4 years ago

Hi,

I have been trying for some time to install VisIt 3.1.2 using EasyBuild. To reduce the number of unknowns, I recently moved my attempts at installing VisIt from the new AMD cluster Betzy to our old and reliable Intel cluster Fram.

I have put the EasyBuild scripts and the resulting build logs from my last two attempts online:

http://folk.ntnu.no/hrn/visit/intel-2018b/
http://folk.ntnu.no/hrn/visit/intel-2019b/

These two attempts give completely different error messages, although the attempts themselves are almost identical. Note, however, that Intel/2018b is too old to be used on the target computer Betzy, because the oldest toolchain supported on Betzy is Intel/2019a.

Best regards,

Henrik

griffin28 commented 4 years ago

Henrik, it looks like the build doesn't like the system Python. Try having VisIt build its own version of Python: remove --system-python and add --python and --openssl.
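If the build_visit options are kept in a list (in an EasyBuild easyconfig or a wrapper script, say), the suggested change amounts to dropping one option and appending two. A minimal shell sketch, assuming the options are passed as separate arguments; `swap_python_flags` is a hypothetical helper, and only the three option names come from the comment above.

```shell
# Print the given build_visit options with --system-python removed and
# --python/--openssl appended, one option per line.
# Hypothetical helper; existing --python/--openssl entries are dropped
# first so they are not duplicated.
swap_python_flags() {
  local opt
  for opt in "$@"; do
    case "$opt" in
      # Skip the system-Python flag and any copies of the flags we append.
      --system-python|--python|--openssl) continue ;;
    esac
    printf '%s\n' "$opt"
  done
  # Ask build_visit to build Python and OpenSSL itself.
  printf '%s\n' --python --openssl
}
```

For example, `swap_python_flags --system-python --system-qt` prints --system-qt, --python and --openssl, one per line.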

-- Kevin

griffin28 commented 4 years ago

New tests also failed: http://folk.ntnu.no/hrn/visit/2/

R

-----Original Message-----
From: Henrik R. Nagel via RT <support@metacenter.no>
Sent: Thursday, 25 June 2020 09:13
To: Robert Marskar <Robert.Marskar@sintef.no>
Subject: [uninett.no #213754] Betzy, VisIt for visualization

Hi,

I have recompiled VisIt and have new results in:

http://folk.ntnu.no/hrn/visit/2/

Best regards,

Henrik

griffin28 commented 4 years ago

@cyrush Looks like the openssl update is biting us again.

OK, it looks like you're getting closer. The OpenSSL version was updated, and I've run into this issue as well. Attached is a file (bv_openssl.sh) that downloads the previous OpenSSL version, which should be compatible with your system. Can you replace your current bv_openssl.sh file, located under src/tools/dev/scripts/bv_support, with this one?
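Replacing the support script is a plain copy; keeping the stock version alongside it makes the change easy to undo. A minimal sketch with a hypothetical helper, `replace_bv_script`, assuming `$1` is the top of the VisIt source tree (the directory containing src/) and `$2` is the downloaded bv_openssl.sh.

```shell
# Replace a bv_*.sh support script in a VisIt source tree, keeping a backup.
# Assumptions: $1 is the VisIt source root (the directory containing src/),
# $2 is the replacement script. Hypothetical helper name.
replace_bv_script() {
  local src_root="$1" new_script="$2"
  local dest
  dest="$src_root/src/tools/dev/scripts/bv_support/$(basename "$new_script")"
  # Only replace an existing script; a missing target usually means a wrong root.
  if [ ! -f "$dest" ]; then
    echo "not found: $dest" >&2
    return 1
  fi
  # Save the stock script as *.orig, then copy the new one into place.
  cp -p "$dest" "$dest.orig" && cp "$new_script" "$dest"
}
```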

-- Kevin

cyrush commented 4 years ago

@griffin28 yes, there it is again. Somehow we must be using two versions of OpenSSL. We are trying to use a newer version, but some part of the build is linked against an older system version.

Looking deeper, the intel 19 case has a different OpenSSL issue.

There is an OpenSSL-related failure in the build of CMake itself. We don't actually pass build_visit's version of the OpenSSL library to CMake.
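One way to check whether two OpenSSL versions are being mixed is to ask the dynamic linker which libssl and libcrypto a binary or shared library resolves to. A minimal sketch, assuming glibc-style `ldd` output (`name => path (address)`); `openssl_libs` is a hypothetical filter that reads that output on stdin, e.g. `ldd /path/to/binary | openssl_libs`. Two different sonames, or paths from both a build_visit install and the system, would be consistent with the mix described above.

```shell
# Filter `ldd` output down to the OpenSSL libraries it resolved.
# Reads ldd output on stdin; hypothetical helper name.
openssl_libs() {
  # Keep only libssl/libcrypto entries, then squeeze the whitespace
  # (reassigning $1 makes awk rebuild the line with single spaces).
  grep -E '^[[:space:]]*lib(ssl|crypto)\.so' | awk '{ $1 = $1; print }'
}
```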

griffin28 commented 4 years ago

@cyrush I wonder whether setting -DCMAKE_USE_OPENSSL=1 is causing the issue in the intel 19 case. I don't set this variable explicitly in my builds.

cyrush commented 4 years ago

@griffin28 Could be, but I think the default is on (which should be equivalent to 1). It would be good to know why that flag was provided.

griffin28 commented 4 years ago

Hi,

I added the file to the VisIt tarball and patched the build script with the information about the older version of OpenSSL. These are the results:

http://folk.ntnu.no/hrn/visit/3/

Best regards,

Henrik

griffin28 commented 4 years ago

For the intel-2018b case, the Meta-Object Compiler (moc) doesn't appear to be installed. This is probably due to your use of --system-qt instead of having VisIt build Qt for you. I'm not sure what your system Qt includes, but you may want to check that its "development tools" are installed, including /bin/moc. Alternatively, try removing --system-qt and adding --qt.
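Checking a Qt installation for the Meta-Object Compiler only requires looking for an executable moc. A minimal sketch with a hypothetical helper, `find_moc`, assuming the Qt prefix is passed as the first argument and that moc sits in its bin/ directory; distribution packages may install it elsewhere, so with no argument the helper falls back to searching PATH.

```shell
# Print the path to moc (Qt's Meta-Object Compiler), or fail if it is missing.
# Assumption: $1 is the Qt installation prefix; if omitted, search PATH.
# Hypothetical helper name.
find_moc() {
  local qt_prefix="${1:-}"
  if [ -n "$qt_prefix" ]; then
    # Look for an executable <prefix>/bin/moc.
    if [ -x "$qt_prefix/bin/moc" ]; then
      echo "$qt_prefix/bin/moc"
      return 0
    fi
    echo "moc not found in $qt_prefix/bin" >&2
    return 1
  fi
  # No prefix given: report whichever moc is first on PATH, if any.
  command -v moc || { echo "moc not found on PATH" >&2; return 1; }
}
```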

For the intel-2019 cases, there's an issue with CMake and OpenSSL. I'm not sure what's causing it, but you may want to set the CMAKE_USE_OPENSSL variable to 0 (-DCMAKE_USE_OPENSSL=0). Is there a reason why it is being set? I'm looking into this one further and will let you know if I come up with anything new.
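CMAKE_USE_OPENSSL is an option of CMake's own build; it controls whether CMake's bundled copy of curl (cmcurl) is built with OpenSSL. When CMake is compiled from source with its bootstrap script, arguments after a `--` separator are passed through to CMake as cache arguments, so the option can be switched off there. The sketch below only prints such a command (a dry run); the helper name and install prefix are hypothetical, and whether the EasyBuild recipe builds CMake this way is an assumption.

```shell
# Print (without running) a CMake bootstrap command that builds CMake
# with OpenSSL support in its bundled curl switched off.
# Assumptions: to be run from a CMake source directory; $1 is the
# install prefix. Hypothetical helper name.
cmake_bootstrap_cmd() {
  local prefix="$1"
  # Arguments after "--" are forwarded to CMake as cache arguments;
  # 0 and OFF are both false to CMake.
  printf '%s\n' "./bootstrap --prefix=$prefix -- -DCMAKE_USE_OPENSSL=0"
}
```

For example, `cmake_bootstrap_cmd /opt/cmake` prints `./bootstrap --prefix=/opt/cmake -- -DCMAKE_USE_OPENSSL=0`.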

griffin28 commented 4 years ago

The links below may provide some insight into your intel-2019a build issues. Also, instead of reverting to an earlier version of OpenSSL, try using the latest version for this build. In the build_visit log file, the relevant details are above this line:

make[2]: *** [Utilities/cmcurl/LIBCURL] Error 1

https://github.com/curl/curl/issues/1128

https://github.com/curl/curl/issues/860
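Pulling the lines just above the `Utilities/cmcurl/LIBCURL` make error out of a long build_visit log can be done with grep's before-context option (-B). A minimal sketch with a hypothetical helper, `cmcurl_context`, assuming the log path is passed as the first argument (its exact file name isn't given in this thread) and that 40 lines of context is enough, which is an arbitrary default.

```shell
# Show the lines leading up to the cmcurl failure in a build log.
# Assumptions: $1 is the build_visit log file; $2 (optional) is how many
# preceding lines to print (default 40, an arbitrary choice).
# Hypothetical helper name.
cmcurl_context() {
  local log="$1" lines="${2:-40}"
  # -F: match the text literally (it contains "[" and "]");
  # -B: also print the preceding lines; -n: prefix line numbers.
  grep -n -F -B "$lines" 'Utilities/cmcurl/LIBCURL] Error' "$log"
}
```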