Closed yoshiurr-INL closed 7 months ago
Hm, if you change the line in the dependencies.xml from:
<tensorflow source="pip" os='mac,linux'>2.10</tensorflow>
to
<tensorflow os='mac,linux'>2.10</tensorflow>
does it install?
(Note that we do not currently have automated testing on arm64)
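For anyone who prefers to script the change rather than hand-edit it, the following is a minimal sketch of the same edit using Python's standard library. The `SAMPLE` fragment, its `dependencies` root element, and the helper name `drop_pip_source` are assumptions made for illustration; they are not taken from RAVEN.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the line quoted above; the real
# dependencies.xml has many more entries and may be laid out differently.
SAMPLE = """<dependencies>
  <main>
    <numpy>1.22</numpy>
    <tensorflow source="pip" os='mac,linux'>2.10</tensorflow>
    <tensorflow source="pip" os='windows'>2.10</tensorflow>
  </main>
</dependencies>"""


def drop_pip_source(xml_text, package, os_name):
    """Remove source="pip" from `package` entries whose os list contains os_name."""
    root = ET.fromstring(xml_text)
    changed = 0
    for node in root.iter(package):
        os_list = node.get("os", "").split(",")
        if os_name in os_list and node.get("source") == "pip":
            del node.attrib["source"]  # entry now falls back to the default source
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


new_xml, n_changed = drop_pip_source(SAMPLE, "tensorflow", "mac")
print(n_changed)
print(new_xml)
```

Note that ElementTree discards XML comments when parsing by default, so for the real file a one-line hand edit is probably the safer route.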
@joshua-cogliati-inl Joshua, I found the identical issue on my M1 MacBook Pro 13 inch (OS: Ventura 13.5; Processor: Apple M1), just like Ramon experienced.
I tried to edit the dependencies.xml as you suggested, and the conda environment can be established by ./scripts/establish_conda_env.sh --install.
However, after ./build_raven and ./run_tests -j4, 23 tests are marked as "Diff" or "Failed". See the attached log file.
Haoyu
log_run_test_j4_20230802.log
Okay, so we can install it if we switch tensorflow back to conda-forge, but it fails some tests. I think the correct solution for this is probably to switch to a newer version of tensorflow.
Thanks Joshua. Let me know if you have any candidate versions in mind. I can test on my M1 machine (it has been idle recently).
Tensorflow 2.12 and 2.13 might be worth trying.
I started testing tensorflow 2.12 in https://github.com/idaholab/raven/pull/2138 but we need a few updates for it.
@joshua-cogliati-inl, here are the results:
Using 2.12 (I modified Line 49 of dependencies.xml to <tensorflow os='mac,linux'>2.12</tensorflow>): Can establish conda environment, but has 14 Failed tests and 16 Diff tests, see log below;
log_run_test_j4_tensorflow_2_12_2023AUG03.log
Using 2.13 (Only available through PIP channel, I modified Line 49 of dependencies.xml to <tensorflow source="pip" os='mac,linux'>2.13</tensorflow>): Can establish conda environment, but has 673 Failed tests, see log below;
log_run_test_j4_tensorflow_2_13_2023AUG03.log
Hm, for 2.13, something is being done incorrectly:
ImportError: Failed to import grpc on Apple Silicon. On Apple Silicon machines, try `pip uninstall grpcio; conda install grpcio`. Check out https://docs.ray.io/en/master/ray-overview/installation.html#m1-mac-apple-silicon-support for more details.
Is there anything we can do within raven's establish_conda_env.sh script?
It might be worth adding 'grpcio' as a conda dependency to see if that solves it.
Otherwise, yes, we might need to modify establish_conda_env.sh.
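Before changing the install script, a quick check run inside the activated environment can confirm whether the grpc import itself is what fails. This is only a sketch: the helper names `is_apple_silicon` and `can_import` are made up for illustration and are not part of RAVEN or ray.

```python
import importlib
import platform


def is_apple_silicon(system=None, machine=None):
    """True when the interpreter reports a macOS arm64 host (e.g. M1/M2)."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


def can_import(module_name):
    """Try to import a module; return (ok, error_message)."""
    try:
        importlib.import_module(module_name)
        return True, ""
    except ImportError as err:
        return False, str(err)


ok, message = can_import("grpc")
print("Apple Silicon:", is_apple_silicon())
print("grpc importable:", ok)
if not ok:
    # On Apple Silicon the error message quoted above suggests replacing
    # the pip-installed grpcio with the conda package.
    print("import error:", message)
```

If `grpc importable` prints `False` only when grpcio came from pip, that would support listing grpcio as a conda dependency.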
I added <grpcio/> to dependencies.xml, and the conda environment can be established, but there are still 14 failed and 16 diff tests. See the dependencies.xml and log attached.
dependencies_and_log_2023AUG04.zip
It looks like a bunch of the diffs and failures are caused by the tensorflow update, so that is probably the first thing we need to fix.
Joshua, let me know when you need to test the fix. I can do the test on M1 chip.
For future reference, these are the changes made to dependencies.xml compared to current devel (scipy was actually updated by a change on devel, so we probably do not need to downgrade it; smt was also added on devel):
--- dependencies.xml 2023-08-28 10:20:41.567497521 -0600
+++ /tmp/.fr-NTKHA2/dependencies.xml 2023-08-04 08:39:21.000000000 -0600
@@ -37,7 +37,7 @@
<main>
<h5py/>
<numpy>1.22</numpy>
- <scipy>1.9</scipy>
+ <scipy>1.7</scipy>
<scikit-learn>1.0</scikit-learn>
<pandas/>
<!-- Note most versions of xarray work, but some (such as 0.20) don't -->
@@ -46,8 +46,9 @@
<matplotlib>3.5</matplotlib>
<statsmodels>0.13</statsmodels>
<cloudpickle>2.2</cloudpickle>
- <tensorflow source="pip" os='mac,linux'>2.10</tensorflow>
- <tensorflow source="pip" os='windows'>2.10</tensorflow>
+ <tensorflow source="pip" os='mac,linux'>2.13</tensorflow>
+ <tensorflow source="pip" os='windows'>2.13</tensorflow>
+ <grpcio/>
<!-- conda is really slow on windows if the version is not specified.-->
<python skip_check='True' os='windows'>3.8</python>
<python skip_check='True' os='mac,linux'>3</python>
@@ -70,7 +71,6 @@
<!-- redis is needed by ray, but on windows, this seems to need to be explicitly stated -->
<redis source="pip" os='windows'/>
<imageio source="pip">2.22</imageio>
- <smt/>
<line_profiler optional='True'/>
<!-- <ete3 optional='True'/> -->
<pywavelets optional='True'>1.1</pywavelets>
Joshua, is this dependencies.xml in any branch? I can give it a try if you can point me to the correct branch.
I just used the dependencies.xml file you included in your zip file, and I also just updated the https://github.com/idaholab/raven/pull/2138 with 2.13 instead of 2.12
Thanks, I will wait until #2138 gets merged and then test it on M1 chip.
It is on my joshua-cogliati-inl:tensorflow_212 branch that #2138 uses, it would be useful to know if it fixes things on the M1 chip.
Thanks Joshua, Let me give it a try on M1 chip tonight or tomorrow. I will attach the log file here.
FYI: If anyone uses the diff for the dependencies.xml, do not remove smt, since that will cause newer versions of RAVEN to fail.
On further investigation, smt does not seem to be available for macos amd64: https://pypi.org/project/smt/#files
so we probably do need to change <smt/> to <smt optional='True'/> and put imports that use smt into try/except blocks.
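A guarded import along those lines might look like the sketch below. The helper names `optional_import` and `require_smt` are hypothetical and only illustrate the pattern; where such guards would go in RAVEN is not decided here.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Module-level guard: importing this file never fails just because smt is missing.
smt = optional_import("smt")


def require_smt():
    """Raise a clear error only when smt-backed functionality is requested."""
    if smt is None:
        raise RuntimeError(
            "smt is not installed; install it to use the features that depend on it"
        )
    return smt


print("smt available:", smt is not None)
```

With this pattern a missing smt only affects the features that call `require_smt()`, instead of breaking every test at import time.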
Josh, you were correct. I deleted <smt/> in the attached dependencies_a.xml and 694 tests failed on M1 chip. See attached Log_Sep05_2023_a.log.
So I re-added <smt source='pip'/> in the attached dependencies_b.xml and it runs better. 19 tests failed. See attached Log_Sep05_2023_b.log.
Sep_5_2022_Trials.zip
Some errors I saw:
File ".../raven/ravenframework/Optimizers/acquisitionFunctions/AcquisitionFunction.py", line 138, in conductAcquisition
  res = sciopt.differential_evolution(optFunc, bounds=self._bounds, polish=self._polish, maxiter=self._maxiter, tol=self._tol,
TypeError: differential_evolution() got an unexpected keyword argument 'vectorized'
File ".../python3.10/site-packages/netCDF4/__init__.py", line 3, in <module>
  from ._netCDF4 import
ImportError: dlopen(.../python3.10/site-packages/netCDF4/_netCDF4.cpython-310-darwin.so, 0x0002): symbol not found in flat namespace '_nc_close'
libc++abi: terminating due to uncaught exception of type boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::overflow_error>>: Error in function ibeta_derivative<e>(e,e,e): Overflow Error
Also, a bunch of diffs.
I think it is worth trying netcdf 1.6 to see if that fixes the netcdf errors. I think the floating point hardware must be a bit different and causing the overflow error and some of the diffs.
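The 'vectorized' TypeError looks like a version mismatch: as far as I know, the SciPy documentation lists `vectorized` as added to `differential_evolution` in 1.9, which would be consistent with the scipy 1.7 pin in the diff above. One way to detect this kind of mismatch without hard-coding version numbers is to inspect the callable's signature. The sketch below uses stand-in functions (`old_optimizer`, `new_optimizer`) instead of SciPy so that it runs anywhere; all names in it are hypothetical.

```python
import inspect


def supports_keyword(func, name):
    """True if func accepts keyword `name` explicitly or through **kwargs."""
    params = inspect.signature(func).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def call_with_supported(func, *args, **kwargs):
    """Call func, dropping any keyword arguments it does not accept."""
    usable = {k: v for k, v in kwargs.items() if supports_keyword(func, k)}
    return func(*args, **usable)


# Stand-ins for an older and a newer optimizer signature (hypothetical).
def old_optimizer(fun, bounds, polish=True):
    return ("old", polish)


def new_optimizer(fun, bounds, polish=True, vectorized=False):
    return ("new", polish, vectorized)


print(supports_keyword(old_optimizer, "vectorized"))
print(supports_keyword(new_optimizer, "vectorized"))
print(call_with_supported(old_optimizer, None, [], polish=False, vectorized=True))
```

Silently dropping a keyword changes behavior, so in practice pinning a scipy version that has the keyword is probably the cleaner fix; the check is mainly useful for producing a clearer error message.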
So apparently the remaining errors are:
FAILED:
Diff tests/framework/redundantInputs
Diff tests/framework/NDGridProbabilityWeightValue
Diff tests/framework/CodeInterfaceTests/CobraTF/test3
Diff tests/framework/pca_sparseGridCollocation/polyCorrelation
Diff tests/framework/PostProcessors/LimitSurface/testLimitSurfaceIntegralPPWithBoundingError
Diff tests/framework/Optimizers/GeneticAlgorithms/simionescuConstrainedInvLin
Diff tests/framework/Samplers/SparseGrid/normal
Failed tests/framework/Samplers/SparseGrid/betanorm
Failed tests/framework/Samplers/SparseGrid/beta
Diff tests/framework/Samplers/SparseGrid/triangular
Diff tests/framework/pca_adaptive_sgc/test_adaptive_sgc_poly_pca_analytic
PASSED: 778
SKIPPED: 93
FAILED: 11
I think a lot of those come from differences in how arm64 and amd64 handle floating point numbers. (From what I have seen online, I think basic arithmetic (+-*/) gives the same results, but conversions from floating point to integer and back can differ, and functions like sin can differ in the last bits, which will eventually show up as diffs.)
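To illustrate why last-bit differences turn into "Diff" results, here is a small sketch that compares two result vectors exactly and with tolerances. It is not RAVEN's comparison code; `compare_values`, the sample numbers, and the tolerance defaults are assumptions chosen for the example.

```python
import math


def compare_values(expected, actual, rel_tol=1e-9, abs_tol=1e-12):
    """Return the indices where two sequences differ by more than the tolerances."""
    if len(expected) != len(actual):
        raise ValueError("sequences have different lengths")
    return [
        i
        for i, (e, a) in enumerate(zip(expected, actual))
        if not math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# "gold" plays the role of reference results; "other" simulates another platform.
gold = [0.3, math.sin(1.0), 1.0e6]
other = [0.1 + 0.2, math.sin(1.0) + 1e-15, 1.0e6 + 1.0]

exact_mismatches = compare_values(gold, other, rel_tol=0.0, abs_tol=0.0)
tolerant_mismatches = compare_values(gold, other)
print(exact_mismatches)     # every entry differs in at least the last bits
print(tolerant_mismatches)  # only the genuinely different entry remains
```

An exact comparison flags all three entries, while the tolerant one flags only the last. Tests whose gold files are compared too strictly could therefore diff on arm64 even when the results are numerically equivalent.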
Just FYI: (on M2, I had to download and "pip install" smt directly from https://github.com/SMTorg/SMT)
@alfoa Yes, we are discussing smt at: https://github.com/idaholab/raven/pull/2138#discussion_r1337680697
This issue is partly addressed by PR #2138
It looks like #2201 fixed the beta Sampler problems:
(49/69) Success( 2.87sec)tests/framework/Samplers/SparseGrid/beta
(50/69) Success( 2.90sec)tests/framework/Samplers/SparseGrid/betanorm
Update: And for that matter all the RAVEN tests currently pass on Mac OS amd64:
PASSED: 794
SKIPPED: 95
FAILED: 0
... RAVEN tests passed successfully.
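When comparing several runs like the ones in this thread, it can help to pull the summary counts and the list of problem tests out of each log automatically. The sketch below assumes the log contains lines shaped like the excerpts quoted above ("Diff <test>", "Failed <test>", "PASSED: <n>"); the real run_tests output may contain extra formatting, and the function name `summarize` is made up.

```python
import re

SUMMARY_RE = re.compile(r"^(PASSED|SKIPPED|FAILED):\s*(\d+)\s*$")
RESULT_RE = re.compile(r"^(Diff|Failed)\s+(\S+)\s*$")


def summarize(log_text):
    """Return (counts, problems) parsed from a run_tests-style log excerpt."""
    counts = {}
    problems = []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        summary = SUMMARY_RE.match(line)
        if summary:
            counts[summary.group(1)] = int(summary.group(2))
            continue
        result = RESULT_RE.match(line)
        if result:
            problems.append((result.group(1), result.group(2)))
    return counts, problems


# Abbreviated excerpt in the shape quoted earlier in this thread.
SAMPLE_LOG = """FAILED:
Diff tests/framework/redundantInputs
Failed tests/framework/Samplers/SparseGrid/beta
PASSED: 778
SKIPPED: 93
FAILED: 11
"""

counts, problems = summarize(SAMPLE_LOG)
print(counts)
for status, test_name in problems:
    print(status, test_name)
```

The bare "FAILED:" header line is ignored because the summary pattern requires a number after the colon.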
It seems this issue has been resolved.
Under Discussion Topic
Machine Specification
Equipment: MacBook Pro
OS: Ventura 13.5
Processor: Apple M2 Max
Summary of the topic to be discussed with the development team
While installing RAVEN libraries using "--install", the pip install for tensorflow cannot find a version that satisfies the requirements of tensorflow==2.10.*
When trying to use "--mamba" instead, the installation process does not start.
Describe the solution you'd like to be implemented
Identify whether this issue is common for Mac systems. Identify whether this issue is common for M1 and M2 chips.
Describe alternatives you've considered
Maybe conda installing tensorflow?
For Change Control Board: Issue Review
This review should occur before any development is performed as a response to this issue.
For Change Control Board: Issue Closure
This review should occur when the issue is imminently going to be closed.