BUG: Regression in C blas flags detection

GaetanLepage commented 11 months ago

Describe the issue:

The following tests fail with pytensor==2.18.0:

=========================== short test summary info ============================
FAILED tests/test_printing.py::test_debugprint - AssertionError: assert ['Composite{(...   └─ B', ...] == ['Composite{(...  ...
FAILED tests/link/c/test_op.py::test_ExternalCOp_c_code_cache_version - ValueError: too many values to unpack (expected 3)

Reproducible code example:

Not relevant

Error message:

>       assert [l.strip() for l in s.split("\n")] == [
            l.strip() for l in exp_res.split("\n")
        ]
E       AssertionError: assert ['Composite{(...   └─ B', ...] == ['Composite{(...   └─ B', ...]
E         At index 2 diff: '│  └─ Gemv{inplace} d={0: [0]} 2' != '│  └─ CGemv{inplace} d={0: [0]} 2'
E         Use -v to get more diff

tests/test_printing.py:301: AssertionError

------------------------

____________________ test_ExternalCOp_c_code_cache_version _____________________

    def test_ExternalCOp_c_code_cache_version():
        """Make sure the C cache versions produced by `ExternalCOp` don't depend on `hash` seeding."""

        with tempfile.NamedTemporaryFile(dir=".", suffix=".py") as tmp:
            tmp.write(externalcop_test_code.encode())
            tmp.seek(0)
            # modname = os.path.splitext(tmp.name)[0]
            modname = tmp.name
            out_1, err = get_hash(modname, seed=428)
            assert err is None
            out_2, err = get_hash(modname, seed=3849)
            assert err is None

>           hash_1, msg, _ = out_1.decode().split("\n")
E           ValueError: too many values to unpack (expected 3)

tests/link/c/test_op.py:230: ValueError
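
One plausible explanation for this second failure, offered as a sketch rather than a diagnosis: if the child process writes an extra line (for example a warning emitted at import time) into the output that `get_hash` captures, `out_1.decode().split("\n")` yields more than three fields and the unpacking fails. The byte strings below are hypothetical; only the unpacking statement mirrors the test.

```python
def unpack_three(output: bytes) -> tuple[str, str, str]:
    """Mirror the test's unpacking: exactly three newline-separated fields."""
    hash_value, msg, trailing = output.decode().split("\n")
    return hash_value, msg, trailing


# Hypothetical captured outputs; the real test output is not shown in this report.
clean = b"deadbeef\nsome message\n"
noisy = b"WARNING: an extra line emitted at import time\n" + clean

print(unpack_three(clean))  # three fields, the last one is empty

try:
    unpack_three(noisy)
except ValueError as exc:
    # Four fields no longer fit into three names.
    print(f"ValueError: {exc}")
```

If this is what happens, both failures could share one cause, but that would need to be confirmed against the actual captured output.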

PyTensor version information:

Version 2.18.0

Context for the issue:

I am working on updating pytensor from 2.17.3 to 2.18.0 on nixpkgs.

ricardoV94 commented 11 months ago

I can't reproduce this locally, and our CI seems to be working. Can you check the output of `python -c "import pytensor; print(pytensor.config)"`, and maybe also the output you were getting with the previous version? (The diff should be enough.)
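
A minimal sketch of how such a comparison could be produced, assuming the two config dumps have already been captured as text. The helper names and the `0x...` placeholder are illustrative and not part of PyTensor. Masking object addresses first keeps the diff focused on real changes, since the `repr` of each parameter object differs on every run.

```python
import difflib
import re

# Object addresses such as 0x7ffe6f8e3290 change on every run and only add noise.
ADDRESS = re.compile(r"0x[0-9a-fA-F]+")


def normalized_lines(dump: str) -> list[str]:
    """Split a config dump into lines with memory addresses masked."""
    return [ADDRESS.sub("0x...", line) for line in dump.splitlines()]


def config_diff(old_dump: str, new_dump: str) -> list[str]:
    """Unified diff of two config dumps, ignoring address-only changes."""
    return list(
        difflib.unified_diff(
            normalized_lines(old_dump),
            normalized_lines(new_dump),
            fromfile="old.txt",
            tofile="new.txt",
            lineterm="",
        )
    )


# Abbreviated stand-ins for the saved output of the two versions.
old = "force_device (<BoolParam object at 0x7ffff73e9150>)\n    Value:  False\n"
new = (
    "WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.\n"
    "force_device (<BoolParam object at 0x7fff73dcfb90>)\n    Value:  False\n"
)

print("\n".join(config_diff(old, new)))
```

With the addresses masked, only the added warning line shows up as a change in this example.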

GaetanLepage commented 11 months ago

Here is what I get:

```
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
floatX ({'float16', 'float32', 'float64'})
    Doc:  Default floating-point precision for python casts. Note: float16 support is experimental, use at your own risk.
    Value:  float64

warn_float64 ({'pdb', 'ignore', 'warn', 'raise'})
    Doc:  Do an action when a tensor variable with float64 dtype is created.
    Value:  ignore

pickle_test_value (>)
    Doc:  Dump test values while pickling model. If True, test values will be dumped with model.
    Value:  True

cast_policy ({'numpy+floatX', 'custom'})
    Doc:  Rules for implicit type casting
    Value:  custom

deterministic ({'more', 'default'})
    Doc:  If `more`, sometimes we will select some implementation that are more deterministic, but slower. Also see the dnn.conv.algo* flags to cover more cases.
    Value:  default

device (cpu)
    Doc:  Default device for computations. only cpu is supported for now
    Value:  cpu

force_device (>)
    Doc:  Raise an error if we can't use the specified device
    Value:  False

conv__assert_shape (>)
    Doc:  If True, AbstractConv* ops will verify that user-provided shapes match the runtime shapes (debugging option, may slow down compilation)
    Value:  False

print_global_stats (>)
    Doc:  Print some global statistics (time spent) at the end
    Value:  False

assert_no_cpu_op ({'pdb', 'ignore', 'warn', 'raise'})
    Doc:  Raise an error/warning if there is a CPU op in the computational graph.
    Value:  ignore

unpickle_function (>)
    Doc:  Replace unpickled PyTensor functions with None. This is useful to unpickle old graphs that pickled them when it shouldn't
    Value:  True

    Doc:  Default compilation mode
    Value:  Mode

cxx ()
    Doc:  The C++ compiler to use. Currently only g++ is supported, but supporting additional compilers should not be too difficult. If it is empty, no C++ code is compiled.
    Value:  /nix/store/90h6k8ylkgn81k10190v5c9ldyjpzgl9-gcc-wrapper-12.3.0/bin/g++

linker ({'c|py', 'c|py_nogc', 'cvm_nogc', 'c', 'vm', 'cvm', 'py', 'vm_nogc'})
    Doc:  Default linker used if the pytensor flags mode is Mode
    Value:  cvm

allow_gc (>)
    Doc:  Do we default to delete intermediate results during PyTensor function calls? Doing so lowers the memory requirement, but asks that we reallocate memory at the next function call. This is implemented for the default linker, but may not work for all linkers.
    Value:  True

optimizer ({'o3', 'fast_run', 'fast_compile', 'unsafe', 'o4', 'o2', 'merge', 'None', 'o1'})
    Doc:  Default optimizer. If not None, will use this optimizer with the Mode
    Value:  o4

optimizer_verbose (>)
    Doc:  If True, we print all optimization being applied
    Value:  False

on_opt_error ({'raise', 'warn', 'ignore', 'pdb'})
    Doc:  What to do when an optimization crashes: warn and skip it, raise the exception, or fall into the pdb debugger.
    Value:  warn

nocleanup (>)
    Doc:  Suppress the deletion of code files that did not compile cleanly
    Value:  False

on_unused_input ({'raise', 'warn', 'ignore'})
    Doc:  What to do if a variable in the 'inputs' list of pytensor.function() is not used in the graph.
    Value:  raise

gcc__cxxflags ()
    Doc:  Extra compiler flags for gcc
    Value:

cmodule__warn_no_version (>)
    Doc:  If True, will print a warning when compiling one or more Op with C code that can't be cached because there is no c_code_cache_version() function associated to at least one of those Ops.
    Value:  False

cmodule__remove_gxx_opt (>)
    Doc:  If True, will remove the -O* parameter passed to g++.This is useful to debug in gdb modules compiled by PyTensor.The parameter -g is passed by default to g++
    Value:  False

cmodule__compilation_warning (>)
    Doc:  If True, will print compilation warnings.
    Value:  False

cmodule__preload_cache (>)
    Doc:  If set to True, will preload the C module cache at import time
    Value:  False

cmodule__age_thresh_use ()
    Doc:  In seconds. The time after which PyTensor won't reuse a compile c module.
    Value:  2073600

cmodule__debug (>)
    Doc:  If True, define a DEBUG macro (if not exists) for any compiled C code.
    Value:  False

compile__wait ()
    Doc:  Time to wait before retrying to acquire the compile lock.
    Value:  5

compile__timeout ()
    Doc:  In seconds, time that a process will wait before deciding to override an existing lock. An override only happens when the existing lock is held by the same owner *and* has not been 'refreshed' by this owner for more than this period. Refreshes are done every half timeout period for running processes.
    Value:  120

ctc__root ()
    Doc:  Directory which contains the root of Baidu CTC library. It is assumed that the compiled library is either inside the build, lib or lib64 subdirectory, and the header inside the include directory.
    Value:

tensor__cmp_sloppy ()
    Doc:  Relax pytensor.tensor.math._allclose (0) not at all, (1) a bit, (2) more
    Value:  0

lib__amblibm (>)
    Doc:  Use amd's amdlibm numerical library
    Value:  False

tensor__insert_inplace_optimizer_validate_nb ()
    Doc:  -1: auto, if graph have less then 500 nodes 1, else 10
    Value:  -1

traceback__limit ()
    Doc:  The number of stack to trace. -1 mean all.
    Value:  8

traceback__compile_limit ()
    Doc:  The number of stack to trace to keep during compilation. -1 mean all. If greater then 0, will also make us save PyTensor internal stack trace.
    Value:  0

warn__ignore_bug_before ({'0.3', '0.8', '0.5', '0.10', 'None', '0.9', '0.6', '0.4.1', '1.0.2', '0.8.1', '1.0', '0.8.2', '1.0.4', 'all', '1.0.1', '0.7', '1.0.5', '1.0.3', '0.4'})
    Doc:  If 'None', we warn about all PyTensor bugs found by default. If 'all', we don't warn about PyTensor bugs found by default. If a version, we print only the warnings relative to PyTensor bugs found after that version. Warning for specific bugs can be configured with specific [warn] flags.
    Value:  0.9

exception_verbosity ({'high', 'low'})
    Doc:  If 'low', the text of exceptions will generally refer to apply nodes with short names such as Elemwise{add_no_inplace}. If 'high', some exceptions will also refer to apply nodes with long descriptions like: A. Elemwise{add_no_inplace} B. log_likelihood_v_given_h C. log_likelihood_h
    Value:  low

print_test_value (>)
    Doc:  If 'True', the __eval__ of an PyTensor variable will return its test_value when this is available. This has the practical consequence that, e.g., in debugging `my_var` will print the same as `my_var.tag.test_value` when a test value is defined.
    Value:  False

compute_test_value ({'off', 'ignore', 'warn', 'raise', 'pdb'})
    Doc:  If 'True', PyTensor will run each op at graph build time, using Constants, SharedVariables and the tag 'test_value' as inputs to the function. This helps the user track down problems in the graph before it gets optimized.
    Value:  off

compute_test_value_opt ({'off', 'ignore', 'warn', 'raise', 'pdb'})
    Doc:  For debugging PyTensor optimization only. Same as compute_test_value, but is used during PyTensor optimization
    Value:  off

check_input (>)
    Doc:  Specify if types should check their input in their C code. It can be used to speed up compilation, reduce overhead (particularly for scalars) and reduce the number of generated C files.
    Value:  True

NanGuardMode__nan_is_error (>)
    Doc:  Default value for nan_is_error
    Value:  True

NanGuardMode__inf_is_error (>)
    Doc:  Default value for inf_is_error
    Value:  True

NanGuardMode__big_is_error (>)
    Doc:  Default value for big_is_error
    Value:  True

NanGuardMode__action ({'raise', 'warn', 'pdb'})
    Doc:  What NanGuardMode does when it finds a problem
    Value:  raise

DebugMode__patience ()
    Doc:  Optimize graph this many times to detect inconsistency
    Value:  10

DebugMode__check_c (>)
    Doc:  Run C implementations where possible
    Value:  True

DebugMode__check_py (>)
    Doc:  Run Python implementations where possible
    Value:  True

DebugMode__check_finite (>)
    Doc:  True -> complain about NaN/Inf results
    Value:  True

DebugMode__check_strides ()
    Doc:  Check that Python- and C-produced ndarrays have same strides. On difference: (0) - ignore, (1) warn, or (2) raise error
    Value:  0

DebugMode__warn_input_not_reused (>)
    Doc:  Generate a warning when destroy_map or view_map says that an op works inplace, but the op did not reuse the input for its output.
    Value:  True

DebugMode__check_preallocated_output ()
    Doc:  Test thunks with pre-allocated memory as output storage. This is a list of strings separated by ":". Valid values are: "initial" (initial storage in storage map, happens with Scan),"previous" (previously-returned memory), "c_contiguous", "f_contiguous", "strided" (positive and negative strides), "wrong_size" (larger and smaller dimensions), and "ALL" (all of the above).
    Value:

DebugMode__check_preallocated_output_ndim ()
    Doc:  When testing with "strided" preallocated output memory, test all combinations of strides over that number of (inner-most) dimensions. You may want to reduce that number to reduce memory or time usage, but it is advised to keep a minimum of 2.
    Value:  4

profiling__time_thunks (>)
    Doc:  Time individual thunks when profiling
    Value:  True

profiling__n_apply ()
    Doc:  Number of Apply instances to print by default
    Value:  20

profiling__n_ops ()
    Doc:  Number of Ops to print by default
    Value:  20

profiling__output_line_width ()
    Doc:  Max line width for the profiling output
    Value:  512

profiling__min_memory_size ()
    Doc:  For the memory profile, do not print Apply nodes if the size of their outputs (in bytes) is lower than this threshold
    Value:  1024

profiling__min_peak_memory (>)
    Doc:  The min peak memory usage of the order
    Value:  False

profiling__destination ()
    Doc:  File destination of the profiling output
    Value:  stderr

profiling__debugprint (>)
    Doc:  Do a debugprint of the profiled functions
    Value:  False

profiling__ignore_first_call (>)
    Doc:  Do we ignore the first call of an PyTensor function.
    Value:  False

on_shape_error ({'raise', 'warn'})
    Doc:  warn: print a warning and use the default value. raise: raise an error
    Value:  warn

openmp (>)
    Doc:  Allow (or not) parallel computation on the CPU with OpenMP. This is the default value used when creating an Op that supports OpenMP parallelization. It is preferable to define it via the PyTensor configuration file ~/.pytensorrc or with the environment variable PYTENSOR_FLAGS. Parallelization is only done for some operations that implement it, and even for operations that implement parallelism, each operation is free to respect this flag or not. You can control the number of threads used with the environment variable OMP_NUM_THREADS. If it is set to 1, we disable openmp in PyTensor by default.
    Value:  False

openmp_elemwise_minsize ()
    Doc:  If OpenMP is enabled, this is the minimum size of vectors for which the openmp parallelization is enabled in element wise ops.
    Value:  200000

optimizer_excluding ()
    Doc:  When using the default mode, we will remove optimizer with these tags. Separate tags with ':'.
    Value:

optimizer_including ()
    Doc:  When using the default mode, we will add optimizer with these tags. Separate tags with ':'.
    Value:

optimizer_requiring ()
    Doc:  When using the default mode, we will require optimizer with these tags. Separate tags with ':'.
    Value:

optdb__position_cutoff ()
    Doc:  Where to stop earlier during optimization. It represent the position of the optimizer where to stop.
    Value:  inf

optdb__max_use_ratio ()
    Doc:  A ratio that prevent infinite loop in EquilibriumGraphRewriter.
    Value:  8.0

cycle_detection ({'fast', 'regular'})
    Doc:  If cycle_detection is set to regular, most inplaces are allowed,but it is slower. If cycle_detection is set to faster, less inplacesare allowed, but it makes the compilation faster.The interaction of which one give the lower peak memory usage iscomplicated and not predictable, so if you are close to the peakmemory usage, triyng both could give you a small gain.
    Value:  regular

check_stack_trace ({'off', 'log', 'raise', 'warn'})
    Doc:  A flag for checking the stack trace during the optimization process. default (off): does not check the stack trace of any optimization log: inserts a dummy stack trace that identifies the optimizationthat inserted the variable that had an empty stack trace.warn: prints a warning if a stack trace is missing and also a dummystack trace is inserted that indicates which optimization insertedthe variable that had an empty stack trace.raise: raises an exception if a stack trace is missing
    Value:  off

metaopt__verbose ()
    Doc:  0 for silent, 1 for only warnings, 2 for full output withtimings and selected implementation
    Value:  0

metaopt__optimizer_excluding ()
    Doc:  exclude optimizers with these tags. Separate tags with ':'.
    Value:

metaopt__optimizer_including ()
    Doc:  include optimizers with these tags. Separate tags with ':'.
    Value:

unittests__rseed ()
    Doc:  Seed to use for randomized unit tests. Special value 'random' means using a seed of None.
    Value:  666

warn__round (>)
    Doc:  Warn when using `tensor.round` with the default mode. Round changed its default from `half_away_from_zero` to `half_to_even` to have the same default as NumPy.
    Value:  False

profile (>)
    Doc:  If VM should collect profile information
    Value:  False

profile_optimizer (>)
    Doc:  If VM should collect optimizer profile information
    Value:  False

profile_memory (>)
    Doc:  If VM should collect memory profile information and print it
    Value:  False

    Doc:  Useful only for the VM Linkers. When lazy is None, auto detect if lazy evaluation is needed and use the appropriate version. If the C loop isn't being used and lazy is True, use the Stack VM; otherwise, use the Loop VM.
    Value:  None

numba__vectorize_target ({'cuda', 'parallel', 'cpu'})
    Doc:  Default target for numba.vectorize.
    Value:  cpu

numba__fastmath (>)
    Doc:  If True, use Numba's fastmath mode.
    Value:  True

numba__cache (>)
    Doc:  If True, use Numba's file based caching.
    Value:  True

compiledir_format ()
    Doc:  Format string for platform-dependent compiled module subdirectory (relative to base_compiledir). Available keys: device, gxx_version, hostname, numpy_version, platform, processor, pytensor_version, python_bitwidth, python_int_bitwidth, python_version, short_platform. Defaults to compiledir_%(short_platform)s-%(processor)s- %(python_version)s-%(python_bitwidth)s.
    Value:  compiledir_%(short_platform)s-%(processor)s-%(python_version)s-%(python_bitwidth)s

    Doc:  platform-independent root directory for compiled modules
    Value:  /build/tmp.gHcmWT784l/.pytensor

    Doc:  platform-dependent cache directory for compiled modules
    Value:  /build/tmp.gHcmWT784l/.pytensor/compiledir_Linux-6.1.62-x86_64-with-glibc2.38--3.11.6-64

blas__ldflags ()
    Doc:  lib[s] to include for [Fortran] level-3 blas implementation
    Value:

blas__check_openmp (>)
    Doc:  Check for openmp library conflict. WARNING: Setting this to False leaves you open to wrong results in blas-related operations.
    Value:  True

scan__allow_gc (>)
    Doc:  Allow/disallow gc inside of Scan (default: False)
    Value:  False

scan__allow_output_prealloc (>)
    Doc:  Allow/disallow memory preallocation for outputs inside of scan (default: True)
    Value:  True
```
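
The entry that seems most relevant to this issue is `blas__ldflags`, whose value is empty in the dump above, alongside the warning about the NumPy C-API based BLAS implementation. That would be consistent with `Gemv` appearing where the test expects `CGemv`, though this is an inference and not something the dump states. As a rough sketch, a single option can be pulled out of a saved dump as follows. The `option_value` helper is hypothetical and assumes the raw terminal layout: one unindented header line per option, followed by indented `Doc:` and `Value:` lines, as in the diff posted in this thread.

```python
from typing import Optional


def option_value(dump: str, name: str) -> Optional[str]:
    """Return the text after 'Value:' for the option called `name`, or None."""
    current = None
    for line in dump.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new entry: "<name> (<type>)".
            current = line.split(" ", 1)[0]
        elif current == name and line.strip().startswith("Value:"):
            return line.strip()[len("Value:"):].strip()
    return None


# Abbreviated sample in the assumed layout; values copied from the dump above.
sample = """\
blas__ldflags ()
    Doc:  lib[s] to include for [Fortran] level-3 blas implementation
    Value:
blas__check_openmp (>)
    Doc:  Check for openmp library conflict.
    Value:  True
"""

print(repr(option_value(sample, "blas__ldflags")))       # empty string: no flags set
print(repr(option_value(sample, "blas__check_openmp")))
```

Running the same check against a dump from 2.17.3 would show whether `blas__ldflags` was populated before the upgrade.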

ricardoV94 commented 11 months ago

How does it compare with before it started failing?

GaetanLepage commented 11 months ago

Sorry for the delay! Here is the diff:

diff --git a/old.txt b/new.txt
index 624402e..325c0df 100644
--- a/old.txt
+++ b/new.txt
@@ -1,3 +1,4 @@
+WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
 floatX ({'float16', 'float32', 'float64'})
     Doc:  Default floating-point precision for python casts.

@@ -8,7 +9,7 @@ warn_float64 ({'pdb', 'ignore', 'warn', 'raise'})
     Doc:  Do an action when a tensor variable with float64 dtype is created.
     Value:  ignore

-pickle_test_value (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6f8e3290>>)
+pickle_test_value (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff747bc290>>)
     Doc:  Dump test values while pickling model. If True, test values will be dumped with model.
     Value:  True

@@ -24,15 +25,15 @@ device (cpu)
     Doc:  Default device for computations. only cpu is supported for now
     Value:  cpu

-force_device (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffff73e9150>>)
+force_device (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff73dcfb90>>)
     Doc:  Raise an error if we can't use the specified device
     Value:  False

-conv__assert_shape (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6f701090>>)
+conv__assert_shape (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394e650>>)
     Doc:  If True, AbstractConv* ops will verify that user-provided shapes match the runtime shapes (debugging option, may slow down compilation)
     Value:  False

-print_global_stats (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6e8fed90>>)
+print_global_stats (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff749d56d0>>)
     Doc:  Print some global statistics (time spent) at the end
     Value:  False

@@ -40,23 +41,23 @@ assert_no_cpu_op ({'pdb', 'ignore', 'warn', 'raise'})
     Doc:  Raise an error/warning if there is a CPU op in the computational graph.
     Value:  ignore

-unpickle_function (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6e8fee50>>)
+unpickle_function (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394e850>>)
     Doc:  Replace unpickled PyTensor functions with None. This is useful to unpickle old graphs that pickled them when it shouldn't
     Value:  True

-<pytensor.configparser.ConfigParam object at 0x7ffe6e8fef50>
+<pytensor.configparser.ConfigParam object at 0x7fff7394e8d0>
     Doc:  Default compilation mode
     Value:  Mode

 cxx (<class 'str'>)
     Doc:  The C++ compiler to use. Currently only g++ is supported, but supporting additional compilers should not be too difficult. If it is empty, no C++ code is compiled.
-    Value:  /nix/store/zlzz2z48s7ry0hkl55xiqp5a73b4mzrg-gcc-wrapper-12.3.0/bin/g++
+    Value:  /nix/store/90h6k8ylkgn81k10190v5c9ldyjpzgl9-gcc-wrapper-12.3.0/bin/g++

 linker ({'c|py', 'c|py_nogc', 'cvm_nogc', 'c', 'vm', 'cvm', 'py', 'vm_nogc'})
     Doc:  Default linker used if the pytensor flags mode is Mode
     Value:  cvm

-allow_gc (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6e8ff250>>)
+allow_gc (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394e990>>)
     Doc:  Do we default to delete intermediate results during PyTensor function calls? Doing so lowers the memory requirement, but asks that we reallocate memory at the next function call. This is implemented for the default linker, but may not work for all linkers.
     Value:  True

@@ -64,7 +65,7 @@ optimizer ({'o3', 'fast_run', 'fast_compile', 'unsafe', 'o4', 'o2', 'merge', 'No
     Doc:  Default optimizer. If not None, will use this optimizer with the Mode
     Value:  o4

-optimizer_verbose (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6edb2f10>>)
+optimizer_verbose (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7456ffd0>>)
     Doc:  If True, we print all optimization being applied
     Value:  False

@@ -72,7 +73,7 @@ on_opt_error ({'raise', 'warn', 'ignore', 'pdb'})
     Doc:  What to do when an optimization crashes: warn and skip it, raise the exception, or fall into the pdb debugger.
     Value:  warn

-nocleanup (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6e8ff290>>)
+nocleanup (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394e9d0>>)
     Doc:  Suppress the deletion of code files that did not compile cleanly
     Value:  False

@@ -84,19 +85,19 @@ gcc__cxxflags (<class 'str'>)
     Doc:  Extra compiler flags for gcc
     Value:

-cmodule__warn_no_version (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6e8ff390>>)
+cmodule__warn_no_version (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394eed0>>)
     Doc:  If True, will print a warning when compiling one or more Op with C code that can't be cached because there is no c_code_cache_version() function associated to at least one of those Ops.
     Value:  False

-cmodule__remove_gxx_opt (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6e8ff4d0>>)
+cmodule__remove_gxx_opt (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394ee90>>)
     Doc:  If True, will remove the -O* parameter passed to g++.This is useful to debug in gdb modules compiled by PyTensor.The parameter -g is passed by default to g++
     Value:  False

-cmodule__compilation_warning (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6e8ff590>>)
+cmodule__compilation_warning (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff73dcfe90>>)
     Doc:  If True, will print compilation warnings.
     Value:  False

-cmodule__preload_cache (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffff71c3190>>)
+cmodule__preload_cache (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394f090>>)
     Doc:  If set to True, will preload the C module cache at import time
     Value:  False

@@ -104,7 +105,7 @@ cmodule__age_thresh_use (<class 'int'>)
     Doc:  In seconds. The time after which PyTensor won't reuse a compile c module.
     Value:  2073600

-cmodule__debug (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6f6db850>>)
+cmodule__debug (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394f1d0>>)
     Doc:  If True, define a DEBUG macro (if not exists) for any compiled C code.
     Value:  False

@@ -128,7 +129,7 @@ tensor__cmp_sloppy (<class 'int'>)
     Doc:  Relax pytensor.tensor.math._allclose (0) not at all, (1) a bit, (2) more
     Value:  0

-lib__amblibm (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6e8ff790>>)
+lib__amblibm (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7479ef90>>)
     Doc:  Use amd's amdlibm numerical library
     Value:  False

@@ -155,7 +156,7 @@ exception_verbosity ({'high', 'low'})
                 C. log_likelihood_h
     Value:  low

-print_test_value (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6e8ffcd0>>)
+print_test_value (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394f610>>)
     Doc:  If 'True', the __eval__ of an PyTensor variable will return its test_value when this is available. This has the practical consequence that, e.g., in debugging `my_var` will print the same as `my_var.tag.test_value` when a test value is defined.
     Value:  False

@@ -167,19 +168,19 @@ compute_test_value_opt ({'off', 'ignore', 'warn', 'raise', 'pdb'})
     Doc:  For debugging PyTensor optimization only. Same as compute_test_value, but is used during PyTensor optimization
     Value:  off

-check_input (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6e8ffdd0>>)
+check_input (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394f750>>)
     Doc:  Specify if types should check their input in their C code. It can be used to speed up compilation, reduce overhead (particularly for scalars) and reduce the number of generated C files.
     Value:  True

-NanGuardMode__nan_is_error (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6e8fff10>>)
+NanGuardMode__nan_is_error (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394f850>>)
     Doc:  Default value for nan_is_error
     Value:  True

-NanGuardMode__inf_is_error (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6eae4090>>)
+NanGuardMode__inf_is_error (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394f990>>)
     Doc:  Default value for inf_is_error
     Value:  True

-NanGuardMode__big_is_error (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6eae4110>>)
+NanGuardMode__big_is_error (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394fa10>>)
     Doc:  Default value for big_is_error
     Value:  True

@@ -191,15 +192,15 @@ DebugMode__patience (<class 'int'>)
     Doc:  Optimize graph this many times to detect inconsistency
     Value:  10

-DebugMode__check_c (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6f75b0d0>>)
+DebugMode__check_c (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394fc50>>)
     Doc:  Run C implementations where possible
     Value:  True

-DebugMode__check_py (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6eae4350>>)
+DebugMode__check_py (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff74857ed0>>)
     Doc:  Run Python implementations where possible
     Value:  True

-DebugMode__check_finite (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffff76139d0>>)
+DebugMode__check_finite (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394fb10>>)
     Doc:  True -> complain about NaN/Inf results
     Value:  True

@@ -207,7 +208,7 @@ DebugMode__check_strides (<class 'int'>)
     Doc:  Check that Python- and C-produced ndarrays have same strides. On difference: (0) - ignore, (1) warn, or (2) raise error
     Value:  0

-DebugMode__warn_input_not_reused (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6f75b990>>)
+DebugMode__warn_input_not_reused (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394fdd0>>)
     Doc:  Generate a warning when destroy_map or view_map says that an op works inplace, but the op did not reuse the input for its output.
     Value:  True

@@ -219,7 +220,7 @@ DebugMode__check_preallocated_output_ndim (<class 'int'>)
     Doc:  When testing with "strided" preallocated output memory, test all combinations of strides over that number of (inner-most) dimensions. You may want to reduce that number to reduce memory or time usage, but it is advised to keep a minimum of 2.
     Value:  4

-profiling__time_thunks (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6eae4450>>)
+profiling__time_thunks (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394fed0>>)
     Doc:  Time individual thunks when profiling
     Value:  True

@@ -240,7 +241,7 @@ profiling__min_memory_size (<class 'int'>)
                  of their outputs (in bytes) is lower than this threshold
     Value:  1024

-profiling__min_peak_memory (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6eae4610>>)
+profiling__min_peak_memory (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394ff10>>)
     Doc:  The min peak memory usage of the order
     Value:  False

@@ -248,11 +249,11 @@ profiling__destination (<class 'str'>)
     Doc:  File destination of the profiling output
     Value:  stderr

-profiling__debugprint (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6f8e1c90>>)
+profiling__debugprint (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7394fd50>>)
     Doc:  Do a debugprint of the profiled functions
     Value:  False

-profiling__ignore_first_call (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6eae47d0>>)
+profiling__ignore_first_call (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7393c050>>)
     Doc:  Do we ignore the first call of an PyTensor function.
     Value:  False

@@ -260,7 +261,7 @@ on_shape_error ({'raise', 'warn'})
     Doc:  warn: print a warning and use the default value. raise: raise an error
     Value:  warn

-openmp (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffff73bb910>>)
+openmp (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7479d0d0>>)
     Doc:  Allow (or not) parallel computation on the CPU with OpenMP. This is the default value used when creating an Op that supports OpenMP parallelization. It is preferable to define it via the PyTensor configuration file ~/.pytensorrc or with the environment variable PYTENSOR_FLAGS. Parallelization is only done for some operations that implement it, and even for operations that implement parallelism, each operation is free to respect this flag or not. You can control the number of threads used with the environment variable OMP_NUM_THREADS. If it is set to 1, we disable openmp in PyTensor by default.
     Value:  False

@@ -312,23 +313,23 @@ unittests__rseed (<class 'str'>)
     Doc:  Seed to use for randomized unit tests. Special value 'random' means using a seed of None.
     Value:  666

-warn__round (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6eae4bd0>>)
+warn__round (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7393c450>>)
     Doc:  Warn when using `tensor.round` with the default mode. Round changed its default from `half_away_from_zero` to `half_to_even` to have the same default as NumPy.
     Value:  False

-profile (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6eae4c50>>)
+profile (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7393c590>>)
     Doc:  If VM should collect profile information
     Value:  False

-profile_optimizer (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6eae4c90>>)
+profile_optimizer (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7393c390>>)
     Doc:  If VM should collect optimizer profile information
     Value:  False

-profile_memory (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6eda86d0>>)
+profile_memory (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7393c5d0>>)
     Doc:  If VM should collect memory profile information and print it
     Value:  False

-<pytensor.configparser.ConfigParam object at 0x7ffe6eae4dd0>
+<pytensor.configparser.ConfigParam object at 0x7fff73e070d0>
     Doc:  Useful only for the VM Linkers. When lazy is None, auto detect if lazy evaluation is needed and use the appropriate version. If the C loop isn't being used and lazy is True, use the Stack VM; otherwise, use the Loop VM.
     Value:  None

@@ -336,11 +337,11 @@ numba__vectorize_target ({'cuda', 'parallel', 'cpu'})
     Doc:  Default target for numba.vectorize.
     Value:  cpu

-numba__fastmath (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6f8e23d0>>)
+numba__fastmath (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7393c790>>)
     Doc:  If True, use Numba's fastmath mode.
     Value:  True

-numba__cache (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6eae4ed0>>)
+numba__cache (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff7393c6d0>>)
     Doc:  If True, use Numba's file based caching.
     Value:  True

@@ -353,27 +354,27 @@ Defaults to compiledir_%(short_platform)s-%(processor)s-
 %(python_version)s-%(python_bitwidth)s.
     Value:  compiledir_%(short_platform)s-%(processor)s-%(python_version)s-%(python_bitwidth)s

-<pytensor.configparser.ConfigParam object at 0x7ffe6f9b6850>
+<pytensor.configparser.ConfigParam object at 0x7fff74af4690>
     Doc:  platform-independent root directory for compiled modules
-    Value:  /build/tmp.41UnBoqk62/.pytensor
+    Value:  /build/tmp.gHcmWT784l/.pytensor

-<pytensor.configparser.ConfigParam object at 0x7ffe6f94a390>
+<pytensor.configparser.ConfigParam object at 0x7fff7456fb90>
     Doc:  platform-dependent cache directory for compiled modules
-    Value:  /build/tmp.41UnBoqk62/.pytensor/compiledir_Linux-6.5--generic-x86_64-with-glibc2.38--3.11.5-64
+    Value:  /build/tmp.gHcmWT784l/.pytensor/compiledir_Linux-6.1.62-x86_64-with-glibc2.38--3.11.6-64

 blas__ldflags (<class 'str'>)
     Doc:  lib[s] to include for [Fortran] level-3 blas implementation
-    Value:  -L/nix/store/fcfbp2iphh271h19m3g0hi5q0x20l8vv-lapack-3/lib -L/nix/store/29jz2fshxr7lf0ny4hb3q3z0jqpqmfz7-blas-3/lib -llapack -llapacke -lblas -lcblas -llapack -llapacke -lblas -lcblas
+    Value:

-blas__check_openmp (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6e700e90>>)
+blas__check_openmp (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7fff73751090>>)
     Doc:  Check for openmp library conflict.
 WARNING: Setting this to False leaves you open to wrong results in blas-related operations.
     Value:  True

-scan__allow_gc (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6b0058d0>>)
+scan__allow_gc (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffeee3d38d0>>)
     Doc:  Allow/disallow gc inside of Scan (default: False)
     Value:  False

-scan__allow_output_prealloc (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffe6b3d9610>>)
+scan__allow_output_prealloc (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x7ffeee975590>>)
     Doc:  Allow/disallow memory preallocation for outputs inside of scan (default: True)
     Value:  True

GaetanLepage commented 11 months ago

new.txt is the failing 2.18.0 build.
old.txt is the 2.17.0 working one.

ricardoV94 commented 11 months ago

Thanks! The diff is a bit verbose because of the memory locations, but the critical change seems to be here:

+WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
...
 blas__ldflags (<class 'str'>)
     Doc:  lib[s] to include for [Fortran] level-3 blas implementation
-    Value:  -L/nix/store/fcfbp2iphh271h19m3g0hi5q0x20l8vv-lapack-3/lib -L/nix/store/29jz2fshxr7lf0ny4hb3q3z0jqpqmfz7-blas-3/lib -llapack -llapacke -lblas -lcblas -llapack -llapacke -lblas -lcblas
+    Value:

This is most likely caused by https://github.com/pymc-devs/pytensor/pull/444

CC @lucianopaz
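
As a side note, the memory addresses that make the diff above so verbose can be masked before comparing the two dumps. A minimal sketch, assuming the two config dumps are available as strings; the helper names `mask_addresses` and `config_diff` are made up for illustration and are not part of pytensor:

```python
import difflib
import re

# Matches the "at 0x7ffe6eae47d0" part of a default object repr.
ADDRESS = re.compile(r"\bat 0x[0-9a-fA-F]+")


def mask_addresses(text: str) -> str:
    """Replace every object memory address with a fixed placeholder."""
    return ADDRESS.sub("at 0x...", text)


def config_diff(old: str, new: str) -> list[str]:
    """Unified diff of two config dumps, ignoring memory addresses."""
    old_lines = mask_addresses(old).splitlines()
    new_lines = mask_addresses(new).splitlines()
    return list(
        difflib.unified_diff(old_lines, new_lines, "old.txt", "new.txt", lineterm="")
    )
```

With the addresses masked, only real differences such as the `blas__ldflags` value and the compiledir paths should remain in the output.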

GaetanLepage commented 11 months ago

Oh, thanks for pointing this out. Do you recommend that we change our build flags, or is this something you plan to "fix"?

ricardoV94 commented 11 months ago

I think we should fix it. We didn't want the behavior to change for users

GaetanLepage commented 11 months ago

I think we should fix it. We didn't want the behavior to change for users

Ok, understood. We will then wait for the next release to update. Thanks for your help :)

lucianopaz commented 11 months ago

I’m confused about the two failing tests. The first says that we expect a C implementation of GeMV but we get a python implementation instead? Or the other way around? Does this mean that the lapack link flags are wrong? If that’s the case, should we remove the lapack blas flags from the check or should we fix lapack? I don’t know about the second failure.

ricardoV94 commented 11 months ago

The first says that we expect a C implementation of GeMV but we get a python implementation instead

The test was expecting a CGemv and gets a Gemv because PyTensor is not finding the blas/lapack (I don't know the difference) flags in the newer version (so the rewrite that introduces the C version doesn't get triggered). @GaetanLepage showed that the new flags are indeed empty, and it now emits the usual "Using NumPy C-API based implementation for BLAS functions" warning.

I think the second test failure comes from the same source. It is supposed to work if some blas Ops get inserted but they are not.
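
To make the dependency explicit, here is a toy model of the behavior described above. It is not PyTensor's rewrite code; the function name `expected_gemv_op` is hypothetical and only encodes "non-empty flags give CGemv, empty flags leave Gemv":

```python
def expected_gemv_op(blas_ldflags: str) -> str:
    """Toy model: which matrix-vector Op name to expect in the debug print."""
    # Non-empty link flags: the rewrite that introduces the C version can apply.
    # Empty flags: the rewrite is skipped and the generic Gemv stays in the graph.
    return "CGemv" if blas_ldflags.strip() else "Gemv"
```

A test that hard-codes `CGemv` in its expected output therefore only passes on machines where the flags are detected.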

lucianopaz commented 11 months ago

This might mean that lapack blas link flags are wrong. These are an addition brought in by #444. The quickest patch would be to comment out the lapack condition from the default blas flags function

lucianopaz commented 11 months ago

@GaetanLepage, does it also break if you set the blas flags to -l blas?

GaetanLepage commented 11 months ago

@GaetanLepage, does it also break if you set the blas flags to -l blas?

When I install pytensor you mean ?

lucianopaz commented 11 months ago

@ferrine, did you have access to a nixOS machine to try to delve into this issue?

lucianopaz commented 11 months ago

@GaetanLepage, I've got a patch that seems to be working over at #517. It should fix the two failing tests that you reported. It turns out that they were both caused by empty blas__ldflags, but in reality, these tests should be able to run regardless of the blas flags.

Your problem seems to run a bit deeper though. I had understood that in the working version you used to have an empty blas__ldflags value but it looks like I misread the diff statement. Your diff says that in the old version you had

blas__ldflags = -L/nix/store/fcfbp2iphh271h19m3g0hi5q0x20l8vv-lapack-3/lib -L/nix/store/29jz2fshxr7lf0ny4hb3q3z0jqpqmfz7-blas-3/lib -llapack -llapacke -lblas -lcblas -llapack -llapacke -lblas -lcblas

But the new pytensor version has it empty, right? The crux of the matter is to learn how these blas flags come about. To help me reason about this, I wanted to ask you some questions.

Did you supply the lapack and blas library directories to pytensor somehow? If you didn't and pytensor was able to find them automatically before, it means that they must have been brought in by numpy.distutils.config.
Another thing that's important for me to know is whether those two paths are included in the library directories returned by /nix/store/zlzz2z48s7ry0hkl55xiqp5a73b4mzrg-gcc-wrapper-12.3.0/bin/g++ -print-search-dirs. If they aren't, then the compiler doesn't know about them unless it is explicitly told about them.
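
The second check could be scripted roughly as follows. This is a sketch that assumes a POSIX toolchain whose `-print-search-dirs` output contains a `libraries: =dir1:dir2` line; the helper names are hypothetical and this is not the code pytensor uses:

```python
import posixpath
import subprocess


def parse_library_dirs(search_dirs_output: str) -> list[str]:
    """Extract the 'libraries:' entries from `-print-search-dirs` output."""
    for line in search_dirs_output.splitlines():
        if line.startswith("libraries:"):
            # The line looks like "libraries: =/dir/one/:/dir/two/".
            _, _, value = line.partition("=")
            return [posixpath.normpath(p) for p in value.split(":") if p]
    return []


def compiler_library_dirs(cxx: str = "g++") -> list[str]:
    """Ask the compiler for its default library search directories."""
    result = subprocess.run(
        [cxx, "-print-search-dirs"], capture_output=True, text=True, check=True
    )
    return parse_library_dirs(result.stdout)


def is_on_search_path(directory: str, library_dirs: list[str]) -> bool:
    """True if `directory` is one of the (normalized) search directories."""
    return posixpath.normpath(directory) in library_dirs
```

Calling `is_on_search_path(".../blas-3/lib", compiler_library_dirs(cxx))` with the full store path and the wrapped g++ would answer the question directly.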

lucianopaz commented 11 months ago

@GaetanLepage, we fixed the tests that were failing as a side effect of having empty blas flags, so the nixOS build should work now. The problem is that, when pytensor is imported, it will try to find the blas libraries in the compiler's default search directories or in the python library directory. The flags that you were getting before were provided by numpy: numpy was compiled with lapack and blas, and it used to store that build information in a numpy.distutils property. That property was removed in python 3.12 and we changed the logic for blas detection.

I'm not familiar with nixOS so I'll try to ask a few questions to see if we can improve the user experience for pytensor there. Neither BLAS nor Lapack is a build-time dependency for pytensor. They are only used at runtime, when an Op needs to compile to C code and then get linked to one of those libraries. Once nix installs BLAS and Lapack, can we know where to look for them in the file-system at runtime? If yes, we could apply a patch similar to #517, but keep it nixOS specific, so that BLAS and Lapack are searched for in some nix default place. That way, nix users would get to use pytensor with some blas flags without having to do any kind of manual override of the .pytensorrc or environment variables.
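
For reference, the manual environment-variable override that this would make unnecessary might look like the sketch below. It assumes that PYTENSOR_FLAGS takes comma-separated `key=value` pairs and that the key is `blas__ldflags`; the helper `env_with_blas_flags` is hypothetical:

```python
import os


def env_with_blas_flags(ldflags: str, base_env=None) -> dict:
    """Copy an environment and add a blas__ldflags override to PYTENSOR_FLAGS."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTENSOR_FLAGS", "")
    override = f"blas__ldflags={ldflags}"
    # Assumption: entries in PYTENSOR_FLAGS are separated by commas.
    env["PYTENSOR_FLAGS"] = f"{existing},{override}" if existing else override
    return env
```

The resulting mapping could be passed as `env=` to `subprocess.run` when launching a script that imports pytensor.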

GaetanLepage commented 11 months ago

@lucianopaz thank you for taking the time to fix this ! I can confirm that everything runs fine (for building at least) with the latest release: https://github.com/NixOS/nixpkgs/pull/267030. I know that we can add runtime paths to the derivations we do. As for how we should handle this specific case, I am afraid that I lack a bit of experience to answer. @SomeoneSerge or @mweinelt will probably know better.

SomeoneSerge commented 11 months ago

Hi! Is my understanding correct that,

The "C Ops" offer a JIT compilation functionality and are part of pytensor's public interface,
One of the runtime dependencies for pytensor (or the Ops part) is a working toolchain for the host platform, also configured to know how to locate BLAS and Lapack?

In that case we could just pass pytensor one. Is toolchain a required or an optional dependency?

I see that config.cxx is determined at runtime based on PATH: https://github.com/pymc-devs/pytensor/blob/7ecb9f8c6b6a2eac940947bac955a10785240667/pytensor/configdefaults.py#L397-L454. We could prepare a wrapped g++ for pytensor already at build time, such that the compiler used by pytensor wouldn't leak into the users' PATHs (ugly baseline: we can patch configdefaults.py). Do you think that'd make sense?
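
To illustrate what "determined at runtime based on PATH" means in practice, here is a simplified sketch of such a lookup. It is not the code from configdefaults.py; `find_compiler` and its candidate list are assumptions made for the example:

```python
import shutil


def find_compiler(candidates=("g++",), path=None) -> str:
    """Return the first candidate found on `path` (default: $PATH), or ""."""
    for name in candidates:
        found = shutil.which(name, path=path)
        if found:
            return found
    return ""
```

A wrapped g++ prepared at build time would sidestep this lookup entirely, because the absolute store path of the wrapper could be given to pytensor instead of a name resolved through the user's PATH.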

RE: using -print-search-dirs to configure BLAS https://github.com/pymc-devs/pytensor/blob/7ecb9f8c6b6a2eac940947bac955a10785240667/pytensor/link/c/cmodule.py#L2724-L2726

I think this works for us too, but what do you think about using e.g. pkg-config? That's a more public interface, and it's also more "override-friendly" (users and distributions can adjust the generated flags as appropriate to their environments). For instance, the whole -L... -l... line can be generated by running pkg-config --libs blas:

❯ nix-shell -p blas -p lapack -p pkg-config
❯ pkg-config --libs blas lapack
-L/nix/store/29jz2fshxr7lf0ny4hb3q3z0jqpqmfz7-blas-3/lib -L/nix/store/fcfbp2iphh271h19m3g0hi5q0x20l8vv-lapack-3/lib -lblas -llapack

And it prints out sensible errors:

❯ nix-shell -p pkg-config
❯ pkg-config --libs blas
Package blas was not found in the pkg-config search path.
Perhaps you should add the directory containing `blas.pc'
to the PKG_CONFIG_PATH environment variable
No package 'blas' found
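
On the Python side, consuming pkg-config could look roughly like this. It is only a sketch under the assumption that pytensor would shell out to `pkg-config --libs`; the helpers `pkg_config_libs` and `split_ldflags` are hypothetical, and the splitter only handles the attached `-L<dir>` / `-l<name>` form:

```python
import shlex
import shutil
import subprocess


def pkg_config_libs(packages, executable="pkg-config"):
    """Return the linker flags for `packages`, or None if they cannot be resolved."""
    if shutil.which(executable) is None:
        return None  # pkg-config itself is not installed
    result = subprocess.run(
        [executable, "--libs", *packages], capture_output=True, text=True
    )
    if result.returncode != 0:
        return None  # e.g. "No package 'blas' found"
    return result.stdout.strip()


def split_ldflags(flags):
    """Split a flag string into (library directories, library names)."""
    tokens = shlex.split(flags)
    lib_dirs = [t[2:] for t in tokens if t.startswith("-L")]
    libs = [t[2:] for t in tokens if t.startswith("-l")]
    return lib_dirs, libs
```

Returning None instead of raising keeps the current fallback behavior possible: if nothing is found, the flags stay empty and the NumPy-based implementation is used.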

Once nix installs BLAS and Lapack, can we know where to look for them in the file-system at runtime?

Sure, you can even predict that location before the build happens:)

Thanks!

lucianopaz commented 11 months ago

Hi @SomeoneSerge. Thanks so much for your detailed reply! I'm sorry that it took me so long to answer.

Hi! Is my understanding correct that,

* The "C Ops" offer a JIT compilation functionality and are part of pytensor's public interface,

* One of the runtime dependencies for pytensor (or the Ops part) is a working toolchain for the host platform, also configured to know how to locate BLAS and Lapack?

Yes, the Ops are simply symbolic computations. pytensor does some rewrites or optimizations on the computational graph and then transpiles the operations into some backend. C is one of these "backends". The final executables are produced on the host platform by compiling the C extensions using some C compiler. At this time, the host also must have the libraries that are needed to successfully link the extensions (e.g. blas, mkl, lapack). If pytensor can't find these libraries, it won't attempt to use them, at the expense of potential performance degradation.

In that case we could just pass pytensor one. Is toolchain a required or an optional dependency?

I see that config.cxx is determined at runtime based on PATH:

https://github.com/pymc-devs/pytensor/blob/7ecb9f8c6b6a2eac940947bac955a10785240667/pytensor/configdefaults.py#L397-L454 . We could prepare a wrapped g++ for pytensor already at build time, such that the compiler used by pytensor wouldn't leak into the users' PATHs (ugly baseline: we can patch configdefaults.py). Do you think that'd make sense?

I don't really know nix so I'm a bit lost with how much of this work should go into the nix build and how much of this should go into pytensor refactors. Are you suggesting that we make some changes to the configdefaults.py script in order to be able to configure it at build time? If that's what you're suggesting, I think that's a very interesting solution but I'll need to investigate how it could be implemented.

RE: using -print-search-dirs to configure BLAS

https://github.com/pymc-devs/pytensor/blob/7ecb9f8c6b6a2eac940947bac955a10785240667/pytensor/link/c/cmodule.py#L2724-L2726

I think this works for us too, but what do you think about using e.g. pkg-config? That's a more public interface, and it's also more "override-friendly" (users and distributions can adjust the generated flags as appropriate to their environments). For instance, the whole -L... -l... line can be generated by running pkg-config --libs blas:

❯ nix-shell -p blas -p lapack -p pkg-config
❯ pkg-config --libs blas lapack
-L/nix/store/29jz2fshxr7lf0ny4hb3q3z0jqpqmfz7-blas-3/lib -L/nix/store/fcfbp2iphh271h19m3g0hi5q0x20l8vv-lapack-3/lib -lblas -llapack

And it prints out sensible errors:

❯ nix-shell -p pkg-config
❯ pkg-config --libs blas
Package blas was not found in the pkg-config search path.
Perhaps you should add the directory containing `blas.pc'
to the PKG_CONFIG_PATH environment variable
No package 'blas' found

This seems related to the build-time configuration that I was asking about above. I think that it would be awesome to have, but it looks like that pep is still a draft. Again, I'm a bit lost with how much of this should happen at the pytensor level and how much should happen on nix.

Just to be sure that something actually needs to be changed in pytensor, I wanted to let you know the mechanism that is already in place to configure pytensor. The config values like cxx and blas__ldflags can be read from a .pytensorrc file that by default is searched for in the home directory. One could potentially add such a file when installing the package, with build-time configuration values such as cxx, blas__ldflags, and any other platform-specific thing. I'm not sure if that is what you actually want to do when installing pytensor on nix, but it would be the simplest way to get it to work out of the box.
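
As a rough illustration, such a file could be generated at install time like this. The sketch assumes that .pytensorrc is an INI file, that un-prefixed options such as cxx live in a `[global]` section, and that `blas__ldflags` maps to option `ldflags` in section `[blas]`; the helper `render_pytensorrc` is hypothetical:

```python
import configparser
import io


def render_pytensorrc(cxx: str, blas_ldflags: str) -> str:
    """Render .pytensorrc contents with a compiler and BLAS link flags."""
    # interpolation=None keeps any literal "%" in the values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser["global"] = {"cxx": cxx}  # assumed section for un-prefixed options
    parser["blas"] = {"ldflags": blas_ldflags}  # assumed mapping of blas__ldflags
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

The returned text would then be written to the location where pytensor looks for its rc file, with the compiler path and the flags filled in by the packaging system.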

Once again, thanks a lot for all of the information in your comment. Let me know what your thoughts are on the pkg-config stuff I mentioned above.
