This release adds a significant set of new features arising from combined work
with Intel on ParallelAccelerator technology. It also adds list comprehension
and closure support, support for Numpy 1.13 and a new, faster, CUDA reduction
algorithm. For Linux users this release is the first to be built on CentOS 6,
which will be the new base platform for future releases. Finally, a number of
thread-safety, type inference and other smaller enhancements have been made and
several bugs fixed.
ParallelAccelerator features:
NOTE: The ParallelAccelerator technology is under active development and should
be considered experimental.
The ParallelAccelerator technology is accessed via a new "nopython" mode option
"parallel". The ParallelAccelerator technology attempts to identify operations
which have parallel semantics (for instance adding a scalar to a vector), fuse
together adjacent such operations, and then parallelize their execution across
a number of CPU cores. This is essentially auto-parallelization.
In addition to the auto-parallelization feature, explicit loop based
parallelism is made available through the use of prange in place of range
as a loop iterator.
More information and examples on both auto-parallelization and prange are
available in the documentation and examples directory respectively.
As part of the necessary work for ParallelAccelerator, support for closures
and list comprehensions is added:
PR 2318: Transfer ParallelAccelerator technology to Numba
PR 2379: ParallelAccelerator Core Improvements
PR 2367: Add support for len(range(...))
PR 2369: List comprehension
PR 2391: Explicit Parallel Loop Support (prange)
The ParallelAccelerator features are available on all supported platforms and
Python versions, with the following exceptions (which we aim to support in a
future release):
The combination of Windows operating systems with Python 2.7.
Systems running 32 bit Python.
CUDA support enhancements:
PR 2377: New GPU reduction algorithm
CUDA support fixes:
PR 2397: Fix 2393, always set alignment of cuda static memory regions
Misc Fixes:
PR 2373, Issue 2372: 32-bit compatibility fix for parfor related code
PR 2376: Fix 2375 missing stdint.h for py2.7 vc9
PR 2378: Fix deadlock in parallel gufunc when kernel acquires the GIL.
PR 2382: Forbid unsafe casting in bitwise operation
PR 2385: docs: fix Sphinx errors
PR 2396: Use 64-bit RHS operand for shift
PR 2404: Fix threadsafety logic issue in ufunc compilation cache.
PR 2424: Ensure consistent iteration order of blocks for type inference.
PR 2425: Guard code to prevent the use of 'parallel' on win32 + py27
PR 2426: Basic test for Enum member type recovery.
PR 2433: Fix up the parfors tests with respect to windows py2.7
PR 2442: Skip tests that need BLAS/LAPACK if scipy is not available.
PR 2444: Add test for invalid array setitem
PR 2449: Make the runtime initialiser threadsafe
PR 2452: Skip CFG test on 64bit windows
Misc Enhancements:
PR 2366: Improvements to IR utils
PR 2388: Update README.rst to indicate the proper version of LLVM
PR 2394: Upgrade to llvmlite 0.19.*
PR 2395: Update llvmlite version to 0.19
PR 2406: Expose environment object to ufuncs
PR 2407: Expose environment object to target-context inside lowerer
PR 2413: Add flags to pass through to conda build for buildbot
PR 2414: Add cross compile flags to local recipe
PR 2415: A few cleanups for rewrites
PR 2418: Add getitem support for Enum classes
PR 2419: Add support for returning enums in vectorize
PR 2421: Add copyright notice for Intel contributed files.
PR 2422: Patch code base to work with np 1.13 release
PR 2448: Adds in warning message when using 'parallel' if cache=True
PR 2450: Add test for keyword arg on .sum-like and .cumsum-like array
methods
0.33.0
This release resolved several performance issues caused by atomic
reference counting operations inside loop bodies. New optimization
passes have been added to reduce the impact of these operations. We
observe speed improvements between 2x-10x in affected programs due to
the removal of unnecessary reference counting operations.
There are also several enhancements to the CUDA GPU support:
A GPU random number generator based on the `xoroshiro128+ algorithm <http://xoroshiro.di.unimi.it/>`_ is added.
See details and examples in :ref:`documentation <cuda-random>`.
CUDA kernels compiled with cuda.jit can now call CPU functions compiled
with jit and njit; those functions are automatically compiled as CUDA
device functions.
CUDA IPC memory API is exposed for sharing memory between processes.
See usage details in :ref:`documentation <cuda-ipc-memory>`.
Reference counting enhancements:
PR 2346, Issue 2345, 2248: Add extra refcount pruning after inlining
PR 2349: Fix refct pruning not removing refct op with tail call.
PR 2352, Issue 2350: Add refcount pruning pass for function that does not need refcount
CUDA support enhancements:
PR 2023: Supports CUDA IPC for device array
PR 2343, Issue 2335: Allow CPU jit decorated function to be used as cuda device function
PR 2347: Add random number generator support for CUDA device code
PR 2362: Avoid test failure due to typing to int32 on 32-bit platforms
PR 2359: Fixed nogil example that threw a TypeError when executed.
PR 2357, Issue 2356: Fix fragile test that depends on how the script is executed.
PR 2355: Fix cpu dispatcher referenced as attribute of another module
PR 2354: Fixes an issue with caching when function needs NRT and refcount pruning
PR 2342, Issue 2339: Add warnings to inspection when it is used on unserialized cached code
PR 2329, Issue 2250: Better handling of missing op codes
Misc enhancements:
PR 2360: Adds missing values in error message interp.
PR 2353: Handle when get_host_cpu_features() raises RuntimeError
PR 2351: Enable SVML for erf/erfc/gamma/lgamma/log2
PR 2344: Expose error_model setting in jit decorator
PR 2337: Align blocking terminate support for fork() with new TBB version
PR 2336: Bump llvmlite version to 0.18
PR 2330: Core changes in PR 2318
0.32.0
In this release, we are upgrading to LLVM 4.0. A lot of work has been done
to fix many race-condition issues inside LLVM when the compiler is
used concurrently, which is likely when Numba is used with Dask.
Improvements:
PR 2322: Suppress test error due to unknown but consistent error with tgamma
PR 2320: Update llvmlite dependency to 0.17
PR 2308: Add details to error message on why cuda support is disabled.
PR 2302: Add os x to travis
PR 2294: Disable remove_module on MCJIT due to memory leak inside LLVM
PR 2291: Split parallel tests and recycle workers to tame memory usage
PR 2253: Remove the pointer-stuffing hack for storing meminfos in lists
Fixes:
PR 2331: Fix a bug in the GPU array indexing
PR 2326: Fix 2321 docs referring to non-existing function.
PR 2316: Fixing more race-condition problems
PR 2315: Fix 2314. Relax strict type check to allow optional type.
PR 2310: Fix race condition due to concurrent compilation and cache loading
PR 2304: Fix intrinsic 1st arg not a typing.Context as stated by the docs.
PR 2287: Fix int64 atomic min-max
PR 2286: Fix 2285 overload_method not linking dependent libs
PR 2303: Missing import statements to interval-example.rst
0.31.0
In this release, we added preliminary support for debugging with GDB
version >= 7.0. The feature is enabled by setting the debug=True compiler
option, which causes GDB compatible debug info to be generated.
The CUDA backend also gained limited debugging support so that source locations
are shown in memory-checking and profiling tools.
For details, see :ref:`numba-troubleshooting`.
Also, we added the fastmath=True compiler option to enable unsafe
floating-point transformations, which allows LLVM to auto-vectorize more code.
Other important changes include upgrading to LLVM 3.9.1 and adding support for
Numpy 1.12.
Improvements:
PR 2281: Update for numpy1.12
PR 2278: Add CUDA atomic.{max, min, compare_and_swap}
PR 2277: Add about section to conda recipes to identify license and other
metadata in Anaconda Cloud
PR 2271: Adopt itanium C++-style mangling for CPU and CUDA targets
PR 2267: Add fastmath flags
PR 2261: Support dtype.type
PR 2249: Changes for llvm3.9
PR 2234: Bump llvmlite requirement to 0.16 and add install_name_tool_fixer to
mviewbuf for OS X
PR 2230: Add python3.6 to TravisCi
PR 2227: Enable caching for gufunc wrapper
PR 2170: Add debugging support
PR 2037: inspect_cfg() for easier visualization of the function operation
Fixes:
PR 2274: Fix nvvm ir patch in mishandling "load"
PR 2272: Fix breakage to cuda7.5
PR 2269: Fix caching of copy_strides kernel in cuda.reduce
PR 2265: Fix 2263: error when linking two modules with dynamic globals
PR 2252: Fix path separator in test
PR 2246: Fix overuse of memory in some system with fork
PR 2241: Fix 2240: module in dynamically created function not a str
This is a bug-fix release to enable Python 3.6 support. In addition,
there is now early Intel TBB support for parallel ufuncs when building from
source with TBBROOT defined. The TBB feature is not enabled in our official
builds.
Fixes:
PR 2232: Fix name clashes with _Py_hashtable_xxx in Python 3.6.
Improvements:
PR 2217: Add Intel TBB threadpool implementation for parallel ufunc.
0.30.0
This release adds preliminary support for Python 3.6, but no official build is
available yet. A new system reporting tool (numba --sysinfo) is added to
provide system information to help core developers in replication and debugging.
See below for other improvements and bug fixes.
Improvements:
PR 2209: Support Python 3.6.
PR 2175: Support np.trace(), np.outer() and np.kron().
PR 2197: Support np.nanprod().
PR 2190: Support caching for ufunc.
PR 2186: Add system reporting tool.
Fixes:
PR 2214, Issue 2212: Fix memory error with ndenumerate and flat iterators.
PR 2206, Issue 2163: Fix zip() consuming extra elements in early
exhaustion.
PR 2204, Issue 2178: Fix annotation for liftedloop.
PR 2203: Fix Appveyor segfault with Python 3.5.
PR 2202, Issue 2198: Fix target context not initialized when loading from
ufunc cache.
PR 2172, Issue 2171: Fix optional type unpacking.
PR 2189, Issue 2188: Disable freezing of big (>1MB) global arrays.
PR 2180, Issue 2179: Fix invalid variable version in looplifting.
PR 2156, Issue 2155: Fix divmod, floordiv segfault on CUDA.
0.29.0
This release extends the support of recursive functions to include direct and
indirect recursion without explicit function type annotations. See new example
in examples/mergesort.py. Newly supported numpy features include array
stacking functions, np.linalg.eig* functions, np.linalg.matrix_power, np.roots
and array to array broadcasting in assignments.
This release depends on llvmlite 0.14.0. It supports CUDA 8, although CUDA 8
is not required.
Improvements:
PR 2130, 2137: Add type-inferred recursion with docs and examples.
PR 2134: Add np.linalg.matrix_power.
PR 2125: Add np.roots.
PR 2129: Add np.linalg.{eigvals,eigh,eigvalsh}.
PR 2126: Add array-to-array broadcasting.
PR 2069: Add hstack and related functions.
PR 2128: Allow for vectorizing a jitted function. (thanks to dhirschfeld)
PR 2117: Update examples and make them test-able.
PR 2127: Refactor interpreter class and its results.
PR 2145, Issue 2009: Fixes kwargs for jitclass __init__ method.
PR 2150: Fix slowdown in objmode fallback.
PR 2050, Issue 1259: Fix liveness problem with some generator loops.
PR 2072, Issue 1995: Right shift of unsigned LHS should be logical.
PR 2115, Issue 1466: Fix inspect_types() error due to mangled variable name.
PR 2119, Issue 2118: Fix array type created from record-dtype.
PR 2122, Issue 1808: Fix returning a generator due to datamodel error.
0.28.1
This is a bug-fix release to resolve packaging issues with setuptools
dependency.
0.28.0
Amongst other improvements, this version again improves the level of
support for linear algebra -- functions from the :mod:`numpy.linalg`
module. Also, our random generator is now guaranteed to be thread-safe
and fork-safe.
Improvements:
PR 2019: Add the intrinsic decorator to define low-level
subroutines callable from JIT functions (this is considered
a private API for now).
PR 2059: Implement np.concatenate and np.stack.
PR 2048: Make random generation fork-safe and thread-safe, producing
independent streams of random numbers for each thread or process.
PR 2031: Add documentation of floating-point pitfalls.
Issue 2053: Avoid polling in parallel CPU target (fixes severe performance
regression on Windows).
Issue 2029: Make default arguments fast.
PR 2052: Add logging to the CUDA driver.
PR 2049: Implement the built-in divmod() function.
PR 2036: Implement the argsort() method on arrays.
PR 2046: Improving CUDA memory management by deferring deallocations
until certain thresholds are reached, so as to avoid breaking asynchronous
execution.
PR 2040: Switch the CUDA driver implementation to use CUDA's
"primary context" API.
PR 2017: Allow min(tuple) and max(tuple).
PR 2039: Reduce fork() detection overhead in CUDA.
PR 2021: Handle structured dtypes with titles.
PR 1996: Rewrite looplifting as a transformation on Numba IR.
PR 2014: Implement np.linalg.matrix_rank.
PR 2012: Implement np.linalg.cond.
PR 1985: Rewrite even trivial array expressions, which opens the door
for other optimizations (for example, array ** 2 can be converted
into array * array).
PR 1950: Have typeof() always raise ValueError on failure.
Previously, it would either raise or return None, depending on the input.
PR 1994: Implement np.linalg.norm.
PR 1987: Implement np.linalg.det and np.linalg.slogdet.
Issue 1979: Document integer width inference and how to work around it.
PR 1938: Numba is now compatible with LLVM 3.8.
PR 1967: Restrict np.linalg functions to homogeneous dtypes. Users
wanting to pass mixed-typed inputs have to convert explicitly, which
makes the performance implications more obvious.
Fixes:
PR 2006: array(float32) ** int should return array(float32).
PR 2044: Allow reshaping empty arrays.
Issue 2051: Fix refcounting issue when concatenating tuples.
Issue 2000: Make Numpy optional for setup.py, to allow pip install
to work without Numpy pre-installed.
PR 1989: Fix assertion in Dispatcher.disable_compile().
Issue 2028: Ignore filesystem errors when caching from multiple processes.
Issue 2003: Allow unicode variable and function names (on Python 3).
Issue 1998: Fix deadlock in parallel ufuncs that reacquire the GIL.
PR 1997: Fix random crashes when AOT compiling on certain Windows platforms.
Issue 1988: Propagate jitclass docstring.
Issue 1933: Ensure array constants are emitted with the right alignment.
0.27.0
Improvements:
Issue 1976: improve error message when non-integral dimensions are given
to a CUDA kernel.
PR 1970: Optimize the power operator with a static exponent.
PR 1710: Improve contextual information for compiler errors.
PR 1961: Support printing constant strings.
PR 1959: Support more types in the print() function.
PR 1823: Support compute_50 in CUDA backend.
PR 1955: Support np.linalg.pinv.
PR 1896: Improve the SmartArray API.
PR 1947: Support np.linalg.solve.
Issue 1943: Improve error message when an argument fails typing.
PR 1927: Support np.linalg.lstsq.
PR 1934: Use system functions for hypot() where possible, instead of our
own implementation.
PR 1929: Add cffi support to cfunc objects.
PR 1932: Add user-controllable thread pool limits for parallel CPU target.
PR 1928: Support self-recursion when the signature is explicit.
PR 1890: List all lowering implementations in the developer docs.
Issue 1884: Support np.lib.stride_tricks.as_strided().
Fixes:
Issue 1960: Fix sliced assignment when source and destination areas are
overlapping.
PR 1963: Make CUDA print() atomic.
PR 1956: Allow 0d array constants.
Issue 1945: Allow using Numpy ufuncs in AOT compiled code.
Issue 1916: Fix documentation example for generated_jit.
Issue 1926: Fix regression when caching functions in an IPython session.
Issue 1923: Allow non-intp integer arguments to carray() and farray().
Issue 1908: Accept non-ASCII unicode docstrings on Python 2.
Issue 1874: Allow del container[key] in object mode.
Issue 1913: Fix set insertion bug when the lookup chain contains deleted
entries.
Issue 1911: Allow function annotations on jitclass methods.
0.26.0
This release adds support for the cfunc decorator for exporting Numba-jitted
functions to third-party APIs that take C callbacks. Most of the overhead of
using jitclasses inside the interpreter is eliminated. Support for
decompositions in numpy.linalg is added. Finally, Numpy 1.11 is
supported.
Improvements:
PR 1889: Export BLAS and LAPACK wrappers for pycc.
PR 1888: Faster array power.
Issue 1867: Allow "out" keyword arg for dufuncs.
PR 1871: carray() and farray() for creating arrays from pointers.
PR 1855: cfunc decorator for exporting as ctypes function.
PR 1862: Add support for numpy.linalg.qr.
PR 1851: jitclass support for '_' and '__' prefixed attributes.
PR 1842: Optimize jitclass in Python interpreter.
Issue 1837: Fix CUDA simulator issues with device function.
PR 1839: Add support for decompositions from numpy.linalg.
PR 1829: Support Python enums.
PR 1828: Add support for numpy.random.rand() and numpy.random.randn()
Issue 1825: Use of 0-d array in place of scalar index.
Issue 1824: Scalar arguments to object mode gufuncs.
Issue 1813: Let bitwise bool operators return booleans, not integers.
Issue 1760: Optional arguments in generators.
PR 1780: Numpy 1.11 support.
0.25.0
This release adds support for set objects in nopython mode. It also
adds support for many missing Numpy features and functions. It improves
Numba's compatibility and performance when using a distributed execution
framework such as dask, distributed or Spark. Finally, it removes
compatibility with Python 2.6, Python 3.3 and Numpy 1.6.
Improvements:
Issue 1800: Add erf(), erfc(), gamma() and lgamma() to CUDA targets.
PR 1793: Implement more Numpy functions: np.bincount(), np.diff(),
np.digitize(), np.histogram(), np.searchsorted() as well as NaN-aware
reduction functions (np.nansum(), np.nanmedian(), etc.)
PR 1789: Optimize some reduction functions such as np.sum(), np.prod(),
np.median(), etc.
PR 1752: Make CUDA features work in dask, distributed and Spark.
PR 1787: Support np.nditer() for fast multi-array indexing with
broadcasting.
PR 1799: Report JIT-compiled functions as regular Python functions
when profiling (making it possible to see the filename and line number
where a function is defined).
PR 1782: Support np.any() and np.all().
Issue 1788: Support the iter() and next() built-in functions.
PR 1778: Support array.astype().
Issue 1775: Allow the user to set the target CPU model for AOT compilation.
PR 1758: Support creating random arrays using the size parameter
to the np.random APIs.
PR 1757: Support len() on array.flat objects.
PR 1749: Remove Numpy 1.6 compatibility.
PR 1748: Remove Python 2.6 and 3.3 compatibility.
PR 1735: Support the not in operator as well as operator.contains().
PR 1724: Support homogeneous sets in nopython mode.
Issue 875: make compilation of array constants faster.
Fixes:
PR 1795: Fix a massive performance issue when calling Numba functions
with distributed, Spark or a similar mechanism using serialization.
Issue 1784: Make jitclasses usable with NUMBA_DISABLE_JIT=1.
Issue 1786: Allow using linear algebra functions when profiling.
Issue 1796: Fix np.dot() memory leak on non-contiguous inputs.
PR 1792: Fix static negative indexing of tuples.
Issue 1771: Use fallback cache directory when __pycache__ isn't writable,
such as when user code is installed in a system location.
Issue 1223: Use Numpy error model in array expressions (e.g. division
by zero returns inf or nan instead of raising an error).
Issue 1640: Fix np.random.binomial() for large n values.
Issue 1643: Improve error reporting when passing an invalid spec to
jitclass.
PR 1756: Fix slicing with a negative step and an omitted start.
0.24.0
This release introduces several major changes, including the generated_jit
decorator for flexible specializations, as with Julia's "generated" macro,
and the SmartArray array wrapper type, which allows seamless transfer of array
data between the CPU and the GPU.
This will be the last version to support Python 2.6, Python 3.3 and Numpy 1.6.
Improvements:
PR 1723: Improve compatibility of JIT functions with the Python profiler.
PR 1509: Support array.ravel() and array.flatten().
PR 1676: Add SmartArray type to support transparent data management in
multiple address spaces (host & GPU).
PR 1689: Reduce startup overhead of importing Numba.
PR 1705: Support registration of CFFI types as corresponding to known
Numba types.
PR 1686: Document the extension API.
PR 1698: Improve warnings raised during type inference.
PR 1697: Support np.dot() and friends on non-contiguous arrays.
PR 1651: Implementation of np.linalg.inv using LAPACK. Thanks to
Matthieu Dartiailh.
PR 1674: Support np.diag().
PR 1673: Improve error message when looking up an attribute on an
unknown global.
Issue 1569: Implement runtime check for the LLVM locale bug.
PR 1612: Switch to LLVM 3.7 in sync with llvmlite.
PR 1624: Allow slice assignment of sequence to array.
PR 1622: Support slicing tuples with a constant slice.
Fixes:
Issue 1722: Fix returning an optional boolean (bool or None).
Issue 1734: NRT decref bug when variable is del'ed before being defined,
leading to a possible memory leak.
PR 1732: Fix tuple getitem regression for CUDA target.
PR 1718: Mishandling of optional to optional casting.
PR 1714: Fix .compile() on a JIT function not respecting ._can_compile.
Issue 1667: Fix np.angle() on arrays.
Issue 1690: Fix slicing with an omitted stop and a negative step value.
PR 1693: Fix gufunc bug in handling scalar formal arg with non-scalar
input value.
PR 1683: Fix parallel testing under Windows.
Issue 1616: Use system-provided versions of C99 math where possible.
Issue 1652: Reductions of bool arrays (e.g. sum() or mean()) should
return integers or floats, not bools.
Issue 1664: Fix regression when indexing a record array with a constant
index.
PR 1661: Disable AVX on old Linux kernels.
Issue 1636: Allow raising an exception looked up on a module.
0.23.1
This is a bug-fix release to address several regressions introduced
in the 0.23.0 release, and a couple other issues.
Fixes:
Issue 1645: CUDA ufuncs were broken in 0.23.0.
Issue 1638: Check tuple sizes when passing a list of tuples.
Issue 1630: Parallel ufunc would keep eating CPU even after finishing
under Windows.
Issue 1628: Fix ctypes and cffi tests under Windows with Python 3.5.
Issue 1627: Fix xrange() support.
PR 1611: Rewrite variable liveness analysis.
Issue 1610: Allow nested calls between explicitly-typed ufuncs.
Issue 1593: Fix *args in object mode.
0.23.0
This release introduces JIT classes using the new jitclass decorator,
allowing user-defined structures for nopython mode. Other improvements
and bug fixes are listed below.
Improvements:
PR 1609: Speed up some simple math functions by inlining them
in their caller
PR 1571: Implement JIT classes
PR 1584: Improve typing of array indexing
PR 1583: Allow printing booleans
PR 1542: Allow negative values in np.reshape()
PR 1560: Support vector and matrix dot product, including np.dot()
and the @ operator in Python 3.5
PR 1546: Support field lookup on record arrays and scalars (i.e.
array['field'] in addition to array.field)
PR 1440: Support the HSA wavebarrier() and activelanepermute_wavewidth()
intrinsics
PR 1540: Support np.angle()
PR 1543: Implement CPU multithreaded gufuncs (target="parallel")
PR 1551: Allow scalar arguments in np.where(), np.empty_like().
PR 1516: Add some more examples from NumbaPro
PR 1517: Support np.sinc()
Fixes:
Issue 1603: Fix calling a non-cached function from a cached function
Issue 1594: Ensure a list is homogeneous when unboxing
Issue 1595: Replace deprecated use of get_pointer_to_function()
Issue 1586: Allow tests to be run by different users on the same machine
Issue 1587: Make CudaAPIError picklable
Issue 1568: Fix using Numba from inside Visual Studio 2015
Issue 1559: Fix serializing a jit function referring a renamed module
PR 1508: Let reshape() accept integer argument(s), not just a tuple
Issue 1545: Improve error checking when unboxing list objects
Issue 1538: Fix array broadcasting in CUDA gufuncs
Issue 1526: Fix a reference count handling bug
0.22.1
This is a bug-fix release to resolve some packaging issues and other
problems found in the 0.22.0 release.
Fixes:
PR 1515: Include MANIFEST.in in MANIFEST.in so that sdist still works from
source tar files.
PR 1518: Fix reference counting bug caused by hidden alias
PR 1519: Fix erroneous assert when passing nopython=True to guvectorize.
PR 1521: Fix cuda.test()
0.22.0
This release features several highlights: Python 3.5 support, Numpy 1.10
support, Ahead-of-Time compilation of extension modules, additional
vectorization features that were previously only available with the
proprietary extension NumbaPro, and improvements in array indexing.
Improvements:
PR 1497: Allow scalar input type instead of size-1 array to guvectorize
PR 1480: Add distutils support for AOT compilation
PR 1460: Create a new API for Ahead-of-Time (AOT) compilation
PR 1451: Allow passing Python lists to JIT-compiled functions, and
reflect mutations on function return
PR 1387: Numpy 1.10 support
PR 1464: Support cffi.FFI.from_buffer()
PR 1437: Propagate errors raised from Numba-compiled ufuncs; also,
let "division by zero" and other math errors produce a warning instead
of exiting the function early
PR 1445: Support a subset of fancy indexing
PR 1454: Support "out-of-line" CFFI modules
PR 1442: Improve array indexing to support more kinds of basic slicing
PR 1409: Support explicit CUDA memory fences
PR 1435: Add support for vectorize() and guvectorize() with HSA
PR 1432: Implement numpy.nonzero() and numpy.where()
PR 1416: Add support for vectorize() and guvectorize() with CUDA,
as originally provided in NumbaPro
PR 1424: Support in-place array operators
PR 1414: Python 3.5 support
PR 1404: Add the parallel ufunc functionality originally provided in
NumbaPro
PR 1393: Implement sorting on arrays and lists
PR 1415: Add functions to estimate the occupancy of a CUDA kernel
PR 1360: The JIT cache now stores the compiled object code, yielding
even larger speedups.
PR 1402: Fixes for the ARMv7 (armv7l) architecture under Linux
PR 1400: Add the cuda.reduce() decorator originally provided in NumbaPro
Fixes:
PR 1483: Allow np.empty_like() and friends on non-contiguous arrays
Issue 1471: Allow caching JIT functions defined in IPython
PR 1457: Fix flat indexing of boolean arrays
PR 1421: Allow calling Numpy ufuncs, without an explicit output, on
non-contiguous arrays
Issue 1411: Fix crash when unpacking a tuple containing a Numba-allocated array
Issue 1394: Allow unifying range_state32 and range_state64
Issue 1373: Fix code generation error on lists of bools
0.21.0
This release introduces support for AMD's Heterogeneous System Architecture,
which allows memory to be shared directly between the CPU and the GPU.
Other major enhancements are support for lists and the introduction of
an opt-in compilation cache.
PR 1375: Allow boolean evaluation of lists and tuples
PR 1371: Support array.view() in CUDA mode
PR 1369: Support named tuples in nopython mode
PR 1250: Implement numpy.median().
PR 1289: Make dispatching faster when calling a JIT-compiled function
from regular Python
Issue 1226: Improve performance of integer power
PR 1321: Document features supported with CUDA
PR 1345: HSA support
PR 1343: Support lists in nopython mode
PR 1356: Make Numba-allocated memory visible to tracemalloc
PR 1363: Add an environment variable NUMBA_DEBUG_TYPEINFER
PR 1051: Add an opt-in, per-function compilation cache
Fixes:
Issue 1372: Some array expressions would fail rewriting when they involved
the same variable more than once, or a unary operator
Issue 1385: Allow CUDA local arrays to be declared anywhere in a function
Issue 1285: Support datetime64 and timedelta64 in Numpy reduction functions
Issue 1332: Handle the EXTENDED_ARG opcode.
PR 1329: Handle the in operator in object mode
Issue 1322: Fix augmented slice assignment on Python 2
PR 1357: Fix slicing with some negative bounds or step values.
0.20.0
This release updates Numba to use LLVM 3.6 and CUDA 7 for CUDA support.
Following the platform deprecation in CUDA 7, Numba's CUDA feature is no
longer supported on 32-bit platforms. The oldest supported version of
Windows is Windows 7.
Improvements:
Issue 1203: Support indexing ndarray.flat
PR 1200: Migrate cgutils to llvmlite
PR 1190: Support more array methods: .transpose(), .T, .copy(), .reshape(), .view()
PR 1214: Simplify setup.py and avoid manual maintenance
PR 1217: Support datetime64 and timedelta64 constants
PR 1236: Reload environment variables when compiling
PR 1225: Various speed improvements in generated code
PR 1252: Support cmath module in CUDA
PR 1238: Use 32-byte aligned allocator to optimize for AVX
PR 1258: Support numpy.frombuffer()
PR 1274: Use TravisCI container infrastructure for lower wait time
PR 1279: Micro-optimize overload resolution in call dispatch
Issue 1248: Improve error message when return type unification fails
Fixes:
Issue 1131: Handling of negative zeros in np.conjugate() and np.arccos()
Issue 1188: Fix slow array return
Issue 1164: Avoid warnings from CUDA context at shutdown
Issue 1229: Respect the writeable flag in arrays
Issue 1244: Fix bug in refcount pruning pass
Issue 1251: Fix partial left-indexing of Fortran contiguous array
Issue 1264: Fix compilation error in array expression
Issue 1254: Fix error when yielding array objects
Issue 1276: Fix nested generator use
0.19.2
This release fixes the source distribution on pypi. The only change is in the
setup.py file. We do not plan to provide a conda package as this release is
essentially the same as 0.19.1 for conda users.
0.19.1
Issue 1196:
fix double-free segfault due to redundant variable deletion in the
Numba IR (1195)
fix use-after-delete in array expression rewrite pass
0.19.0
This version introduces memory management in the Numba runtime, making it
possible to allocate new arrays inside Numba-compiled functions. There is also a rework
of the ufunc infrastructure, and an optimization pass to collapse cascading
array operations into a single efficient loop.
.. warning::
Support for Windows XP and Vista with all compiler targets and support
for 32-bit platforms (Win/Mac/Linux) with the CUDA compiler target are
deprecated. In the next release of Numba, the oldest version of Windows
supported will be Windows 7. CPU compilation will remain supported
on 32-bit Linux and Windows platforms.
Known issues:
There are some performance regressions in very short running nopython
functions due to the additional overhead incurred by memory management.
We will work to reduce this overhead in future releases.
Features:
Issue 1181: Add a Frequently Asked Questions section to the documentation.
Issue 1162: Support the cumsum() and cumprod() methods on Numpy
arrays.
Issue 1152: Support the *args argument-passing style.
Issue 1147: Allow passing character sequences as arguments to
JIT-compiled functions.
Issue 1110: Shortcut deforestation and loop fusion for array expressions.
Issue 1136: Support various Numpy array constructors, for example
numpy.zeros() and numpy.zeros_like().
Issue 1127: Add a CUDA simulator running on the CPU, enabled with the
NUMBA_ENABLE_CUDASIM environment variable.
Issue 1086: Allow calling standard Numpy ufuncs without an explicit
output array from nopython functions.
Issue 1113: Support keyword arguments when calling numpy.empty()
and related functions.
Issue 1108: Support the ctypes.data attribute of Numpy arrays.
Issue 1077: Memory management for array allocations in nopython mode.
Issue 1105: Support calling a ctypes function that takes ctypes.py_object
parameters.
Issue 1084: The NUMBA_DISABLE_JIT environment variable disables compilation
of jit functions, which are then run by the Python interpreter when
called. This makes it easier to debug multiple jitted functions.
Issue 927: Allow gufuncs with no output array.
Issue 1097: Support comparisons between tuples.
Issue 1075: Numba-generated ufuncs can now be called from nopython
functions.
Issue 1062: vectorize now allows omitting the signatures, and will
compile the required specializations on the fly (like jit does).
Issue 1027: Support numpy.round().
Issue 1085: Allow returning a character sequence (as fetched from a
structured array) from a JIT-compiled function.
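The NUMBA_DISABLE_JIT switch from Issue 1084 could be used roughly as follows. This is a sketch under two assumptions: the variable is set before Numba is first imported in the process (Numba reads its configuration at import time, as far as we know), and the function name running_total is hypothetical. The no-op jit fallback is there only so the snippet runs without Numba.

```python
import os

# Assumption: this must happen before the first "import numba" in the process.
os.environ["NUMBA_DISABLE_JIT"] = "1"

try:
    from numba import jit
except ImportError:
    # Assumption: without Numba, use a no-op decorator so the sketch still runs.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True)
def running_total(values):
    # With the JIT disabled this body runs as ordinary Python code,
    # so a regular debugger can step through it.
    total = 0
    for v in values:
        total += v
    return total


print(running_total((1, 2, 3)))
```

Setting the variable in the shell (for example NUMBA_DISABLE_JIT=1 python script.py) should have the same effect without touching the code.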
Fixes:
Issue 1170: Ensure ndindex(), ndenumerate() and ndarray.flat
work properly inside generators.
Issue 1151: Disallow unpacking of tuples with the wrong size.
Issue 1141: Specify install dependencies in setup.py.
Issue 1106: Loop-lifting would fail when the lifted loop does not
produce any output values for the function tail.
Issue 1103: Fix mishandling of some inputs when a JIT-compiled function
is called with multiple array layouts.
Issue 1089: Fix range() with large unsigned integers.
Issue 1088: Install entry-point scripts (numba, pycc) from the conda
build recipe.
Issue 1081: Constant structured scalars now work properly.
Issue 1080: Fix automatic promotion of booleans to integers.
0.18.2
Bug fixes:
Issue 1073: Fixes missing template file for HTML annotation
Issue 1074: Fixes CUDA support on Windows machines, which was broken by an NVVM API mismatch
0.18.1
Version 0.18.0 is not officially released.
This version removes the old deprecated and undocumented argtypes and
restype arguments to the jit decorator. Function signatures
should always be passed as the first argument to jit.
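A minimal sketch of the remaining calling convention, with the signature given as the first argument to jit. The function name squared_norm is hypothetical, and the no-op jit fallback is an assumption so the snippet runs without Numba.

```python
try:
    from numba import jit
except ImportError:
    # Assumption: without Numba, use a no-op decorator so the sketch still runs.
    def jit(*args, **kwargs):
        return lambda func: func


# The signature comes first; argtypes= and restype= are no longer accepted.
@jit("float64(float64, float64)", nopython=True)
def squared_norm(x, y):
    return x * x + y * y


print(squared_norm(3.0, 4.0))
```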
Features:
Issue 960: Add inspect_llvm() and inspect_asm() methods to JIT-compiled
functions: they output the LLVM IR and the native assembler source of the
compiled function, respectively.
Issue 990: Allow passing tuples as arguments to JIT-compiled functions
in nopython mode.
Issue 774: Support two-argument round() in nopython mode.
Issue 987: Support missing functions from the math module in nopython
mode: frexp(), ldexp(), gamma(), lgamma(), erf(), erfc().
Issue 995: Improve code generation for round() on Python 3.
Issue 981: Support functions from the random and numpy.random modules
in nopython mode.
Issue 979: Add cuda.atomic.max().
Issue 1006: Improve exception raising and reporting. It is now allowed
to raise an exception with an error message in nopython mode.
Issue 821: Allow ctypes- and cffi-defined functions as arguments to
nopython functions.
Issue 901: Allow multiple explicit signatures with jit. The
signatures must be passed in a list, as with vectorize.
Issue 884: Better error message when a JIT-compiled function is called
with the wrong types.
Issue 1010: Simpler and faster CUDA argument marshalling thanks to a
refactoring of the data model.
Issue 1018: Support arrays of scalars inside Numpy structured types.
Issue 808: Reduce Numba import time by half.
Issue 1021: Support the buffer protocol in nopython mode.
Buffer-providing objects, such as bytearray, array.array or
memoryview support array-like operations such as indexing and iterating.
Furthermore, some standard attributes on the memoryview object are
supported.
Issue 1030: Support nested arrays in Numpy structured arrays.
Issue 1033: Implement the inspect_types(), inspect_llvm() and inspect_asm()
methods for CUDA kernels.
Issue 1029: Support Numpy structured arrays with CUDA as well.
Issue 1034: Support for generators in nopython and object mode.
Issue 1044: Support default argument values when calling Numba-compiled
functions.
Issue 1048: Allow calling Numpy scalar constructors from CUDA functions.
Issue 1047: Allow indexing a multi-dimensional array with a single integer,
to take a view.
Issue 1050: Support len() on tuples.
Issue 1011: Revive HTML annotation.
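For instance, raising an exception with a message in nopython mode (Issue 1006) might look like the sketch below. The function name checked_div is hypothetical, and the no-op jit fallback is an assumption so the snippet runs without Numba.

```python
try:
    from numba import jit
except ImportError:
    # Assumption: without Numba, use a no-op decorator so the sketch still runs.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True)
def checked_div(a, b):
    if b == 0:
        # A constant error message is carried over to the Python exception.
        raise ValueError("division by zero")
    return a / b


try:
    checked_div(1.0, 0.0)
except ValueError as exc:
    print("caught:", exc)
```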
Fixes:
Issue 977: Assignment optimization was too aggressive.
Issue 561: One-argument round() now returns an int on Python 3.
Issue 1001: Fix an unlikely bug where two closures with the same name
and id() would compile to the same LLVM function name, despite different
closure values.
Issue 1006: Fix reference leak when a JIT-compiled function is disposed of.
Issue 1017: Update instructions for CUDA in the README.
Issue 1008: Generate shorter LLVM type names to avoid segfaults with CUDA.
Issue 1005: Properly clean up references when raising an exception from
object mode.
Issue 1041: Fix incompatibility between Numba and the third-party
library "future".
Issue 1053: Fix the size attribute of CUDA shared arrays.
0.18.0
This is a minor release that fixes several issues (263, 262, 258, 237) with
the wheel build. In addition, we have minor fixes for running on PPC64LE
platforms (261). And, we added CI testing against PyPy (253).
0.17.1
This is a bugfix release that addresses issue 258, in which our LLVM
binding shared library was missing from the wheel builds.
0.17.0
The major focus in this release has been a rewrite of the documentation.
The new documentation is better structured and has more detailed coverage
of Numba features and APIs. It can be found online at
http://numba.pydata.org/numba-doc/dev/index.html
Features:
Issue 895: LLVM can now inline nested function calls in nopython mode.
Issue 863: CUDA kernels can now infer the types of their arguments
("autojit"-like).
Issue 833: Support numpy.{min,max,argmin,argmax,sum,mean,var,std}
in nopython mode.
Issue 905: Add a nogil argument to the jit decorator, to
release the GIL in nopython mode.
Issue 829: Add an identity argument to vectorize and
guvectorize, to set the identity value of the ufunc.
Issue 843: Allow indexing 0-d arrays with the empty tuple.
Issue 933: Allow named arguments, not only positional arguments, when
calling a Numba-compiled function.
Issue 902: Support numpy.ndenumerate() in nopython mode.
Issue 950: AVX is now enabled by default except on Sandy Bridge and
Ivy Bridge CPUs, where it can produce slower code than SSE.
Issue 956: Support constant arrays of structured type.
Issue 959: Indexing arrays with floating-point numbers isn't allowed
anymore.
Issue 955: Add support for 3D CUDA grids and thread blocks.
Issue 902: Support numpy.ndindex() in nopython mode.
Issue 951: Numpy number types (numpy.int8, etc.) can be used as
constructors for type conversion in nopython mode.
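As a rough sketch of the nogil option (Issue 905), the function below releases the GIL while it runs, so a thread pool can execute several calls concurrently. The function name sum_of_squares is hypothetical, the explicit signature is used here so compilation happens up front, and the no-op jit fallback is an assumption so the snippet runs without Numba.

```python
from concurrent.futures import ThreadPoolExecutor

try:
    from numba import jit
except ImportError:
    # Assumption: without Numba, use a no-op decorator so the sketch still runs.
    def jit(*args, **kwargs):
        return lambda func: func


@jit("int64(int64)", nopython=True, nogil=True)
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


# Each worker thread calls the compiled function; with nogil=True the calls
# do not have to wait for each other on the GIL.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(sum_of_squares, [10, 100]))

print(results)  # [285, 328350]
```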
Fixes:
Issue 889: Fix NUMBA_DUMP_ASSEMBLY for the CUDA backend.
Issue 903: Fix calling of stdcall functions with ctypes under Windows.
Issue 908: Allow lazy-compiling from several threads at once.
Issue 868: Wrong error message when multiplying a scalar by a non-scalar.
Issue 917: Allow vectorizing with datetime64 and timedelta64 in the
signature (only with unit-less values, though, because of a Numpy limitation).
Issue 431: Allow overloading of CUDA device functions.
Issue 917: Print out errors that occurred in object mode ufuncs.
Issue 923: Numba-compiled ufuncs now inherit the name and doc of the
original Python function.
Issue 928: Fix boolean return value in nested calls.
Issue 915: jit called with an explicit signature with a mismatching
type of arguments now raises an error.
Issue 784: Fix the truth value of NaNs.
Issue 953: Fix using shared memory in more than one function (kernel or
device).
Issue 970: Fix an uncommon double to uint64 conversion bug on CentOS5
32-bit (C compiler issue).
0.16.0
This release contains a major refactor to switch from llvmpy to llvmlite (https://github.com/numba/llvmlite)
as our code generation backend. The switch is necessary to reconcile
different compiler requirements for LLVM 3.5 (needs C++11) and Python
extensions (need specific compiler versions on Windows). As a bonus, we have
found the use of llvmlite speeds up compilation by a factor of 2!
Other Major Changes:
Faster dispatch for numpy structured arrays
Optimized array.flat()
Improved CPU feature selection
Fix constant tuple regression in macro expansion code
Known Issues:
AVX code generation is still disabled by default due to performance
regressions when operating on misaligned NumPy arrays. We hope to have a
workaround in the future.
In extremely rare circumstances, a known issue with LLVM 3.5 (http://llvm.org/bugs/show_bug.cgi?id=21423)
code generation can cause an ELF relocation error on 64-bit Linux systems.
0.15.1
(This was a bug-fix release that superseded version 0.15 before it was
announced.)
Fixes:
Workaround for missing __ftol2 on Windows XP.
Do not lift loops for compilation that contain break statements.
Fix a bug in loop-lifting when multiple values need to be returned to
the enclosing scope.
Handle the loop-lifting case where an accumulator needs to be updated when
the loop count is zero.
0.15
Features:
Support for the Python cmath module. (NumPy complex functions were
already supported.)
Support for .real, .imag, and .conjugate() on non-complex
numbers.
Add support for math.isfinite() and math.copysign().
Compatibility mode: If enabled (off by default), a failure to compile in
object mode will fall back to using the pure Python implementation of the
function.
Experimental support for serializing JIT functions with cloudpickle.
Loop-jitting in object mode now works with loops that modify scalars that
are accessed after the loop, such as accumulators.
vectorize functions can be compiled in object mode.
Numba can now be built using the Visual C++ Compiler for Python 2.7 (http://aka.ms/vcpython27)
on Windows platforms.
CUDA JIT functions can be returned by factory functions with variables in
the closure frozen as constants.
Support for "optional" types in nopython mode, which allow None to be a
valid value.
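The "optional" types mentioned above can be pictured with a function that returns either a float or None. This is only a sketch: the function name safe_sqrt is hypothetical, and the no-op jit fallback is an assumption so the snippet runs without Numba.

```python
import math

try:
    from numba import jit
except ImportError:
    # Assumption: without Numba, use a no-op decorator so the sketch still runs.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True)
def safe_sqrt(x):
    # One branch returns None and the other a float, so the inferred
    # return type is an optional float.
    if x < 0.0:
        return None
    return math.sqrt(x)


print(safe_sqrt(9.0), safe_sqrt(-1.0))
```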
Fixes:
If nopython mode compilation fails for any reason, automatically fall back
to object mode (unless nopython=True is passed to jit) rather than raise
an exception.
Allow function objects to be returned from a function compiled in object
mode.
Fix a linking problem that caused slower platform math functions (such as
exp()) to be used on Windows, leading to performance regressions against
NumPy.
min() and max() no longer accept scalar arguments in nopython mode.
Fix handling of ambiguous type promotion among several compiled versions of a
JIT function. The dispatcher will now compile a new version to resolve the
problem. (issue 776)
Fix float32 to uint64 casting bug on 32-bit Linux.
Fix type inference to allow forced casting of return types.
Allow the shape of a 1D cuda.shared.array and cuda.local.array to be
a one-element tuple.
More correct handling of signed zeros.
Add custom implementation of atan2() on Windows to handle special cases
properly.
Eliminated race condition in the handling of the pagelocked staging area
used when transferring CUDA arrays.
Fix non-deterministic type unification leading to varying performance.
(issue 797)
0.15.0
Enhancements:
PR 213: Add partial LLVM bindings for ObjectFile.
PR 215: Add inline assembly helpers in the builder.
PR 216: Allow specifying alignment in alloca instructions.
PR 219: Remove unnecessary verify in module linkage.
Fixes:
PR 209, Issue 208: Fix overly restrictive test for library filenames.
0.14
Features:
Support for nearly all the Numpy math functions (including comparison,
logical, bitwise and some previously missing float functions) in nopython mode.
The Numpy datetime64 and timedelta64 dtypes are supported in nopython mode
with Numpy 1.7 and later.
Support for Numpy math functions on complex numbers in nopython mode.
ndarray.sum() is supported in nopython mode.
Better error messages when unsupported types are used in Numpy math functions.
Set NUMBA_WARNINGS=1 in the environment to see which functions are compiled
in object mode vs. nopython mode.
Add support for the two-argument pow() builtin function in nopython mode.
New developer documentation describing how Numba works, and how to
add new types.
Support for Numpy record arrays on the GPU. (Note: Improper alignment of dtype
fields will cause an exception to be raised.)
Slices on GPU device arrays.
GPU objects can be used as Python context managers to select the active
device in a block.
GPU device arrays can be bound to a CUDA stream. All subsequent operations
(such as memory copies) will be queued on that stream instead of the default.
This can prevent unnecessary synchronization with other streams.
Fixes:
Generation of AVX instructions has been disabled to avoid performance bugs
when calling external math functions that may use SSE instructions,
especially on OS X.
JIT functions can be removed by the garbage collector when they are no
longer accessible.
Various other reference counting fixes to prevent memory leaks.
Fixed handling of exception when input argument is out of range.
Prevent autojit functions from making unsafe numeric conversions when
called with different numeric types.
Fix a compilation error when an unhashable global value is accessed.
Gracefully handle failure to enable faulthandler in the IPython Notebook.
Fix a bug that caused loop lifting to fail if the loop was inside an
else block.
Fixed a problem with selecting CUDA devices in multithreaded programs on
Linux.
The pow() function (and ** operation) applied to two integers now
returns an integer rather than a float.
Numpy arrays using the object dtype no longer cause an exception in the
autojit.
Attempts to write to a global array will cause compilation to fall back
to object mode, rather than attempt and fail at nopython mode.
range() works with all negative arguments (e.g. range(-10, -12, -1)).
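Two of the fixes above, integer pow() and range() with all-negative arguments, can be checked with a small sketch like this one. The function names int_power and sum_range are hypothetical, and the no-op jit fallback is an assumption so the snippet runs without Numba.

```python
try:
    from numba import jit
except ImportError:
    # Assumption: without Numba, use a no-op decorator so the sketch still runs.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True)
def int_power(base, exponent):
    # Two integer operands give an integer result rather than a float.
    return base ** exponent


@jit(nopython=True)
def sum_range(start, stop, step):
    total = 0
    for i in range(start, stop, step):
        total += i
    return total


print(int_power(2, 10))         # 1024
print(sum_range(-10, -12, -1))  # -21, from -10 + -11
```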
0.14.0
Enhancements:
PR 104: Add binding to get and view function control-flow graph.
PR 210: Improve llvmdev recipe.
PR 212: Add initializer for the native assembly parser.
0.13.4
Features:
Setting and deleting attributes in object mode
Added documentation of supported and currently unsupported numpy ufuncs
Assignment to 1-D numpy array slices
Closure variables and functions can be used in object mode
All numeric global values in modules can be used as constants in JIT
compiled code
Support for the start argument in enumerate()
Inplace arithmetic operations (+=, -=, etc.)
Direct iteration over a 1D numpy array (e.g. "for x in array: ...")
in nopython mode
Fixes:
Support for NVIDIA compute capability 5.0 devices (such as the GTX 750)
Vectorize no longer crashes/gives an error when bool_ is used as return type
Return the correct dictionary when globals() is used in JIT functions
Fix crash bug when creating dictionary literals in object mode
Report more informative error message on import if llvmpy is too old
Temporarily disable pycc --header, which generates incorrect function
signatures.
0.13.3
Features:
Support for enumerate() and zip() in nopython mode
Increased LLVM optimization of JIT functions to -O1, enabling automatic
vectorization of compiled code in some cases
Iteration over tuples and unpacking of tuples in nopython mode
Support for dict and set (Python >= 2.7) literals in object mode
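A short sketch combining enumerate(), zip() and tuple unpacking in nopython mode. The function name weighted_index_sum is hypothetical, and the no-op jit fallback is an assumption so the snippet runs without Numba.

```python
try:
    from numba import jit
except ImportError:
    # Assumption: without Numba, use a no-op decorator so the sketch still runs.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True)
def weighted_index_sum(xs, ws):
    total = 0.0
    # enumerate() and zip() over tuples, with the (x, w) pair unpacked in place.
    for i, (x, w) in enumerate(zip(xs, ws)):
        total += i * x * w
    return total


print(weighted_index_sum((1.0, 2.0, 3.0), (0.5, 0.5, 2.0)))  # 13.0
```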
Fixes:
JIT functions have the same name and doc as the original function.
Numerous improvements to better match the data types and behavior of Python
math functions in JIT compiled code on different platforms.
Importing Numba will no longer throw an exception if the CUDA driver is
present, but cannot be initialized.
guvectorize now properly supports functions with scalar arguments.
CUDA driver is lazily initialized
0.13.2
Features:
vectorize ufuncs can now generate a SIMD fast path for unit-strided arrays
Updates
Here's a list of all the updates bundled in this pull request. I've added some links to make it easier for you to find all the information you need.
Changelogs
celery 4.0.2 -> 4.1.0
flake8 3.3.0 -> 3.4.1
numba -> 0.34.0