holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org
BSD 3-Clause "New" or "Revised" License

Issue with datashader when not running as Admin (Windows machine) #1113

Closed harishnreddy closed 1 year ago

harishnreddy commented 2 years ago

Using latest version of datashader v0.14.2 on Windows 10 machine.

The problem also exists on the prior version of datashader (v0.13.x).

Datashader had been working just fine as local user; however, something changed (probably Windows Security settings?).

Now, datashader is working ok as Admin, but not when running as a local user.

When I run as a local user, the import datashader as ds command fails (log output below).

When I run the same command as admin, the import works and the images render.

From looking at the log file, it appears that datashader attempts to create a temp directory? And this may be causing the issue? In other words, after a security update on my machine a local user no longer has create directory permissions in certain parts of the machine? Not sure?

Is there a possible remedy for this?

Thanks for any possible help with this.

Harish


PermissionError                           Traceback (most recent call last)

File C:\ProgramData\Anaconda3\lib\tempfile.py:255, in _mkstemp_inner(dir, pre, suf, flags, output_type)

    254 try:

--> 255     fd = _os.open(file, flags, 0o600)

    256 except FileExistsError:

PermissionError: [Errno 13] Permission denied: 'C:\\ProgramData\\Anaconda3\\lib\\site-packages\\datashader\\__pycache__\\tmph_jqxe3a'

During handling of the above exception, another exception occurred:

KeyboardInterrupt                         Traceback (most recent call last)

Input In [4], in <cell line: 1>()

----> 1 import datashader as ds

File C:\ProgramData\Anaconda3\lib\site-packages\datashader\__init__.py:8, in <module>

      5 import param

      6 __version__ = str(param.version.Version(fpath=__file__, archive_commit="$Format:%h$",reponame="datashader"))

----> 8 from .core import Canvas                                 # noqa (API import)

      9 from .reductions import *                                # noqa (API import)

     10 from .glyphs import Point                                # noqa (API import)

File C:\ProgramData\Anaconda3\lib\site-packages\datashader\core.py:18, in <module>

     16 from .utils import Expr # noqa (API import)

     17 from .resampling import resample_2d, resample_2d_distributed

---> 18 from . import reductions as rd

     20 try:

     21     import cudf

File C:\ProgramData\Anaconda3\lib\site-packages\datashader\reductions.py:13, in <module>

     10 from numba import cuda as nb_cuda

     12 try:

---> 13     from datashader.transfer_functions._cuda_utils import (cuda_atomic_nanmin,

     14                                                            cuda_atomic_nanmax)

     15 except ImportError:

     16     cuda_atomic_nanmin, cuda_atomic_nanmmax = None, None

File C:\ProgramData\Anaconda3\lib\site-packages\datashader\transfer_functions\__init__.py:16, in <module>

     13 from PIL.Image import fromarray

     15 from datashader.colors import rgb, Sets1to3

---> 16 from datashader.composite import composite_op_lookup, over, validate_operator

     17 from datashader.utils import nansum_missing, ngjit

     19 try:

File C:\ProgramData\Anaconda3\lib\site-packages\datashader\composite.py:30, in <module>

     24     elif name not in array_operators:

     25         raise ValueError('Operator %r not one of the supported array operators: %s'

     26                         % (how, ', '.join(repr(el[:-4]) for el in array_operators)))

     29 @nb.jit('(uint32,)', nopython=True, nogil=True, cache=True)

---> 30 def extract_scaled(x):

     31     """Extract components as float64 values in [0.0, 1.0]"""

     32     r = np.float64(( x        & 255) / 255)

File C:\ProgramData\Anaconda3\lib\site-packages\numba\core\decorators.py:212, in _jit.<locals>.wrapper(func)

    208 disp = dispatcher(py_func=func, locals=locals,

    209                   targetoptions=targetoptions,

    210                   **dispatcher_args)

    211 if cache:

--> 212     disp.enable_caching()

    213 if sigs is not None:

    214     # Register the Dispatcher to the type inference mechanism,

    215     # even though the decorator hasn't returned yet.

    216     from numba.core import typeinfer

File C:\ProgramData\Anaconda3\lib\site-packages\numba\core\dispatcher.py:863, in Dispatcher.enable_caching(self)

    862 def enable_caching(self):

--> 863     self._cache = FunctionCache(self.py_func)

File C:\ProgramData\Anaconda3\lib\site-packages\numba\core\caching.py:613, in Cache.__init__(self, py_func)

    611 self._name = repr(py_func)

    612 self._py_func = py_func

--> 613 self._impl = self._impl_class(py_func)

    614 self._cache_path = self._impl.locator.get_cache_path()

    615 # This may be a bit strict but avoids us maintaining a magic number

File C:\ProgramData\Anaconda3\lib\site-packages\numba\core\caching.py:346, in _CacheImpl.__init__(self, py_func)

    344 source_path = inspect.getfile(py_func)

    345 for cls in self._locator_classes:

--> 346     locator = cls.from_function(py_func, source_path)

    347     if locator is not None:

    348         break

File C:\ProgramData\Anaconda3\lib\site-packages\numba\core\caching.py:193, in _SourceFileBackedLocatorMixin.from_function(cls, py_func, py_file)

    191 self = cls(py_func, py_file)

    192 try:

--> 193     self.ensure_cache_path()

    194 except OSError:

    195     # Cannot ensure the cache directory exists or is writable

    196     return

File C:\ProgramData\Anaconda3\lib\site-packages\numba\core\caching.py:120, in _CacheLocator.ensure_cache_path(self)

    118 os.makedirs(path, exist_ok=True)

    119 # Ensure the directory is writable by trying to write a temporary file

--> 120 tempfile.TemporaryFile(dir=path).close()

File C:\ProgramData\Anaconda3\lib\tempfile.py:545, in NamedTemporaryFile(mode, buffering, encoding, newline, suffix, prefix, dir, delete, errors)

    542 if _os.name == 'nt' and delete:

    543     flags |= _os.O_TEMPORARY

--> 545 (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)

    546 try:

    547     file = _io.open(fd, mode, buffering=buffering,

    548                     newline=newline, encoding=encoding, errors=errors)

File C:\ProgramData\Anaconda3\lib\tempfile.py:262, in _mkstemp_inner(dir, pre, suf, flags, output_type)

    257     continue    # try again

    258 except PermissionError:

    259     # This exception is thrown when a directory with the chosen name

    260     # already exists on windows.

    261     if (_os.name == 'nt' and _os.path.isdir(dir) and

--> 262         _os.access(dir, _os.W_OK)):

    263         continue

    264     else:

KeyboardInterrupt:

hoxbro commented 2 years ago

I would download a new Anaconda / Miniconda and install it for "just me".

ianthomas23 commented 2 years ago

Understanding this needs a brief explanation of the __pycache__ directories. When you load a python module for the first time, python compiles the module to bytecode and stores it in the __pycache__ directory of the directory containing the module, so that in future the module can be loaded and run more quickly. Usually the __pycache__ directory is correctly populated when a python project is installed, so you don't have to worry about it and you only need read access. It is probably possible to get into trouble here if you installed a newer version of a package on top of an old one and didn't include the new __pycache__ files as then they would be out of date and need to be regenerated. This would be very unusual so I think you are probably OK here.

There is a second possible use of the __pycache__ directory which is what is occurring in this situation. datashader uses numba and when you first load a numba-enabled function it is compiled to binary code so that it runs very fast. This is usually just stored in RAM, but you can ask for it to be cached on disk. The clue is in this part of the error message:

except OSError:
    # Cannot ensure the cache directory exists or is writable
    return

By default numba functions are not cached, so either you have enabled this yourself or it has been enabled for you. With the cache enabled, by default it writes to the __pycache__ directory and if you do not have write permission this will fail. You can set the NUMBA_CACHE_DIR environment variable to cache to a different directory, see https://numba.pydata.org/numba-doc/dev/reference/envvars.html#envvar-NUMBA_CACHE_DIR for the options available.

You have 3 options:

  1. Disable numba caching.
  2. Set NUMBA_CACHE_DIR to something else.
  3. Do as @Hoxbro suggests and install your own copy of Anaconda over which you will have full control. Then you can do exactly what you want.
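
For option 2, here is a minimal sketch of setting the variable from inside a script instead of in the Windows settings. It assumes Numba reads NUMBA_CACHE_DIR when it is first imported, so the assignment has to run before the first import of numba or datashader in that process. The helper name use_private_numba_cache and the numba_cache folder under the temp directory are made up for illustration; any folder you can write to should do.

```python
import os
import tempfile
from pathlib import Path


def use_private_numba_cache(cache_dir=None):
    """Point Numba's on-disk cache at a directory the current user can write to.

    Call this before numba or datashader is imported for the first time.
    """
    if cache_dir is None:
        # Hypothetical default location; pick any user-writable folder.
        cache_dir = Path(tempfile.gettempdir()) / "numba_cache"
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    os.environ["NUMBA_CACHE_DIR"] = str(cache_dir)
    return cache_dir


cache_dir = use_private_numba_cache()
print("Numba cache directory:", cache_dir)

# Only import datashader after the variable is set, for example:
# import datashader as ds
```

The setting only lasts for the current Python process, so nothing changes at the system level. In a notebook it has to run in a fresh kernel, before the first datashader import.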

harishnreddy commented 2 years ago

Ok thanks.

That seems to make sense.

As a security 'feature', I am not able to install software on this machine unless I am in Admin mode. I did the install about 2 months back, so my memory may be failing me. But, I did install my upgraded version of datashader as Admin.

From reading the note, it looks like disabling the cache is probably the best solution or at least as the starting point to make sure that this resolves the issue.

I am not very well versed in setting ENVIRONMENT variables in Python, but assume it can be done at the start of the script?

Let me try that approach and see if I can then run the code as a local user.

harishnreddy commented 2 years ago

Hi -

I have googled around and am unsure how to set the environment variables for the NUMBA cache.

I am not sure if this is a 'compile' time or 'run time' configuration nor where to do this?

I tried

@numba.njit(cache = True)

but it returned a Syntax Error: unexpected EOF while parsing

Any possible guidance on the right place to disable or change the cache directory?

Thanks

hoxbro commented 2 years ago

  1. Follow this guide.
  2. Add NUMBA_CACHE_DIR to user variables, with a value of one of your local folders.
  3. Restart computer.

harishnreddy commented 2 years ago

Thanks.

If I wanted to disable caching do I also need to do this at the WINDOWS ENVIRONMENT level? It seems like disabling NUMBA caching could be done closer to the actual code?

Really want to avoid messing with the SYSTEM variables at the moment if I can. Can accept a downgrade in performance.

ianthomas23 commented 2 years ago

@harishnreddy We are trying not to tell you what to do as we don't know your full situation so we are just giving you the options and letting you choose. But as you are floundering a little, I'll be opinionated.

You need write access to your conda installation directories. If you do not have full control over your Python installation you will repeatedly get into trouble. So either (1) fix the permissions of your current Anaconda installation so that you have full access as a normal user, or (2) install a separate version of Anaconda as a normal user and use that. If your employer forces you to have a limited access Anaconda installation then you should consider changing your employer :smile:

If you really cannot do (1) or (2) then set the environment variable NUMBA_CACHE_DIR to be a specific directory of your choice. Do this following the link @Hoxbro sent, at the system level.
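
Once the variable is set, a quick check from Python can confirm that the process actually sees it and that the folder is writable. This is only a sketch: the function name check_numba_cache_dir is made up, and it writes a uniquely named probe file directly instead of using tempfile, to stay clear of the retry loop in tempfile.py that shows up in the traceback above.

```python
import os
import uuid


def check_numba_cache_dir():
    """Report whether NUMBA_CACHE_DIR is set and writable for this process."""
    path = os.environ.get("NUMBA_CACHE_DIR")
    if not path:
        return "NUMBA_CACHE_DIR is not set in this process"
    if not os.path.isdir(path):
        return f"NUMBA_CACHE_DIR points to a missing directory: {path}"

    # Write and delete a uniquely named probe file to test write access.
    probe = os.path.join(path, f"probe_{uuid.uuid4().hex}.tmp")
    try:
        with open(probe, "w") as fh:
            fh.write("ok")
        os.remove(probe)
    except OSError as exc:
        return f"NUMBA_CACHE_DIR is not writable: {exc}"
    return f"OK: {path}"


print(check_numba_cache_dir())
```

If this reports that the variable is not set, the terminal or Jupyter session was probably started before the variable was added and needs a restart.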

ianthomas23 commented 1 year ago

Closing as no action is required.

jbednar commented 1 year ago

Right; I don't see anything that Datashader can do to address this issue that arises from local administrators' policy decisions. Hopefully the workarounds above will make it clear how to move forward. Thanks, all!