templateflow / python-client

A python client to query TemplateFlow via pyBIDS
https://templateflow.org/python-client/
Apache License 2.0
8 stars 12 forks source link

FIX: Avoid directory clobber during zip extraction #131

Closed mgxd closed 6 months ago

mgxd commented 6 months ago

Closes #62

This shifts skeleton extraction to a per-file basis to catch cases where directories were being created across multiple threads/procs. Additionally, this sets the groundwork for verifying individual file checksums down the line.

Python 3.13 already addressed this - https://github.com/python/cpython/commit/5d2794a16bc1639e6053300c08a78d60526aadf2

bendhouseart commented 6 months ago

@mgxd I've been running into issue #62 but only when I run templateflow within a NiPype pipeline from inside of a container and was hopeful that this would fix my issue. However, I'm still getting the same FileExistsError despite running mgxd:fix/ip-extract-race-condition.

Would be happy to work with you on getting this merged/testing or helping to provide a fix as I'm a bit stuck on containerizing a bids-app until this works.

mgxd commented 6 months ago

@bendhouseart thanks for giving it a go - are you seeing the same traceback as #62?

bendhouseart commented 6 months ago

Yes, see:

Traceback:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/nipype/interfaces/base/core.py", line 397, in run
        runtime = self._run_interface(runtime)
      File "/usr/local/lib/python3.9/site-packages/nipype/interfaces/utility/wrappers.py", line 142, in _run_interface
        out = function_handle(**args)
      File "<string>", line 4, in create_weighted_average_pet
      File "/usr/local/lib/python3.9/site-packages/niworkflows/interfaces/bids.py", line 52, in <module>
        import templateflow as tf
      File "/usr/local/lib/python3.9/site-packages/templateflow/__init__.py", line 40, in <module>
        from . import api
      File "/usr/local/lib/python3.9/site-packages/templateflow/api.py", line 30, in <module>
        from templateflow.conf import (
      File "/usr/local/lib/python3.9/site-packages/templateflow/conf/__init__.py", line 75, in <module>
        _init_cache()
      File "/usr/local/lib/python3.9/site-packages/templateflow/conf/__init__.py", line 72, in _init_cache
        _update_s3(TF_HOME, local=True, overwrite=TF_AUTOUPDATE, silent=True)
      File "/usr/local/lib/python3.9/site-packages/templateflow/conf/_s3.py", line 41, in update
        retval = _update_skeleton(skel_file, dest, overwrite=overwrite, silent=silent)
      File "/usr/local/lib/python3.9/site-packages/templateflow/conf/_s3.py", line 104, in _update_skeleton
        zipref.extract(fl, path=dest)
      File "/usr/local/lib/python3.9/zipfile.py", line 1637, in extract
        return self._extract_member(member, path, pwd)
      File "/usr/local/lib/python3.9/zipfile.py", line 1704, in _extract_member
        os.mkdir(targetpath)
    FileExistsError: [Errno 17] File exists: '/.cache/templateflow/tpl-MNI152Lin'

Yeah, but what's weird to me is that this error only appears for me when I run it in docker using NiPype with nprocs > 1. I didn't encounter it as an issue before then. Hopefully that's a clue?

mgxd commented 6 months ago

I'd like to add a test to replicate the problem in the CI.

@bendhouseart if you try with the this branch's latest commit (b3b0846) does it solve the problem?

bendhouseart commented 6 months ago

Changes as of b3b0846 fixed it, my pipeline runs just fine in docker w/ out any errors now. Fixed additional bug that would lead to different outcomes when running as user vs. root too.

Output is now identical between these two:

run_docker_2024-04-03_13:59_as_user_4.log run_docker_2024-04-03_13:59_as_root_4.log

I think it's fixed ;)