nel-lab / mesmerize-core

High level pandas-based API for batch analysis of Calcium Imaging data using CaImAn
Other
61 stars 15 forks source link

Use Path.resolve() consistently to prevent UNC/drive letter ambiguity #304

Open ethanbb opened 5 months ago

ethanbb commented 5 months ago

This fixes an issue I had on Windows where pathlib reported that one or more output paths in CNMF did not have a common root with the output dir. This is because the file paths were normalized with Path.resolve(), whereas the directory path was not. On Windows, resolve() converts any remote mount paths from drive letter to UNC format, and in order to compute the relative path, both paths must be in the same format.

This change calls resolve() on the directory path before computing the other paths relative to it. (Another solution could be to just create these relative paths directly and concatenate them to the dir path when saving data.) I also changed the paths.split function, which also calls relative_to, to always resolve both paths before trying to compute the relative path.

ethanbb commented 4 months ago

@kushalkolar Here's an example of the problem, using some of the existing code from cnmf.py (sorry for the delay). The resolve method converts drive paths to UNC paths if the drive is mapped to a network share on the current system. Then, if you call relative_to on such a path with a drive letter path as the argument, you get an error. Calling resolve on the common root path (output_dir) and defining all the other paths relative to this one should prevent this issue.

I'm not sure it can be tested automatically; would definitely require some changes to the Windows test runner to set up this situation.

In [1]: from pathlib import Path

In [2]: drive_path = Path('Z:\\foo\\bar')

In [3]: output_dir = drive_path.parent.joinpath('baz')

In [4]: output_dir
Out[4]: WindowsPath('Z:/foo/baz')

In [5]: output_path = output_dir.joinpath('baz.hdf5').resolve()

In [6]: output_path
Out[6]: WindowsPath('//proektdata/LabData/foo/baz/baz.hdf5')

In [7]: output_path.relative_to(output_dir.parent)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 output_path.relative_to(output_dir.parent)

File ~\AppData\Local\miniforge3\envs\mesviz\lib\pathlib.py:818, in PurePath.relative_to(self, *other)
    816 if (root or drv) if n == 0 else cf(abs_parts[:n]) != cf(to_abs_parts):
    817     formatted = self._format_parsed_parts(to_drv, to_root, to_parts)
--> 818     raise ValueError("{!r} is not in the subpath of {!r}"
    819             " OR one path is relative and the other is absolute."
    820                      .format(str(self), str(formatted)))
    821 return self._from_parsed_parts('', root if n == 1 else '',
    822                                abs_parts[n:])

ValueError: '\\\\proektdata\\LabData\\foo\\baz\\baz.hdf5' is not in the subpath of 'Z:\\foo' OR one path is relative and the other is absolute.
kushalkolar commented 2 months ago

I think let's wait until #300 so that CI works and then rebase to make sure this works on all platform before merging

ethanbb commented 2 months ago

300 is merged, did you mean a different one?