greenelab / connectivity-search-analyses

hetnet connectivity search research notebooks (previously hetmech)
BSD 3-Clause "New" or "Revised" License

Archive creation does not support pathlib paths #126

Closed zietzm closed 6 years ago

zietzm commented 6 years ago

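When passing paths to the HetMat archiving functions, it would be very helpful if pathlib.Path objects were accepted as inputs, rather than only strings. Building a list of pathlib paths like the one below and passing it as include_globs raises a TypeError: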

all_paths_1 = hetmat.metagraph.extract_all_metapaths(1)
paths_1 = []
for path in all_paths_1:
    path = pathlib.Path(f'path-counts/dwpc-0.5/{path}.sparse.npz')
    if path.exists():
        paths_1.append(path)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-49cc11dbbf57> in <module>()
      1 hetmech.hetmat.archive.create_archive_by_globs(
----> 2     'dwpc-0.5-len-2.zip', '.', include_globs=paths_2)

~/Documents/hetmech/hetmech/hetmat/archive.py in create_archive_by_globs(destination_path, root_directory, include_globs, exclude_globs, include_paths, **kwargs)
     35     source_paths = set()
     36     for glob in include_globs:
---> 37         source_paths |= set(root_directory.glob(glob))
     38     for glob in exclude_globs:
     39         source_paths -= set(root_directory.glob(glob))

~/.conda/envs/hetmech/lib/python3.6/pathlib.py in glob(self, pattern)
   1072             raise ValueError("Unacceptable pattern: {!r}".format(pattern))
   1073         pattern = self._flavour.casefold(pattern)
-> 1074         drv, root, pattern_parts = self._flavour.parse_parts((pattern,))
   1075         if drv or root:
   1076             raise NotImplementedError("Non-relative patterns are unsupported")

~/.conda/envs/hetmech/lib/python3.6/pathlib.py in parse_parts(self, parts)
     60             if altsep:
     61                 part = part.replace(altsep, sep)
---> 62             drv, root, rel = self.splitroot(part)
     63             if sep in rel:
     64                 for x in reversed(rel.split(sep)):

~/.conda/envs/hetmech/lib/python3.6/pathlib.py in splitroot(self, part, sep)
    281 
    282     def splitroot(self, part, sep=sep):
--> 283         if part and part[0] == sep:
    284             stripped_part = part.lstrip(sep)
    285             # According to POSIX path resolution:

TypeError: 'PosixPath' object does not support indexing

By contrast, converting each path to a string before appending it works correctly:

all_paths_1 = hetmat.metagraph.extract_all_metapaths(1)
paths_1 = []
for path in all_paths_1:
    path = pathlib.Path(f'path-counts/dwpc-0.5/{path}.sparse.npz')
    if path.exists():
        paths_1.append(str(path))

dhimmel commented 6 years ago

Looks like you're hitting the error in this line:

https://github.com/greenelab/hetmech/blob/240fadb2eaa8d6ed7401d9fb71bd48b20b67d7df/hetmech/hetmat/archive.py#L36-L37

A glob is a pattern, not a path, and pathlib has no object that represents a glob. The entries of include_globs are therefore expected to be strings.

Passing str or pathlib.Path objects to include_paths should work:

https://github.com/greenelab/hetmech/blob/240fadb2eaa8d6ed7401d9fb71bd48b20b67d7df/hetmech/hetmat/archive.py#L41
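
To illustrate the distinction between the two arguments, here is a minimal sketch of how glob patterns and explicit paths could be gathered so that both str and pathlib.Path inputs are tolerated. The function collect_source_paths is hypothetical and is not the hetmech implementation; in particular, treating include_paths as relative to root_directory is an assumption.

```python
import os
import pathlib


def collect_source_paths(root_directory, include_globs=(), exclude_globs=(),
                         include_paths=()):
    """Hypothetical helper (not part of hetmech): gather files to archive."""
    root = pathlib.Path(root_directory)
    source_paths = set()
    for glob in include_globs:
        # os.fspath returns a str unchanged and converts a pathlib.Path
        # to its string form, so either input type can serve as a pattern.
        source_paths |= set(root.glob(os.fspath(glob)))
    for glob in exclude_globs:
        source_paths -= set(root.glob(os.fspath(glob)))
    for path in include_paths:
        # Assumption: explicit paths are interpreted relative to root_directory.
        source_paths.add(root / path)
    return source_paths
```

With a helper along these lines, a list of existing files such as the one built in the report could be supplied through include_paths without converting each entry to a string first.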