Closed martinResearch closed 3 years ago
I am looking back at the code and I have regrets about merging that PR. There's more complication around the _named_fields attribute in particular.
What is the use-case for exposing those properties?
That is unfortunate :( What are these complications? I am building a class to parse a large set of file names with different f-string formats and create, for each format, a dictionary that maps named tuples (using the named fields as names) to the file names that match. I would like to be able to anticipate the list of named fields before parsing the first file name. I could use a lazy approach and wait for the first file name to be parsed, but that makes my class less easy to use.
The complications I refer to are that the _named_fields attribute is very much an internal field that is further processed to produce the API-exposed parsed fields. This could be very confusing, and it restricts future potential refactorings.
I feel like there's got to be a better way to provide the functionality you need than exposing an internal implementation detail. Let me think over your specific requirement, though it reads as something horribly complicated, so I'm not really sure I follow :(
Here is a minimal implementation of my class using a single f-string format.
import os

import parse


def list_files_recursive(root):
    # Yield every file under root as a path relative to root.
    for root_folder, _, files in os.walk(root):
        subfolder = os.path.relpath(root_folder, root)
        if subfolder == ".":
            subfolder = ""
        for file in files:
            yield os.path.join(subfolder, file)


class Dataset:
    def __init__(self, formatstr):
        self._format = formatstr
        self._compiled_parser = parse.compile(self._format)
        if len(self._compiled_parser._fixed_fields) > 0:
            raise ValueError("does not support fixed fields")
        # Relies on the private _named_fields attribute of the compiled parser.
        self._dimensions = tuple(self._compiled_parser._named_fields)
        self._files_names = {}

    def dims(self):
        return self._dimensions

    def add_file(self, root, filename):
        parsed = self._compiled_parser.parse(filename)
        if parsed is not None:
            # The coordinates of a file are its parsed values, in dimension order.
            coords = tuple(parsed[dim_name] for dim_name in self._dimensions)
            if coords in self._files_names:
                raise ValueError(f"File for coords {coords} already exists: {self._files_names[coords]}")
            self._files_names[coords] = os.path.join(root, filename)

    def add_folder(self, root):
        for filename in list_files_recursive(root):
            self.add_file(root, filename)


if __name__ == "__main__":
    dataset = Dataset("frame_{lattitude:d}_{longitude:d}_{day:d}_{hour:d}.png")
    print(dataset.dims())
    # Raw string, so the backslash is not read as an escape sequence.
    dataset.add_folder(r"C:\dataset")
I could wait for the first file to be added to populate the member self._dimensions, but that is less elegant and potentially makes my class less easy to use. In theory, I don't see why we could not get the list of "dimensions" before parsing the first file. Would it make sense to assume that, given some f-string format, the list of names of the named fields extracted from the f-string will always be the same, and thus would not change with future refactoring? Or am I missing something?
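For illustration, the field names can also be read straight from the format string with the standard library alone, without touching the parser's private attributes. This is only a sketch: the helper name named_fields_of is hypothetical, and it assumes the format uses plain str.format-style field names. Compound names such as {a.b} or {a[b]} are returned verbatim here and may be handled differently by the parse library.

```python
from string import Formatter


def named_fields_of(format_string):
    """Return the named fields of a format string, in order of first appearance.

    Hypothetical helper, not part of the parse library.
    """
    names = []
    for _literal, field_name, _spec, _conversion in Formatter().parse(format_string):
        # field_name is None for trailing literal text, "" for an anonymous "{}"
        # field, and all digits for a positional field such as "{0}".
        if not field_name or field_name.isdigit():
            continue
        if field_name not in names:
            names.append(field_name)
    return tuple(names)


dims = named_fields_of("frame_{lattitude:d}_{longitude:d}_{day:d}_{hour:d}.png")
print(dims)
```

Since the result depends only on the format string, it is available before the first file name is parsed.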
Ah, stuff it. Let's see how much trouble there really is :-)
Hi, I would like to be able to access the list of names in the private member _named_fields. I can access this member directly, but that is brittle because it is a private member that could change or be renamed in future releases of the library. Could we add an official accessor for it that returns a copy of this list?