accessor for _named_fields

martinResearch commented 4 years ago

Hi, I would like to be able to access to the list of names in the private member _named_fields. I can directly access to this member but it is brittle because it is a private member that could change or be renamed in future releases of the library. Could we add an official accessor for it that returns a copy of this list ?

r1chardj0n3s commented 4 years ago

I am looking back at the code and I have regrets about merging that PR. There's more complication around the _named_fields attribute in particular.

What is the use-case for exposing those properties?

martinResearch commented 4 years ago

that is unfortunate :( what are this complications ? I am building a class to parse a large set of file names with different f-string formats and create for each format a dictionary that maps named tuples (using the name fields as names) to file names that matches. I would like to be able to anticipate the list of named fields before parsing the first file name. I could use a lazy approach and wait for the first file name to be parse but it makes my class less easy to use.

r1chardj0n3s commented 4 years ago

The complications I refer to are that the _named_fields attribute is very much an internal field that is further processed to produce the API-exposed parsed fields. This could be very confusing, and it restricts future potential refactorings.

I feel like there's got to be a better way to provide the functionality you need than exposing an internal implementation detail. Let me think over your specific requirement, though it reads as something horribly complicated, so I'm not really sure I follow :(

martinResearch commented 4 years ago

Here is a minimal implementation of my class using a single f-string format.

import parse
import os 

def list_files_recursive(root):
    for root_folder, _, files in os.walk(root):
        subfolder = os.path.relpath(root_folder, root)
        if subfolder == ".":
            subfolder = ""
        for file in files:
            yield os.path.join(subfolder, file)

class Dataset():
    def __init__(self, formatstr):
        self._format = formatstr
        self._compiled_parser = parse.compile(self._format)
        if len(self._compiled_parser._fixed_fields)>0:
            raise BaseException("does not support fixed fields")
        self._dimensions = tuple(self._compiled_parser._named_fields)
        self._files_names = {}

    def dims(self):
        return self._dimensions

    def add_file(self, root, filename):
        parsed = self._compiled_parser.parse(filename)
        if parsed is not None:                
            coords = tuple([parsed[dim_name] for dim_name in self._dimensions])     
            if coords in  self._files_names:
                raise BaseException(f"File for coords {coords} already exists: {self._files_names[coords]}")
            self._files_names[coords] = os.path.join(root,filename)       

    def add_folder(self, root):        
        for filename in list_files_recursive(root):              
            self.add_file(root,filename) 

if __name__== "__main__":
    dataset = Dataset("frame_{lattitude:d}_{longitude:d}_{day:d}_{hour:d}.png")
    print(dataset.dims())
    dataset.add_folder("C:\dataset")

I could wait for the first file to be added to populate the member self._dimension, but it makes it less elegant and potential make my class less easy to use. I theory I don't see why we could not get the list of "dimensions" before starting parsing the first file. Would it make sense to assume that, given a some f-string format, the list of names of the named fields extracted from the f-string will always be the same thus would not change with future refactoring ? Or am I missing something ?

r1chardj0n3s commented 3 years ago

Ah, stuff it. Let's see how much trouble there really is :-)

r1chardj0n3s / parse

accessor for _named_fields #114