teemtee / fmf

Flexible Metadata Format
GNU General Public License v2.0
22 stars 28 forks source link

sources are bad in cases of definition of data in upper level #138

Open jscotka opened 3 years ago

jscotka commented 3 years ago

Hi, I've met the issue, that fmf sources are sometimes bad:

.
├── a.fmf
└── main.fmf

$ cat main.fmf 
/a:
  /b:
    d: 2
  /x:
    d: 3

$ cat a.fmf 
/b:
  c: 1
  d: 1

fmf show --debug produces:

/a/b
c: 1
d: 1
/home/jscotka/git/fmf/tmp/main.fmf
/home/jscotka/git/fmf/tmp/a.fmf

/a/x
d: 3
/home/jscotka/git/fmf/tmp/main.fmf
/home/jscotka/git/fmf/tmp/a.fmf

so that item /a/x is not defined in a.fmf anyhow, just in main.fmf so from that perspective sources in this case is just potential place where you can put some data, but it is not real place where any data of node are defined.

I understand that this example is little bit artifical but clean to understand the issue.

It had bigger impact especially to modification or using plugins where there are then allowed to use eg. a.py and a.fmf files as sources of data. (I wanted to use a.fmf for modification usage in case there is not defined method in plugin to modify original source. so that It could be relatively easy override it by new fmf file but I've found that when I create it, it will be applied whenever there are no definition for some nodes.

Also bad is, that it may be little bit worse to user to uderstand where data are defined, in case it shows also files where there is no definition of node. In this case it shows the top priority file, where you can define data, but no real data here.

FIX

fix is theoretically easy, just pass through source paramater to Tree init, and add it just in case there are any data nof node instead of adding sources when grow and child and merge methods but practically implementation could be hard because of need complex refactoring. Potentially it may lead also to more readable code.

psss commented 3 years ago

To sum up the issue: So currently the list of sources includes all fmf files which where inspected during metadata gathering, including those which did not contribute any real data but are part of the hierarchy from the metadata root. And @jscotka is proposing to omit from the list any parent node which does not contribute metadata.

I understand the use case, on the other hand I also find it useful to be able to see all fmf files which were used to make up the aggregated metadata (for example to quickly find the best place where to add a new common key shared among multiple children). Not sure how frequent such a use case might be though. But still it feels that including the whole hierarchy path make sense.

We could modify current behaviour (I guess that omitting the "empty" parents will not break any functionality, but one never knows...) or we could make available both lists. @lukaszachy, @FrNecas, thoughts?

jscotka commented 3 years ago

as you said, both functionality makes sense

I can imagine e.g. just instead of print self.sources trace thought parents up to root and print sources, and self.srouces. will contain just files with any data. like:

def print_sources:
  item = node
  all_files = list()
  while True:
      all_files += item.sources
      if item.parent is None:
 return list(set(all_sources))

so that debug print of source files will be same, and also self.sources will contain just files with defined node data. although empty one like e:

finally there are maybe 3 usefu cases: