umami-hep / atlas-ftag-tools

Python tools for ATLAS FTAG
2 stars 17 forks source link

add_attribute_copy #20

Closed longwills closed 1 year ago

longwills commented 1 year ago

Summary

This pull request introduces the following changes

Relates to the following issues

*https://github.com/umami-hep/atlas-ftag-tools/issues/19

Conformity

codecov-commenter commented 1 year ago

Codecov Report

Merging #20 (000f53a) into main (c45489f) will decrease coverage by 0.43%. The diff coverage is 50.00%.

@@            Coverage Diff             @@
##             main      #20      +/-   ##
==========================================
- Coverage   94.40%   93.97%   -0.43%     
==========================================
  Files          24       24              
  Lines        1036     1046      +10     
==========================================
+ Hits          978      983       +5     
- Misses         58       63       +5     
Impacted Files Coverage Δ
ftag/vds.py 70.31% <50.00%> (-3.77%) :arrow_down:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

longwills commented 1 year ago

Hi @samvanstroud I did some change but it's failing this pre-commit (pull request), do you know why?

By the way to handle the case when there are different names of attributes from different files, I did some thing like:

            for fname in fnames:
                 with h5py.File(fname) as g:
                      for name, value in g[group].attrs.items():
                           attrs_dict[name] = value
            for name, value in attrs_dict.items():
                 f[group].attrs[name] = value

But this will simply overwrite the attributes with same names but different contents. Do you have any suggestion of handling attributes with same names but different contents, e.g. renaming then or combining the contents?

samvanstroud commented 1 year ago

Thanks a lot for the update @longwills! In terms of implementation, this is probably fine, but a slightly safer option was outlined above, namely to only add attrs which are identical across all files. Something like below should do the trick:

attrs_dict = {}
for file in files:
    for name, value in g[group].attrs.items():
        if name not in attrs_dict:
            attrs_dict[name] = []
        attrs_dict[name].append(value)

for k, v in attrs_dict.items():
    if len(set(v)) == 1:
        out_file.attrs[k] = list(set(v))[0] 

In terms of the CI, you can check the "Linting" action details, it looks like a formatting issue from Black. To automatically fix the problem locally, you could set up pre-commit hooks locally with

pre-commit install
pre-commit run --all-files

which will install and apply all the necessary linting checks locally :)

longwills commented 1 year ago

Hi @samvanstroud I have implemented the change. But since numpy.ndarray is an unhashable type, I cannot use set() function to squeeze the array. Currently I'm using directly the array. Let me know if you find some better solutions

samvanstroud commented 1 year ago

Ah yeah that's annoying. Ok for now the currently implementation works :) could you please add a changelog entry? then I think we will be good to merge!

longwills commented 1 year ago

Ah yeah that's annoying. Ok for now the currently implementation works :) could you please add a changelog entry? then I think we will be good to merge!

longwills commented 1 year ago

Hi @samvanstroud I just updated the changelog. Please check it out

samvanstroud commented 1 year ago

@longwills unfortunately I don't see it! can you make sure you have pushed?

longwills commented 1 year ago

Hi @samvanstroud I updated it again. Can you see it now?

samvanstroud commented 1 year ago

Great, thanks very much! Merged :)