scikit-hep / histoprint

Pretty print histograms to the console
MIT License
32 stars 2 forks source link

Support an option to prefix all field strings with provided value #121

Open YeungOnion opened 1 year ago

YeungOnion commented 1 year ago

EDIT: modify title of feature request and modify first post to specify updated request feature request more clear in reply to this post.

Original post

I suppose this is formally a feature request, but it's certainly unexpected that this wouldn't work for ROOT files since we're using uproot.

Issue seems like it's right at the start of the click entry point cli.py:L115 where the target string is loaded with click.open_file(infile) which doesn't handle the colons. Can someone check this regex if I've missed any useful edge cases for the suffix :path/to/objects?

(:[^/\s]+?(\/[^/\s]+)*)$

willing to accept alternatives as well, as this would require importing re.search, perhaps rsplit(':',1) could suffice?

ast0815 commented 1 year ago

Hi, let me just make sure I understand the request. You would like to do to something like this:

$ histoprint /path/to/rootfile:path/to/tree/and/branch:path/to/tree/and/second_branch

Which would be equivalent to:

$ histoprint /path/to/rootfile -f path/to/tree/and/branch -f path/to/tree/and/second_branch

I guess my question is, how would this deal with colons in the file path? Those are valid characters at least on ext4 file systems.

YeungOnion commented 1 year ago

Stepping back, my use case is actually a requirement to re-specify path prefixes at the command line prompt - I've only needed this for ROOT trees, but in the example below I need to repeat the prefix

histoprint -f a/b/c/one -f a/b/c/two -f a/b/c/three filename.root

which I typically do by retyping or shell magics. However, it would be nice to account for the shared prefix and instead pass field flags like, -f one -f two -f three.

While I initially meant the colon syntax to provide that prefixing, I no longer think that's the right solution. The notion of prefix header as raw text also extends to text data as well and would simply prepend onto the fields.

x_Alice,px_Alice,E_Alice,x_Bob,px_Bob,E_Bob

then I could compare Alice and Bob distributions for each parameter with three calls to histoprint,

for p in x_ px_ E_; do
    histoprint --field-prefix=$p -f Alice -f Bob datafile
done

Do you think this is something that would provide value?

If so, I don't know how best this should work as cli parameters? Example of thoughts on use of implementing a multiple or no-repeating option below,

if we implement non-repeating prefix option, I think it could apply to all fields provided

histoprint --field-prefix=a/b/ -f c/one -f c/two filename.root # equiv to below
histoprint -f c/one -f c/two --field-prefix=a/b/ filename.root #equiv to below
histoprint -f a/b/c/one -f a/b/c/two filename.root 

if we allow for multiple options, then I think it should act by order called,

histoprint --field-prefix=a/b/ -f c/one --field-prefix a/ -f b/c/two --field-prefix='' -f x/y/z/final filename.root # equiv to below
histoprint --field-prefix=a/b/c/ -f one -f two --field-prefix=x/y/z/ -f final filename.root # equiv to below
histoprint -f a/b/c/one -f a/b/c/two -f x/y/z/final filename.root

but I'll need to look into how to parse by order in click. Let me know your thoughts!

ast0815 commented 10 months ago

If it provides value to your use case, then it provides value. ;)

I don't think I have time to implement this in the foreseeable future, but I would certainly review it and merge it in, if you find the time to do this.

YeungOnion commented 9 months ago

added as pr #124