scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0
279 stars 83 forks source link

Option to skip validation in `Workspace.get_measurement` and `Workspace.model`? #1745

Closed lhenkelm closed 2 years ago

lhenkelm commented 2 years ago

Summary

Dear pyhf maintainers, Would you be open to making validation skippable in Workspace.get_measurement and Workspace.model?

I am trying to integrate a user-defined modifier into an existing workspace. To still be able to validate the standard parts of the spec, I wrote a slightly relaxed set of schemas that validate custom modifiers. To switch between the standard and relaxed schemas I gave the relaxed schemas a fake version number. However the Workspace.get_measurement validation tries to use that fake version number in its validation to look up the measurement.json schema in the pyhf package directory. That of course raises a FileNotFoundError since there is no such schema version in pyhf, preventing me from getting a model out of my workspace.

It would be very convenient if Workspace.get_measurement would accept the validate kwarg like Model and Workspace already do on master, and if Workspace.model would forward that option to Workspace.get_measurement.

Additional Information

No response

Code of Conduct

kratsg commented 2 years ago

What does your python calls look like right now for specifying the schema and version? That is pyhf.Workspace(spec, schema='...', version='...') ?

lhenkelm commented 2 years ago

Thanks for the quick reaction! The calls look like this: pyhf.Workspace(spec, validate=False).

lhenkelm commented 2 years ago

PS: to be more clear, the version is only specified in the spec for the workspace, i.e. spec={'version': '1.1.0', 'channels': ..., 'measurements': ..., 'observations': ...}. And for the schema, the implicit name is still 'workspace.json', but I am using a replicated utils.validate prior to initialising the Workspace instance, and in the replica I am looking up schemas in a completely different path in my source, where I have both a schemas/1.0.0/<bla>.json path for the usual schemas, and a separate schemas/1.1.0/<bla>.json path for the relaxed schemas, including one for workspace.json (which gets its guts from a modified defs.json)

kratsg commented 2 years ago

What is the file that it tries to open? If you put everything in schemas/ in a separate path that you're looking up, the rest of the code should still work with that overridden load_schema functionality. So I'm confused by where the crash occurs? get_measurement uses Workspace.version to figure out the version number to load as it needs to be self-consistent

https://github.com/scikit-hep/pyhf/blob/abde6077637cfaca1489e09b97cc2032ac45ce7f/src/pyhf/workspace.py#L389-L390

kratsg commented 2 years ago

(For what it's worth, the #1609 is technically related to this anyway)

lhenkelm commented 2 years ago

The file it tries to look up is venv/lib64/python3.8/site-packages/pyhf/schemas/1.1.0/measurement.json. (so inside the pyhf package installation of my virtual environment) The separate paths with relaxed schemas etc,. don't exist there, but in some version-controlled area alongside the rest of my project source.

kratsg commented 2 years ago

but I thought you modified utils.validate to look in a different path

but I am using a replicated utils.validate prior to initialising the Workspace instance, and in the replica I am looking up schemas in a completely different path in my source, where I have both a schemas/1.0.0/<bla>.json path for the usual schemas, and a separate schemas/1.1.0/<bla>.json path for the relaxed schemas, including one for workspace.json (which gets its guts from a modified defs.json)

so I'm a little confused about why this would go back to the pkg_resources variation here, if you swapped out utils.validate.

e.g.

def _myvalidate(....):
  ...

import pyhf
pyhf.utils.validate = _myvalidate

...
lhenkelm commented 2 years ago

Oh, sorry, I don't think I made that clear. That function is also outside pyhf (not a monkey-patch or anything). It's just an adapated copy of the source from pyhf.utils.validate, where I try to look up the schema in the pyhf schemas, and if that fails, fall back to those separate files.

lhenkelm commented 2 years ago

So Workspace goes looking in pkg_resources because it is still using the standard pyhf.validate implementation. I am just ensuring the validity of the specs by calling the home-brew validate first (by hand in the script)

lhenkelm commented 2 years ago

i.e. in effect the code does the equivalent of this:

import pyhf

def my_validate(...):
    ...
    try:
        pyhf.utils.validate(....) # ok if Model or Measurement or Workspace and version == '1.0.0'
    except FileNotFoundError as e: # either the schema is not exposed, or the version is fake
        # re-implementation of validate that refers to my custom paths
spec = make_a_spec(...)
my_validate(spec) # now I trust it
ws = pyhf.Workspace(spec, validate = False) # works
model = ws.model(validate = False) # FileNotFoundError
kratsg commented 2 years ago

Can you just try instead to do something like

import pyhf

_validate = pyhf.utils.validate
def my_validate(...):
  ...
  try:
    _validate(....)
  except:
    ...

spec = make_a_spec(...)
my_validate(spec) # now I trust it
ws = pyhf.Workspace(spec, validate = False) # works
model = ws.model(validate = False) # FileNotFoundError
lhenkelm commented 2 years ago

That still gives the same exception. Is there something missing in the code example? The only difference is it re-binds pyhf.utils.validate to a new name.

kratsg commented 2 years ago

It should rebind pyhf.utils.validate so that get_measurement() still calls your internal function correctly. The other option is just to do

import pyhf

pyhf.utils.validate = lambda *args, **kwargs: True

at least until the linked issue gets implemented and we don't need the sort of hacky solutions like this.

lhenkelm commented 2 years ago

oooh I see what you meant. Then its just the final re-bind of my_validate back to pyhf.utils.validate that is missing. That does work (although it means I will have to be very careful about remembering the state of pyhf.utils.validate). But at least I can keep going in the meantime :)

kratsg commented 2 years ago

How about for now I close this issue since technically, this will get solved when we have utils.validate supporting user-defined schemas and this is trackable via #1609?

lhenkelm commented 2 years ago

Absolutely, feel free to close this.
Implementing #1609 would be at least as good a solution in my book. (and let me write less code ^^). Also, thank you very much for the prompt and in-depth feedback!