My "least surprise" order of expectations is as follows:
apply the logic for the first matching type in the types list, in the order given in the config file
apply the logic for every type the file matches
apply the logic for the first match in some internal list, whose order may or may not match the config file
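To make the first two options concrete, here is a minimal sketch of "first match in config order" versus "all matches". It is not hornet's implementation; the type names (run-egg, any-egg), the TYPES list, and both function names are made up for illustration.

```python
import re

# Hypothetical type list; the list order stands in for the config-file order.
TYPES = [
    ("run-egg", re.compile(r"runid[0-9]{9}_[0-9]{9}\.egg")),
    ("any-egg", re.compile(r".*\.egg")),
]

def first_match(filename, types=TYPES):
    """Return the name of the first type whose pattern matches, or None."""
    for name, pattern in types:
        if pattern.fullmatch(filename):
            return name
    return None

def all_matches(filename, types=TYPES):
    """Return the names of every type whose pattern matches, in list order."""
    return [name for name, pattern in types if pattern.fullmatch(filename)]
```

With first_match, putting the strict pattern ahead of the catch-all is what keeps a well-named run file from being classified as any-egg; with all_matches, the same file would trigger the logic for both types.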
Use case
I'd like to have hornet process all egg files produced during some data taking. The "correct" file name should match r"runid[0-9]{9}_[0-9]{9}\.egg", and this is enforced for data taken "correctly" (i.e. using running dripline services). If a user takes data incorrectly (maybe doing a test by hand) and a file named pre_data_testing_0001.egg shows up, perhaps I'd like the following to happen:
hornet sends an alert over AMQP (and/or to Slack) that someone has created an improperly named egg file
hornet does nearline analysis on the file and moves it to warm storage
hornet possibly doesn't move it to cold storage, because the file is ill-defined
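As a sketch of the behavior I have in mind, the policy could look something like the following. The action names and plan_actions are hypothetical labels, not existing hornet features, and I'm assuming that a correctly named file gets the full treatment including cold storage. The dot before "egg" is escaped here so that it only matches a literal dot.

```python
import re

# Pattern for "correctly" named run files, taken from the use case above.
RUN_EGG = re.compile(r"runid[0-9]{9}_[0-9]{9}\.egg")

def plan_actions(filename):
    """Return the list of (hypothetical) actions to apply to a file."""
    if not filename.endswith(".egg"):
        return []  # not an egg file, nothing to do in this sketch
    if RUN_EGG.fullmatch(filename):
        # Assumed handling for well-named files: everything, including cold storage.
        return ["nearline-analysis", "warm-storage", "cold-storage"]
    # Improperly named egg file: alert, still analyze, but skip cold storage.
    return ["alert", "nearline-analysis", "warm-storage"]
```

So plan_actions("pre_data_testing_0001.egg") would include the alert and leave out cold storage, while a properly named run file would not alert.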
More general question as a result of thinking through this while typing it out...
This is becoming a ramble, but I guess I should also ask exactly how to do configuration in a truly consistent way. This should probably be broken down into dedicated feature-request issues once it is more fleshed out. I'm a bit confused by:
In the case of doing hashes, we set classifier.types[i].do-hash == true, then configure a "hash" module (i.e. hashing is on or off per file type, and there is one hash command, configured in the module, for all hashing).
In the case of sending file info, we set classifier.send-file-info == true (so this applies to all file types).
In the case of "working", we add a map to the worker.jobs array with a "file-type" key that matches a file type from the classifier, and define a command for the shell. It isn't clear whether there is a bool to toggle a particular job for a particular file type, whether we can have multiple jobs for a particular file type, etc.
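To show the three styles side by side, here is a rough sketch written as a Python dict rather than a real hornet config file. The keys "types", "do-hash", "send-file-info", "jobs" and "file-type" are the ones named above; "name" and "command" are my guesses at the remaining keys, and actions_for is a made-up helper, not part of hornet.

```python
# Hypothetical config mirroring the three mechanisms described above.
config = {
    "classifier": {
        "send-file-info": True,  # global toggle: applies to every file type
        "types": [
            {"name": "egg", "do-hash": True},   # per-type toggle
            {"name": "log", "do-hash": False},
        ],
    },
    "worker": {
        "jobs": [
            # Jobs refer back to a type by name; there is no per-type toggle here.
            {"file-type": "egg", "command": "echo analyze"},
            {"file-type": "egg", "command": "echo archive"},
        ],
    },
}

def actions_for(type_name, cfg=config):
    """Collect, in one place, everything configured for a given file type."""
    classifier = cfg["classifier"]
    entry = next(t for t in classifier["types"] if t["name"] == type_name)
    return {
        "hash": entry.get("do-hash", False),
        "send-file-info": classifier.get("send-file-info", False),
        "jobs": [j["command"] for j in cfg["worker"]["jobs"]
                 if j["file-type"] == type_name],
    }
```

The helper simply gathers whatever is configured for a type; listing two jobs for "egg" here is only to illustrate the open question, not a claim that hornet accepts it. Having to look in three different places to answer "what happens to an egg file?" is the inconsistency I'm asking about.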