project8 / hornet

Hornet is a nearline data processor for the Project 8 experiment

Prevent traffic between warm and hot #39

Open guiguem opened 7 years ago

guiguem commented 7 years ago

If, for some reason, /data/hot gets full, hornet cannot move data from warm to hot, and it starts to panic and spam Slack. Hornet should check the status of hot and decide whether to make the transfer based on the remaining space. If there is not enough space, it should go on hold and not attempt transfers until disk usage drops below a certain threshold. This might require some of diopsid's disk-monitoring features, nothing fancy. Is this feature implemented?
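A minimal sketch of the hold behaviour described above, written in Python for illustration only (it is not taken from hornet). The class name `TransferGate`, the two thresholds, and the injectable `usage_fn` are all hypothetical. The gate goes on hold once the used fraction of the destination disk reaches `hold_above` and resumes only after it falls below `resume_below`, so it does not flap around a single threshold.

```python
import shutil


class TransferGate:
    """Decide whether a transfer may start, with hysteresis on disk usage.

    All names and default thresholds here are illustrative assumptions.
    """

    def __init__(self, hold_above=0.90, resume_below=0.80, usage_fn=None):
        if not resume_below < hold_above:
            raise ValueError("resume_below must be smaller than hold_above")
        self.hold_above = hold_above
        self.resume_below = resume_below
        # usage_fn(path) returns the used fraction (0.0 to 1.0) of the disk.
        self._usage_fn = usage_fn or self._disk_fraction
        self.on_hold = False

    @staticmethod
    def _disk_fraction(path):
        usage = shutil.disk_usage(path)
        return usage.used / usage.total

    def may_transfer(self, path):
        """Update the hold state from the current usage and report it."""
        used = self._usage_fn(path)
        if self.on_hold:
            # Leave the hold state only once usage has dropped far enough.
            if used < self.resume_below:
                self.on_hold = False
        elif used >= self.hold_above:
            self.on_hold = True
        return not self.on_hold
```

A caller would check `gate.may_transfer(destination)` before each move and leave the file queued while it returns `False`.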

laroque commented 7 years ago

(note for anyone looking and confused, hornet moves data from hot to warm, not warm to hot)

From what I understood of our discussion, the solution to this was in three parts:
1) add the set_condition feature to dripline
2) add a disk-usage monitor which watches /data/hot and sends a set_condition if it reaches some critical level
3) implement set_condition in the DAQ such that the run can be aborted

If this is done, hornet should never reach a state where it can't move data, since incoming data should stop before the problem occurs (or at worst, there would be a few such messages during the transition). I think we decided against a hold state, but there was discussion of being able to re-queue failed transfers via dripline command (as opposed to needing to kill hornet and queue files from a terminal).

guiguem commented 7 years ago

Yes (sorry for the hot/warm mix-up). Emptying hot (with hornet) goes faster than emptying warm (with dirac). So you are saying that triggering a set_condition on /data/warm should actually be enough to prevent hornet from sending errors, is that correct? If so, there is no point in monitoring hot with the disk monitor, and I will be able to simplify the diopsid config file.

nsoblath commented 7 years ago

One could easily imagine something going wrong with hornet that would result in /data/hot filling. We would still want to halt data taking in that case. I think it's still well worthwhile to monitor both hot and warm.
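As a rough illustration of monitoring both disks, here is a small Python sketch. It is not diopsid's or dripline's actual code: `check_disks`, the 0.90 critical level, and the `send_condition` callback are hypothetical stand-ins, since the real set_condition interface is not described in this thread.

```python
import shutil


def disk_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the disk holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def check_disks(paths, send_condition, critical=0.90, usage_fn=disk_fraction):
    """Call send_condition(path) for every path at or above critical.

    send_condition is a placeholder for whatever actually raises the
    condition; critical is an assumed threshold. Returns the full paths.
    """
    full = [path for path in paths if usage_fn(path) >= critical]
    for path in full:
        send_condition(path)
    return full
```

Calling `check_disks(["/data/hot", "/data/warm"], send_condition=...)` on a timer would cover the case where hornet itself stalls and hot fills even though warm still has room.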

