This is a pretty complex problem to solve elegantly.
## The Problem
iofs is planned to (also) be used with blackheap. Blackheap uses a black-box methodology to provide prediction models that classify I/O requests solely based on their access times. iofs supports loading the models created by blackheap via the `--classificationfile` parameter.
Of course, this assumes that the latencies initially measured by blackheap are the same as the ones observed when running iofs. This is not the case, since classifying each I/O request itself takes time. Thus, on the one hand, we need iofs to already have classifications loaded in order to measure the realistic overhead. On the other hand, we can only create the classifications on a mounted iofs. This is a circular dependency.
## Why the Trivial Solution won't work
The most obvious solution would be to ship a constant array of dummy classifications that is always evaluated against, regardless of whether actual classifications are provided. Unfortunately, this is not possible since we accept any number of models. For example, the `constantlinear` model provided by blackheap creates twice as many models as the simpler `linear` model. See the blackheap docs for more details.
## Ideas
I have two ideas, both of which are suboptimal at best.
### Idea 1: Provide multiple dummy models, let the user choose
Create dummy CSV files containing the number of models that each model type used with blackheap would produce. The actual parameters of those models are irrelevant; it only matters that the number of models (i.e. the number of iterations needed) is correct.
This should work, although it is not very user-friendly.
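A minimal sketch of how such dummy files could be generated (the CSV column layout and the per-type model counts here are assumptions for illustration; the real schema and counts are defined by blackheap):

```python
# Sketch: generate dummy classification CSVs for iofs.
# Assumptions: the header/columns and the per-type model counts below
# are illustrative only; consult the blackheap docs for the real format.
import csv

# Hypothetical mapping: model type -> number of models blackheap emits.
# Per the text above, "constantlinear" emits twice as many as "linear".
MODEL_COUNTS = {"linear": 1, "constantlinear": 2}

def write_dummy_models(model_type: str, path: str) -> None:
    """Write a CSV with the right *number* of models; parameters are dummies."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["model", "slope", "intercept"])  # assumed header
        for i in range(MODEL_COUNTS[model_type]):
            writer.writerow([f"{model_type}_{i}", 1.0, 0.0])  # dummy values

for model_type in MODEL_COUNTS:
    write_dummy_models(model_type, f"dummy_{model_type}.csv")
```

The user would then pick the dummy file matching the model type they intend to use for the real blackheap run.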
### Idea 2: 2 runs, 3 mounts
Just run it twice:

1. Mount iofs with no model.
2. Create a wrong model with blackheap.
3. Remount iofs with the wrong model.
4. Create a correct model with blackheap.
5. Remount iofs again, now with the correct model.
This is the most reliable and robust approach, since even the first (wrong) model will be closer to reality than random data. Whether it is actually superior to the first idea is unknown; I don't think so, but I haven't tested it yet.
The obvious disadvantage is the time it takes, as we have to create two models.
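A rough sketch of how this workflow could be scripted (the mount point, the unmount command, and the blackheap invocation are placeholders; only the `--classificationfile` flag is taken from the text above):

```python
# Sketch of the two-run, three-mount workflow. All command lines here are
# assumptions except iofs's --classificationfile flag; adapt to the real CLIs.
import subprocess
from typing import Optional

MNT = "/mnt/iofs"  # hypothetical mount point

def mount_iofs(model: Optional[str]) -> None:
    cmd = ["iofs", MNT]  # assumed invocation; iofs is assumed to daemonize
    if model is not None:
        cmd += ["--classificationfile", model]
    subprocess.run(cmd, check=True)

def unmount() -> None:
    subprocess.run(["fusermount", "-u", MNT], check=True)  # standard FUSE unmount

def run_blackheap(output_csv: str) -> None:
    # Placeholder: benchmark the mounted iofs and write a model CSV;
    # the actual blackheap flags are assumptions.
    subprocess.run(["blackheap", "--output", output_csv, MNT], check=True)

# 1st mount: no model loaded; used only to produce an (inaccurate) first model.
mount_iofs(None)
run_blackheap("wrong_model.csv")
unmount()

# 2nd mount: load the inaccurate model so classification overhead is present.
mount_iofs("wrong_model.csv")
run_blackheap("correct_model.csv")
unmount()

# 3rd mount: the final, correct model is in place.
mount_iofs("correct_model.csv")
```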
## The current state
- The blackheap and iofs documentation notes that this is an open issue.
- The advice is to use the dummy models contained in the repository.
- It is also mentioned that one can run it twice, pointing to this repository for further explanation.