Closed Horsmann closed 5 years ago
@Rentier Training should run async now. If training is running, additional incoming requests just slip through with the 'too many jobs' return code.
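A minimal sketch of that reject-when-busy behavior, assuming a single training worker (class and method names are mine, not the actual INCEpTION code): training jobs go into a bounded executor, and whatever the executor rejects is answered with the 'too many jobs' code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TrainingScheduler {
    // One training thread and at most one waiting job; everything beyond
    // that is rejected instead of piling up.
    private final ExecutorService trainer = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(1),
            new ThreadPoolExecutor.AbortPolicy());

    /** @return true if accepted; false means "answer with the too-many-jobs code". */
    public boolean submit(Runnable trainingJob) {
        try {
            trainer.execute(trainingJob);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    public void shutdown() {
        trainer.shutdownNow();
    }
}
```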
Nice. Did you try it with INCEpTION?
No, I am actually not sure what I have to set up. Once this Tomcat stuff is running, how do I reach a setup where a recommender is actually used for something? I probably have to define a project of some sort?
I tested with curl.
You can run INCEpTION.java in inception-app-webapp or so, no need for Tomcat. Then you create a project, import files and define a recommender in the project settings.
The following problem came up, which I cannot directly solve.
I use CrfSuite as backend at the moment. Apparently, crfsuite does not like being called simultaneously for training and prediction. The pipes break, and there is a good chance that prediction and training both fail if this happens. This probably originates in the way the binary is implemented; I vaguely recall similar issues in the past. As long as you ensure a single call at a time, this seems to work. Simultaneous predictions seem to work.
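One way to enforce the single-call-at-a-time constraint would be a read-write lock: predictions share the read lock (since simultaneous predictions work), while training takes the exclusive write lock. A hypothetical sketch, not the actual CrfSuite adapter:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical guard matching the observations above: simultaneous
// predictions are fine, but training must never overlap with anything
// else that talks to the crfsuite binary.
public class CrfSuiteGuard {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public <T> T predict(Supplier<T> call) {
        lock.readLock().lock();           // shared: concurrent predictions allowed
        try {
            return call.get();            // e.g. spawn crfsuite in tagging mode
        } finally {
            lock.readLock().unlock();
        }
    }

    public void train(Runnable call) {
        lock.writeLock().lock();          // exclusive: no overlapping calls
        try {
            call.run();                   // e.g. spawn crfsuite in training mode
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```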
@reckart I assume the RuntimeProvider is smart enough to figure out that a binary is already available on the file system and just re-uses it, right? Thus, when training is running and a prediction request comes in, the runtime provider will pick the same binary that is already used for training?
Is the RuntimeProvider configurable in a way that each request can be served with its own copy of the binary? Maybe this helps; I am not sure, but it is something I could try. At the moment, the I/O streams of the binary crash when training and prediction occur together.
@Rentier As a quick fix, I could let the prediction return a temporarily-not-available code in order to wait out the time the model is training. This would also mean that if there is actually a lot of request traffic for both training and prediction, it might take some time until a request catches a free spot for getting served.
There is also no really well-suited sequence classification alternative in TC. SvmHmm and VowpalWabbit do sequence classification, but the former scales poorly and the latter does not reach state-of-the-art results.
I might have found a solution, but this requires an upgrade to the latest TC snapshot.
An issue seems to remain: a race condition. When the model has finished training and is written to disk, and a prediction request comes in that requests the very model being written, we get a problem. I still have to look into this one.
@Rentier What is the best behavior if a model is requested that is not available, either because it does not exist or because it is currently being trained? Return with no prediction, or a try-again-later return code?
I will have to add some additional logic to deal with requests for models that are being (re)trained. Question is, should I rather wait for training to finish or bail out early and just return with nothing?
@reckart I assume the RuntimeProvider is smart enough to figure out that a binary is already available on the file system and just re-uses it, right?
Once install() has been called, additional calls to install() have no effect unless uninstall() is called in between. If you want every request to use its own copy of the binaries, you just have to create a new instance of the RuntimeProvider. Don't forget to call uninstall() when you don't need the runtime anymore, otherwise the copies will accumulate on disk.
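A toy model of the contract described above (not the real DKPro RuntimeProvider, just its install/uninstall semantics as stated): install() is idempotent until uninstall() is called, and each instance owns its own on-disk copy, so one instance per request yields one binary copy per request.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Toy model of the described lifecycle (names are mine, not the DKPro class).
public class BinaryCopy {
    private Path dir; // null while not installed

    public Path install() throws IOException {
        if (dir == null) {                 // repeated calls have no effect
            dir = Files.createTempDirectory("runtime-");
            // stand-in for extracting the real crfsuite binary
            Files.write(dir.resolve("crfsuite"), new byte[] { 0 });
        }
        return dir.resolve("crfsuite");
    }

    public void uninstall() throws IOException {
        if (dir != null) {
            Files.deleteIfExists(dir.resolve("crfsuite"));
            Files.deleteIfExists(dir);
            dir = null;                    // next install() extracts a fresh copy
        }
    }
}
```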
@Horsmann I would return a 412 or so and bail out. For the case of retraining: is it possible to use the old model while the training is not finished? When it's finished, you could then replace the old model with the new one.
> For the case of retraining: is it possible to use the old model while the training is not finished? When it's finished, you could then replace the old model with the new one.
That is a good idea 👍
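The replace step can be made safe with a write-to-temp-then-atomic-rename pattern, so predictions keep reading a complete model file at all times. A minimal sketch under that assumption (names are mine, not the actual recommender code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ModelStore {
    private final Path modelFile;

    public ModelStore(Path modelFile) {
        this.modelFile = modelFile;
    }

    // Training writes the new model to a sibling temp file and swaps it in
    // with an atomic rename, so readers never observe a half-written model.
    public void publish(byte[] newModel) throws IOException {
        Path tmp = modelFile.resolveSibling(modelFile.getFileName() + ".tmp");
        Files.write(tmp, newModel);
        Files.move(tmp, modelFile,
                StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.ATOMIC_MOVE);
    }

    // Predictions keep loading from the stable path.
    public byte[] load() throws IOException {
        return Files.readAllBytes(modelFile);
    }
}
```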
ok; so prediction might now return a PRECONDITION_FAILED if
a) a new model is being written to disk at this moment, regardless of whether the prediction tries to use this model or not. During the disk write, no predictions are served (this should not take very long, a second or two maybe, during which the old model is removed and replaced with the new one), or
b) no model is available that provides predictions for the requested information/layer.
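Sketched as code, the two conditions could look like this (class and method names are hypothetical; 412 stands for HTTP PRECONDITION_FAILED):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicBoolean;

public class PredictionGate {
    private final AtomicBoolean modelBeingWritten = new AtomicBoolean(false);

    public int status(Path modelFile) {
        if (modelBeingWritten.get()) {
            return 412;                 // case a): model swap in progress
        }
        if (!Files.exists(modelFile)) {
            return 412;                 // case b): no model for this layer
        }
        return 200;                     // OK to run the prediction
    }

    public void beginWrite() { modelBeingWritten.set(true); }
    public void endWrite()   { modelBeingWritten.set(false); }
}
```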
@Rentier You probably have to retrain the models. Moving up to the snapshot probably brings some changes somewhere that break old models.
@reckart Is there a reason why UKP's Jenkins fails all the time while ours builds without problems?
@Horsmann I'm sure there is, but I don't know what the reason is (I didn't investigate). Try adding more logging to see what happens.
@Rentier INCEpTION seems to basically work. What I noticed: even for zero annotations, INCEpTION sends training requests; if there is no data annotated yet, INCEpTION shouldn't ask for training. I can catch this on my side, but I would have to prematurely deserialize the CAS and check if there is anything in there for training. Could you prevent training requests without actual content?
I will look into that.
Thanks. Otherwise this looks good :).
I increased the number of log messages a bit, but this should work now. Once this is merged, we can close this issue.