Load/run only the preprocessors required by the requested features

lefterav commented 11 years ago

Currently, all default preprocessors are loaded, even if some of them are not required by the requested features. This causes major delays (e.g. when parsing runs without being needed). Some design changes are required so that each feature class can specify which preprocessors it needs before it is executed.

lefterav commented 11 years ago

We have a version of this in the resource-manager branch. We need to test it shortly, and then it will be ready to merge into master.

lefterav commented 11 years ago

I just finished a first pass at redesigning the execution of ResourceProcessors. ResourceProcessors are no longer initialized and executed directly from FeatureExtractor.java. The initialization functions that were in FeatureExtractor.java have been moved to shef.mt.pipelines.DefaultResourcePipeline. The superclass ResourcePipeline can now receive a list of the required resourceNames and fires only the ResourceProcessors that are defined with one of those resourceNames. ResourceProcessors that want to be compatible with this should set the this.resourceName class variable to a resource name (e.g. "bparser"). The only ResourceProcessors that were executed by the existing FeatureExtractor were BParser and TopicModelling, so these are the only ones that have been modified to work with the pipeline system.
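To illustrate the filtering described above, here is a minimal sketch. It is not the code from the branch: the interface, the `getResourceName()` accessor, the `CountingProcessor` test double, and the "topicmodel" name are all assumptions made for the example; only the "bparser" name comes from the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-ins for the classes named in the comment.
class PipelineSketch {

    /** Minimal processor contract: a resource name plus a unit of work. */
    interface ResourceProcessor {
        String getResourceName();
        void process(String sentence);
    }

    /** Test double that only counts how many sentences it has been given. */
    static class CountingProcessor implements ResourceProcessor {
        private final String resourceName;
        int processed = 0;

        CountingProcessor(String resourceName) { this.resourceName = resourceName; }

        @Override public String getResourceName() { return resourceName; }
        @Override public void process(String sentence) { processed++; }
    }

    /** Keeps, and later runs, only the processors whose name was requested. */
    static class ResourcePipeline {
        private final List<ResourceProcessor> active = new ArrayList<>();

        ResourcePipeline(List<? extends ResourceProcessor> all, Set<String> required) {
            for (ResourceProcessor p : all) {
                if (required.contains(p.getResourceName())) {
                    active.add(p);
                }
            }
        }

        void processSentence(String sentence) {
            for (ResourceProcessor p : active) {
                p.process(sentence);
            }
        }
    }

    public static void main(String[] args) {
        CountingProcessor parser = new CountingProcessor("bparser");
        CountingProcessor topics = new CountingProcessor("topicmodel"); // made-up name
        ResourcePipeline pipeline = new ResourcePipeline(
                Arrays.asList(parser, topics), Collections.singleton("bparser"));
        pipeline.processSentence("a test sentence");
        System.out.println(parser.processed + " " + topics.processed);
    }
}
```

In this sketch both processors are still constructed up front; only their execution is skipped, which is the limitation noted in the TODO.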

TODO: the above-mentioned solution only avoids RUNNING the ResourceProcessors. ResourceProcessors that are not needed should actually not be initialized at all (i.e. grammars and tables should not be loaded). This requires either adding a separate "initialize" function to each of the ResourceProcessors, or implementing some kind of Pythonic dynamic class loading, which is tricky in Java.
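One way to defer construction without reflection is to register a factory per resource name and call it only on first use. The sketch below assumes this factory approach, which is a different technique from the two options named in the TODO; `LazyProcessorRegistry` and `HeavyProcessor` are hypothetical names, not classes from the project.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical registry: processors are built on first request, not at startup.
class LazyProcessorRegistry {

    /** Placeholder for a processor whose constructor would load heavy resources. */
    static class HeavyProcessor {
        static int constructed = 0; // counts constructions, for demonstration only

        HeavyProcessor() {
            constructed++; // imagine grammars or tables being loaded here
        }
    }

    private final Map<String, Supplier<HeavyProcessor>> factories = new HashMap<>();
    private final Map<String, HeavyProcessor> instances = new HashMap<>();

    /** Registers how to build a processor, without building it. */
    void register(String resourceName, Supplier<HeavyProcessor> factory) {
        factories.put(resourceName, factory);
    }

    /** Builds the processor on the first call and reuses it afterwards. */
    HeavyProcessor get(String resourceName) {
        Supplier<HeavyProcessor> factory = factories.get(resourceName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown resource: " + resourceName);
        }
        return instances.computeIfAbsent(resourceName, name -> factory.get());
    }
}
```

Registering `HeavyProcessor::new` under "bparser" costs nothing until a feature actually asks for that resource through `get("bparser")`.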

lefterav commented 11 years ago

According to the proposed design, every tool that implements the ResourceProcessor interface will have one additional mandatory function called initialize. This function will have ONLY one parameter, the PropertiesManager, which is an object that holds all parameters read from the user's customized .properties file.

Each resource processor will now be responsible, within its own class, for acquiring the parameters it needs for its initialization by asking the PropertiesManager for them directly.

This will solve the problem that the resource processors had to be initialized one by one, hard-coded in FeatureExtractor.java, since each of them had different initialization parameters.

This will also require that we modify the existing processors by moving their initialization code from the FeatureExtractor (or the Pipeline) back to the Processor class.
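As a rough sketch of the proposal, assuming a much-reduced PropertiesManager: the `getString` accessor, the property key, and the body of `BParser` below are invented for illustration and are not the project's actual code.

```java
import java.util.Properties;

// Hypothetical, reduced versions of the classes discussed in the proposal.
class InitializeSketch {

    /** Stand-in for PropertiesManager: a thin wrapper around java.util.Properties. */
    static class PropertiesManager {
        private final Properties props;

        PropertiesManager(Properties props) { this.props = props; }

        String getString(String key) { return props.getProperty(key); }
    }

    /** The single mandatory hook: every processor configures itself. */
    interface ResourceProcessor {
        void initialize(PropertiesManager config);
    }

    /** Illustrative processor that pulls its own parameter from the configuration. */
    static class BParser implements ResourceProcessor {
        String grammarPath; // stays null until initialize() is called

        @Override
        public void initialize(PropertiesManager config) {
            // The key name is made up for this example.
            grammarPath = config.getString("bparser.grammar");
            if (grammarPath == null) {
                throw new IllegalStateException("Missing property: bparser.grammar");
            }
        }
    }
}
```

With this shape, the caller no longer needs to know which parameters each processor takes; it passes the same PropertiesManager to every `initialize` call.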

I hope you approve of this change.

lspecia / quest

Load/run only the preprocessors required by the requested features #5