inspirehep / beard

Bibliographic Entity Automatic Recognition and Disambiguation

Pipelining with luigi #96

Closed OPersian closed 6 years ago

OPersian commented 7 years ago

Hi!

I would like to create a pipeline for the given examples with luigi. Before elaborating on it, it would be great to have your feedback. @MSusik, would you be interested in adding the pipeline to the examples? One of the jobs would look like this:

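import pickle

import luigi


class BuildDistanceModel(luigi.Task):

    blocking = luigi.BoolParameter()

    def requires(self):
        # PrepareDataSets is assumed to output two pickle targets: X and y.
        return PrepareDataSets()

    def output(self):
        # path_to_distance_model is defined elsewhere; Nop keeps the file binary.
        return luigi.LocalTarget(path_to_distance_model,
                                 format=luigi.format.Nop)

    def run(self):
        with open(self.input()[0].path, 'rb') as f:
            X = pickle.load(f)
        with open(self.input()[1].path, 'rb') as f:
            y = pickle.load(f)

        # build_distance_estimator and classifier are assumed to be defined elsewhere.
        estimator = build_distance_estimator(
            X=X, y=y,
            classifier=classifier)

        # Writing through the target keeps the output atomic.
        with self.output().open('w') as f:
            pickle.dump(estimator, f)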

Expressing the disambiguation pipeline as batch jobs may make the examples easier to follow and easier to launch (at the very least, fewer input parameters would need to be defined), and would demonstrate how BEARD can work with pipelining libraries.

Thanks in advance!

MSusik commented 6 years ago

Sorry for the lack of feedback; I overlooked this issue. We use pipelines from sklearn and would prefer not to add any redundant dependencies, so we will not add a luigi example.
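
For readers unfamiliar with the sklearn alternative, a pipeline there chains transformers and a final estimator inside a single process, with no scheduler or intermediate files. The following is a generic sketch only, not BEARD's actual pipeline: the step names, the choice of `StandardScaler` and `LogisticRegression`, and the toy data are assumptions made for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy feature matrix and labels, made up for this example.
X = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.2], [0.1, 0.8]]
y = [0, 1, 1, 0]

# Each step is a (name, estimator) pair; all but the last must be transformers.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("classify", LogisticRegression()),
])

# fit() runs fit_transform on each transformer, then fits the final estimator.
pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

The trade-off is that such a pipeline keeps everything in memory and in one process, whereas a luigi pipeline persists each stage's output to disk and can skip stages that are already complete.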