wietlabs / krk_meetings

A connection search engine with built-in group meetings planner. Uses GTFS Static and GTFS Realtime data published by Kraków Public Transport Authority.

Remove ExtractedData and refactor extractors #82

Closed tomekzaw closed 3 years ago

tomekzaw commented 4 years ago

Suppose we need to support n different data formats and m different solvers.

A parser is responsible for transforming a timetable in a specific format (e.g. GTFS Static) into a ParsedData object, which is part of the common interface.

Instances of ParsedData can be easily merged using Merger or transformed in some other way.

Extractors are responsible for generating solver-specific data based on the parsed data.

For every approach to solve a problem there should exist:

Note that the extractors' outputs for two solvers may have different fields, so no single ExtractedData structure can be defined as part of the common interface.
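To illustrate why a shared ExtractedData does not fit, here is a minimal sketch with two extractors that read the same parsed data but return unrelated structures. Everything in it is hypothetical: the `connections` field on ParsedData, the names GraphSolverData, TableSolverData, GraphExtractor and TableExtractor, and the stdlib containers used in place of real graphs or dataframes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ParsedData:
    """Hypothetical stand-in for the common ParsedData structure."""
    # Assumed shape: (from_stop, to_stop, travel_time_seconds) triples.
    connections: Tuple[Tuple[str, str, int], ...]


@dataclass(frozen=True)
class GraphSolverData:
    """Data for a hypothetical graph-based solver: adjacency lists only."""
    adjacency: Dict[str, List[str]]


@dataclass(frozen=True)
class TableSolverData:
    """Data for a hypothetical table-based solver: fastest travel times only."""
    travel_times: Dict[Tuple[str, str], int]


class GraphExtractor:
    def extract(self, parsed_data: ParsedData) -> GraphSolverData:
        adjacency: Dict[str, List[str]] = {}
        for src, dst, _seconds in parsed_data.connections:
            adjacency.setdefault(src, []).append(dst)
        return GraphSolverData(adjacency)


class TableExtractor:
    def extract(self, parsed_data: ParsedData) -> TableSolverData:
        travel_times: Dict[Tuple[str, str], int] = {}
        for src, dst, seconds in parsed_data.connections:
            # Keep only the fastest connection for each pair of stops.
            best = travel_times.get((src, dst))
            if best is None or seconds < best:
                travel_times[(src, dst)] = seconds
        return TableSolverData(travel_times)
```

The two result types share no fields, so the only thing the extractors have in common is the shape of the extract method.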

Commonly used dataframes, graphs or numeric values, for example:

should be extracted using supporting functions (or CustomExtractor methods).

Custom extractors should implement an extract(self, parsed_data: ParsedData) -> CustomData method, which returns the appropriate solver-specific data object.
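One possible way to write this contract down without a shared data class is a generic typing.Protocol. This is only a sketch under assumptions, not what the repository does: the Extractor protocol, the `run` helper, StopCountData, StopCountExtractor and the `stop_names` field on ParsedData are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple, TypeVar, runtime_checkable


@dataclass(frozen=True)
class ParsedData:
    """Hypothetical placeholder for the project's ParsedData."""
    stop_names: Tuple[str, ...]


# Covariant because the type only ever appears as a return value.
DataT_co = TypeVar("DataT_co", covariant=True)
T = TypeVar("T")


@runtime_checkable
class Extractor(Protocol[DataT_co]):
    """Structural interface: anything with a matching extract method qualifies."""

    def extract(self, parsed_data: ParsedData) -> DataT_co:
        ...


@dataclass(frozen=True)
class StopCountData:
    """Hypothetical solver-specific data object."""
    stop_count: int


class StopCountExtractor:
    # No base class needed; the protocol is satisfied structurally.
    def extract(self, parsed_data: ParsedData) -> StopCountData:
        return StopCountData(stop_count=len(parsed_data.stop_names))


def run(extractor: Extractor[T], parsed_data: ParsedData) -> T:
    """The return type follows whichever extractor is passed in."""
    return extractor.extract(parsed_data)
```

With this, a static type checker can infer that `run(StopCountExtractor(), parsed_data)` returns StopCountData. The runtime isinstance check only verifies that an extract attribute exists, not its signature.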

If necessary, extractors can be configured in the __init__ method, preferably using keyword-only arguments with default values.

import networkx as nx


class MagicExtractor:
    # Keyword-only configuration with a default value.
    def __init__(self, *, magic_coeff: float = 7.0):
        self.magic_coeff: float = magic_coeff

    def extract(self, parsed_data: ParsedData) -> MagicSolverData:
        # Build every structure the solver needs from the parsed data.
        G = self._extract_magic_graph(parsed_data)
        # _extract_another_magic_graph is not shown here.
        G2 = self._extract_another_magic_graph(parsed_data)
        ...
        return MagicSolverData(G, G2, ...)

    def _extract_magic_graph(self, parsed_data: ParsedData) -> nx.DiGraph:
        G = nx.DiGraph()
        ...
        return G

All extractors must be stateless since they can be used multiple times.
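A minimal sketch of what stateless could look like in practice: the extractor holds only its configuration, and everything computed during extract lives in local variables and the return value. The names ScalingExtractor and ScaledTimes and the `travel_times` field on ParsedData are assumptions made for this example. Using a frozen dataclass is just one way to make accidental mutation fail loudly.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ParsedData:
    """Hypothetical placeholder for the project's ParsedData."""
    travel_times: Tuple[float, ...]


@dataclass(frozen=True)
class ScaledTimes:
    """Hypothetical solver-specific data object."""
    values: Tuple[float, ...]


@dataclass(frozen=True)
class ScalingExtractor:
    # Configuration only; frozen=True makes any later assignment raise.
    magic_coeff: float = 7.0

    def extract(self, parsed_data: ParsedData) -> ScaledTimes:
        # Nothing is stored on self, so calls cannot influence each other.
        scaled = tuple(t * self.magic_coeff for t in parsed_data.travel_times)
        return ScaledTimes(scaled)
```

Because no call leaves anything behind on the instance, the same extractor can be reused for several ParsedData objects and the order of calls does not matter.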

Example usage:

parsed_data = ...

extractor = MagicExtractor(magic_coeff=7.0)
magic_solver_data = extractor.extract(parsed_data)
solver = MagicSolver(magic_solver_data, allow_black_magic=True)

query = ...
solver.find_connection(query)

tomekzaw commented 3 years ago

Mostly done.