Remove ExtractedData and refactor extractors

Suppose we need to support n different data formats and m different solvers.

Without a parser/extractor boundary, we would need n*m "direct" converter objects, one for every (format, solver) pair.
With the boundary, only n+m converters (n parsers and m extractors) are needed.
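The counting argument can be made concrete with a small Python sketch. Plain functions stand in for parsers and extractors, a dict stands in for ParsedData, and the format and solver names are made up. Any format reaches any solver by composing one parser with one extractor, so adding a format or a solver costs one new function rather than a whole row or column of direct converters.

```python
from typing import Callable, Dict

# Hypothetical stand-ins: a parser maps raw input of one format to a shared
# intermediate form (a dict here, in place of ParsedData); an extractor maps
# that shared form to solver-specific data.
Parser = Callable[[str], dict]
Extractor = Callable[[dict], object]

parsers: Dict[str, Parser] = {
    "format_a": lambda raw: {"source": "format_a", "raw": raw},
    "format_b": lambda raw: {"source": "format_b", "raw": raw},
}
extractors: Dict[str, Extractor] = {
    "solver_x": lambda parsed: ("solver_x", parsed["source"]),
    "solver_y": lambda parsed: ("solver_y", parsed["source"]),
    "solver_z": lambda parsed: ("solver_z", parsed["source"]),
}


def convert(fmt: str, solver: str, raw: str) -> object:
    """Compose any parser with any extractor through the shared intermediate form."""
    parsed = parsers[fmt](raw)
    return extractors[solver](parsed)


n, m = len(parsers), len(extractors)
print(f"direct converters needed: {n * m}")     # one per (format, solver) pair
print(f"parsers + extractors needed: {n + m}")  # one per format plus one per solver
print(convert("format_b", "solver_y", "..."))
```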

A parser is responsible for transforming a timetable in a specific format (e.g. GTFS Static) into a ParsedData object, which is part of the common interface.
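A minimal parser sketch follows. It assumes a hypothetical GtfsStopsParser that reads only the stops.txt file of a GTFS Static feed, and a toy ParsedData holding nothing but (stop_id, stop_name) pairs; the real ParsedData fields are not described in this PR, and the sample rows are made up.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ParsedData:
    # Hypothetical minimal content: (stop_id, stop_name) pairs.
    stops: List[Tuple[str, str]] = field(default_factory=list)


class GtfsStopsParser:
    """Hypothetical parser for the stops.txt file of a GTFS Static feed."""

    def parse(self, stops_txt: str) -> ParsedData:
        # stops.txt is a CSV file with a header row (stop_id, stop_name, ...).
        reader = csv.DictReader(io.StringIO(stops_txt))
        stops = [(row["stop_id"], row["stop_name"]) for row in reader]
        return ParsedData(stops=stops)


# Made-up sample content in the stops.txt layout.
sample = (
    "stop_id,stop_name,stop_lat,stop_lon\n"
    "S1,Stop A,50.06,19.94\n"
    "S2,Stop B,50.07,19.95\n"
)
parsed = GtfsStopsParser().parse(sample)
print(parsed.stops)
```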

Instances of ParsedData can easily be merged using Merger or transformed in other ways.
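Merging can be sketched as below, assuming a hypothetical Merger.merge(*parts) method and the same toy ParsedData of (stop_id, stop_name) pairs. The project's actual Merger interface and its policy for duplicates may differ; in this sketch the first occurrence of a stop_id wins.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class ParsedData:
    # Hypothetical minimal content: (stop_id, stop_name) pairs.
    stops: List[Tuple[str, str]] = field(default_factory=list)


class Merger:
    """Hypothetical merger combining several ParsedData objects into one.

    Stops are concatenated in order; a stop_id seen earlier wins over later duplicates.
    """

    def merge(self, *parts: ParsedData) -> ParsedData:
        seen: Set[str] = set()
        merged: List[Tuple[str, str]] = []
        for part in parts:
            for stop_id, name in part.stops:
                if stop_id not in seen:
                    seen.add(stop_id)
                    merged.append((stop_id, name))
        return ParsedData(stops=merged)


tram = ParsedData(stops=[("S1", "Stop A"), ("S2", "Stop B")])
bus = ParsedData(stops=[("S2", "Stop B"), ("S3", "Stop C")])
merged = Merger().merge(tram, bus)
print(merged.stops)  # [('S1', 'Stop A'), ('S2', 'Stop B'), ('S3', 'Stop C')]
```

Because the result is again a ParsedData, it can be handed to any extractor exactly like the output of a single parser.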

Extractors are responsible for generating solver-specific data based on parsed data.

For every approach to solving the problem, there should be:

- an extractor (CustomSolverExtractor) to generate pre-calculated data (dataframes, graphs, etc.) from parsed data,
- a data structure (CustomSolverData) to contain all pre-calculated data in a single object that is easy to cache, serialize and load on demand,
- a solver (CustomSolver) to operate on the pre-calculated data.
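The three roles can be sketched as follows. All contents are hypothetical: ParsedData holds (trip_id, departure, arrival) tuples in minutes, CustomSolverData holds a dict of trip durations, and CustomSolver answers a toy query. Because CustomSolverData is a plain dataclass, the standard pickle module can serialize it for caching and load it back later without re-running the extractor.

```python
import pickle
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ParsedData:
    # Hypothetical content: (trip_id, departure_min, arrival_min).
    trips: List[Tuple[str, int, int]]


@dataclass
class CustomSolverData:
    # All pre-calculated data for one solver, bundled in one picklable object.
    trip_durations: Dict[str, int]


class CustomSolverExtractor:
    def extract(self, parsed_data: ParsedData) -> CustomSolverData:
        durations = {trip_id: arr - dep for trip_id, dep, arr in parsed_data.trips}
        return CustomSolverData(trip_durations=durations)


class CustomSolver:
    def __init__(self, data: CustomSolverData):
        self.data = data

    def shortest_trip(self) -> str:
        # Toy query: the trip with the smallest pre-calculated duration.
        return min(self.data.trip_durations, key=self.data.trip_durations.get)


parsed = ParsedData(trips=[("T1", 480, 510), ("T2", 485, 500)])
data = CustomSolverExtractor().extract(parsed)

# Cache once (bytes here; a file in practice), then load on demand.
blob = pickle.dumps(data)
loaded = pickle.loads(blob)

solver = CustomSolver(loaded)
print(solver.shortest_trip())  # T2
```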

Note that the outputs of extractors for two different solvers may have different fields, so no single ExtractedData structure can be defined as part of the common interface.
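For illustration, two hypothetical extractors fed the same ParsedData produce data objects with disjoint fields: a graph-based solver wants an adjacency list, while a table-based one wants per-trip durations and a scalar. A shared ExtractedData class would have to carry both sets of fields, leaving each solver with data it never uses. All class names and fields below are assumptions.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Tuple


@dataclass
class ParsedData:
    # Hypothetical content: (trip_id, from_stop, to_stop, duration_min).
    trips: List[Tuple[str, str, str, int]]


@dataclass
class GraphSolverData:
    # Adjacency list: stop -> [(neighbour, duration_min)].
    adjacency: Dict[str, List[Tuple[str, int]]]


@dataclass
class TableSolverData:
    # Per-trip durations plus one scalar; no graph at all.
    durations: Dict[str, int]
    max_trip_duration: int


class GraphSolverExtractor:
    def extract(self, parsed_data: ParsedData) -> GraphSolverData:
        adjacency: Dict[str, List[Tuple[str, int]]] = {}
        for _, src, dst, minutes in parsed_data.trips:
            adjacency.setdefault(src, []).append((dst, minutes))
        return GraphSolverData(adjacency=adjacency)


class TableSolverExtractor:
    def extract(self, parsed_data: ParsedData) -> TableSolverData:
        durations = {trip_id: minutes for trip_id, _, _, minutes in parsed_data.trips}
        return TableSolverData(durations=durations, max_trip_duration=max(durations.values()))


parsed = ParsedData(trips=[("T1", "A", "B", 12), ("T2", "B", "C", 7)])
g = GraphSolverExtractor().extract(parsed)
t = TableSolverExtractor().extract(parsed)
print([f.name for f in fields(g)])  # ['adjacency']
print([f.name for f in fields(t)])  # ['durations', 'max_trip_duration']
```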

Commonly used dataframes, graphs or numeric values, for example:

- avg_durations_df
- stops_by_name_df
- max_trip_duration

should be extracted using supporting functions (or CustomExtractor methods).
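A sketch of such supporting functions follows, assuming a hypothetical ParsedData with stop and trip tuples. Plain dicts stand in for the avg_durations_df and stops_by_name_df dataframes so that the example needs only the standard library; the function names are illustrative. Any extractor can call these helpers instead of re-implementing the computation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ParsedData:
    # Hypothetical content.
    stops: List[Tuple[str, str]]            # (stop_id, stop_name)
    trips: List[Tuple[str, str, str, int]]  # (route_id, from_stop_id, to_stop_id, duration_min)


def extract_stops_by_name(parsed_data: ParsedData) -> Dict[str, List[str]]:
    """Map each stop name to every stop_id that shares it (e.g. several platforms)."""
    result: Dict[str, List[str]] = defaultdict(list)
    for stop_id, name in parsed_data.stops:
        result[name].append(stop_id)
    return dict(result)


def extract_avg_durations(parsed_data: ParsedData) -> Dict[Tuple[str, str], float]:
    """Average travel time for each directly connected (from, to) stop pair."""
    samples: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for _, src, dst, minutes in parsed_data.trips:
        samples[(src, dst)].append(minutes)
    return {pair: sum(v) / len(v) for pair, v in samples.items()}


def extract_max_trip_duration(parsed_data: ParsedData) -> int:
    """Longest single trip duration, in minutes."""
    return max(trip[3] for trip in parsed_data.trips)


parsed = ParsedData(
    stops=[("S1", "Stop A"), ("S2", "Stop A"), ("S3", "Stop B")],
    trips=[("R1", "S1", "S3", 10), ("R2", "S1", "S3", 14), ("R1", "S3", "S2", 5)],
)
print(extract_stops_by_name(parsed))
print(extract_avg_durations(parsed))
print(extract_max_trip_duration(parsed))
```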

Custom extractors should implement an extract(self, parsed_data: ParsedData) -> CustomData method that returns the appropriate data object.
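One way to express this contract for a type checker is a generic typing.Protocol. This is a sketch, not the project's definition: Extractor, CustomExtractor and the dataclass fields are hypothetical. With structural typing, concrete extractors need no common base class, which fits the fact that their return types differ.

```python
from dataclasses import dataclass
from typing import Protocol, TypeVar, runtime_checkable

D_co = TypeVar("D_co", covariant=True)


@dataclass
class ParsedData:
    trip_count: int  # hypothetical content


@dataclass
class CustomData:
    trip_count_doubled: int  # hypothetical solver-specific field


@runtime_checkable
class Extractor(Protocol[D_co]):
    """Anything with a matching extract() method counts as an extractor."""

    def extract(self, parsed_data: ParsedData) -> D_co: ...


class CustomExtractor:
    # No inheritance needed: matching the method shape is enough.
    def extract(self, parsed_data: ParsedData) -> CustomData:
        return CustomData(trip_count_doubled=2 * parsed_data.trip_count)


def run(extractor: Extractor[CustomData], parsed_data: ParsedData) -> CustomData:
    return extractor.extract(parsed_data)


# At runtime, isinstance() only checks that extract() exists, not its signature;
# the full check happens in a static type checker such as mypy.
print(isinstance(CustomExtractor(), Extractor))  # True
print(run(CustomExtractor(), ParsedData(trip_count=3)))
```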

If necessary, extractors can be configured in the __init__ method, preferably using keyword-only arguments with default values.

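```python
from __future__ import annotations  # ParsedData / MagicSolverData are defined elsewhere

import networkx as nx


class MagicExtractor:
    def __init__(self, *, magic_coeff: float = 7.0):
        # Configuration only; extract() must not modify the extractor.
        self.magic_coeff: float = magic_coeff

    def extract(self, parsed_data: ParsedData) -> MagicSolverData:
        # Build every pre-calculated structure the solver needs...
        G = self._extract_magic_graph(parsed_data)
        G2 = self._extract_another_magic_graph(parsed_data)
        ...
        # ...and bundle them into a single cacheable data object.
        return MagicSolverData(G, G2, ...)

    def _extract_magic_graph(self, parsed_data: ParsedData) -> nx.DiGraph:
        G = nx.DiGraph()
        ...
        return G
```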

All extractors must be stateless, since the same extractor instance may be used multiple times.

Example usage:

```python
parsed_data = ...

extractor = MagicExtractor(magic_coeff=7.0)
magic_solver_data = extractor.extract(parsed_data)
solver = MagicSolver(magic_solver_data, allow_black_magic=True)

query = ...
solver.find_connection(query)
```

wietlabs / krk_meetings, pull request #82