lmmx / tubeulator

TfL open data interface library
https://tubeulator.vercel.app
MIT License

Reproducible download of stationdata #31

Open lmmx opened 3 months ago

lmmx commented 3 months ago

I downloaded the zips from the endpoint manually (which tubeulator should likewise be able to do… but currently doesn't, presumably because the schema names for StopPoint are hardcoded and stationdata is discarded?)
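
For reference, a minimal sketch of the direct download (not tubeulator's actual code: it assumes the requests library and an APP_KEY environment variable; the base URL and the app_key query parameter come from the schema dumped below):

import os
from pathlib import Path

import requests

BASE_URL = "https://api.tfl.gov.uk/stationdata"  # from the schema's servers entry
ZIP_NAMES = ["tfl-stationdata-detailed.zip", "tfl-stationdata-gtfs.zip"]

def download_stationdata(dest: Path = Path("data")) -> list[Path]:
    """Download each stationdata zip, authenticating via the app_key query param."""
    dest.mkdir(exist_ok=True)
    params = {"app_key": os.environ["APP_KEY"]}  # env var name is an assumption
    downloaded = []
    for name in ZIP_NAMES:
        response = requests.get(f"{BASE_URL}/{name}", params=params, timeout=60)
        response.raise_for_status()
        zip_path = dest / name
        zip_path.write_bytes(response.content)
        downloaded.append(zip_path)
    return downloaded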

This could then be extended to do things like building a graph of the network.
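
One hypothetical way to do that from the GTFS zip (assuming it contains a standard stop_times.txt; the use of networkx and the consecutive-stop edge semantics are my own choices here, not anything tubeulator provides):

import csv
import io
import zipfile
from itertools import pairwise

import networkx as nx

def network_graph(gtfs_zip_path: str) -> nx.Graph:
    """Connect consecutive stops on each trip in the GTFS feed."""
    graph = nx.Graph()
    with zipfile.ZipFile(gtfs_zip_path) as zf:
        with zf.open("stop_times.txt") as f:
            reader = csv.DictReader(io.TextIOWrapper(f, encoding="utf-8-sig"))
            rows = sorted(reader, key=lambda r: (r["trip_id"], int(r["stop_sequence"])))
    for a, b in pairwise(rows):
        if a["trip_id"] == b["trip_id"]:  # only link stops within the same trip
            graph.add_edge(a["stop_id"], b["stop_id"])
    return graph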

lmmx commented 3 months ago

So previously we only cared about the components/schemas path in the schema, whereas here the schema doesn't have any; instead it has paths, which indicate where to go and pull the stationdata datasets from:

(tubeulator) louis 🚶 ~/dev/tubeulator $ tubeulator populate
> /home/louis/dev/tubeulator/src/tubeulator/utils/paths.py(60)load_endpoint_component_schemas()
-> endpoint_schema = load_endpoint_schema(schema_name)
(Pdb) n
> /home/louis/dev/tubeulator/src/tubeulator/utils/paths.py(61)load_endpoint_component_schemas()
-> component_schemas = endpoint_schema["components"].get("schemas", {})
(Pdb) n
> /home/louis/dev/tubeulator/src/tubeulator/utils/paths.py(62)load_endpoint_component_schemas()
-> return component_schemas
(Pdb) pp endpoint_schema
{'components': {'securitySchemes': {'apiKeyHeader': {'in': 'header',
                                                     'name': 'app_key',
                                                     'type': 'apiKey'},
                                    'apiKeyQuery': {'in': 'query',
                                                    'name': 'app_key',
                                                    'type': 'apiKey'}}},
 'info': {'description': '', 'title': 'Station Data', 'version': '1.0'},
 'openapi': '3.0.1',
 'paths': {'/tfl-stationdata-detailed.zip': {'get': {'description': 'TfL '
                                                                    'station '
                                                                    'data '
                                                                    'detailed',
                                                     'operationId': 'detailed',
                                                     'responses': {'200': {'description': ''}},
                                                     'summary': 'TfL station '
                                                                'data '
                                                                'detailed'}},
           '/tfl-stationdata-gtfs.zip': {'get': {'description': 'TfL station '
                                                                'data gtfs '
                                                                'files',
                                                 'operationId': 'gfts',
                                                 'responses': {'200': {'description': ''}},
                                                 'summary': 'TfL station data '
                                                            'gtfs'}}},
 'security': [{'apiKeyHeader': []}, {'apiKeyQuery': []}],
 'servers': [{'url': 'https://api.tfl.gov.uk/stationdata'}]}
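
One way the loader could handle both shapes (illustrative names only, not a committed design): branch on whether the schema declares component schemas, and otherwise treat each path as a downloadable dataset.

def classify_endpoint_schema(endpoint_schema: dict) -> dict:
    """Branch on schema shape: component schemas -> models, paths only -> datasets."""
    component_schemas = endpoint_schema.get("components", {}).get("schemas", {})
    if component_schemas:
        return {"kind": "models", "schemas": component_schemas}
    # No component schemas: treat each path as a dataset to download
    base_url = endpoint_schema["servers"][0]["url"]
    return {"kind": "datasets", "urls": [base_url + path for path in endpoint_schema["paths"]]}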

It would make sense to parse these schemas into Pydantic models now and work less chaotically.
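
As a starting point, a minimal Pydantic sketch (v2 syntax, hypothetical model names) covering just the subset of the OpenAPI document dumped above:

from pydantic import BaseModel, Field

class Info(BaseModel):
    title: str
    version: str
    description: str = ""

class Server(BaseModel):
    url: str

class Operation(BaseModel):
    operation_id: str = Field(alias="operationId")
    summary: str = ""
    description: str = ""

class PathItem(BaseModel):
    get: Operation  # only GET operations appear in these schemas

class EndpointSchema(BaseModel):
    openapi: str
    info: Info
    servers: list[Server] = []
    paths: dict[str, PathItem] = {}

# EndpointSchema.model_validate(endpoint_schema).servers[0].url
# -> 'https://api.tfl.gov.uk/stationdata'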

The code is full of crap like this:

import json
from functools import cache
from pathlib import Path

# find_schema_by_name is assumed to be in scope (defined elsewhere in tubeulator.utils.paths)

@cache
def load_endpoint_schema(schema_name: str) -> dict:
    """Load an entire JSON schema for an API endpoint by its name, e.g. "Line" or "Mode"."""
    endpoint_schema = json.loads(Path(find_schema_by_name(schema_name)).read_text())
    return endpoint_schema

@cache
def load_endpoint_component_schemas(schema_name: str) -> dict[str, dict]:
    """Load all component schemas of a JSON schema for an API endpoint by endpoint name."""
    if schema_name == "stationdata":
        breakpoint()  # temporary debug hook: stationdata has no component schemas
    endpoint_schema = load_endpoint_schema(schema_name)
    component_schemas = endpoint_schema["components"].get("schemas", {})
    return component_schemas

Note that the @cache here is standing in for the "parse once" principle we'd want when working with Pydantic.
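
Concretely (reusing the hypothetical EndpointSchema model sketched above, plus the imports from the snippet), the cached loader could return the validated model itself, so each schema is parsed and validated exactly once:

@cache
def load_parsed_schema(schema_name: str) -> EndpointSchema:
    """Parse and validate the raw JSON exactly once; every caller shares the model."""
    raw = json.loads(Path(find_schema_by_name(schema_name)).read_text())
    return EndpointSchema.model_validate(raw)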