openmobilityfoundation / mobility-data-specification

A data standard to enable right-of-way regulation and two-way communication between mobility companies and local governments.
https://www.openmobilityfoundation.org/about-mds/

Add an MDS Metrics API #485

Closed: whereissean closed this issue 3 years ago

whereissean commented 4 years ago

Is your feature request related to a problem? Please describe.

There is currently no standard way to retrieve metrics calculated from MDS data (provider or agency) or to define a standard set of useful MDS-based data aggregations.  We have heard from the OMF community that this leads to a number of problems:

Describe the solution you'd like

The proposed Metrics API is intended to give users of MDS (cities, mobility service providers, and third-party ecosystem services alike) a standard way to consistently describe available metrics, and to create an extensible interface for querying core MDS metrics and future metrics still to be defined. It should be a framework that describes how different API users and hosts can:

  1. Define and communicate available metrics;
  2. Request these metrics across multiple dimensions and filters;
  3. Serve these metrics either to external parties or to other MDS API consumers, without requiring the transmission of the underlying raw data;
  4. Ensure that multiple parties can reliably reproduce the same metrics, given the same data.
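To make goals 1 and 2 concrete, here is a minimal Python sketch of a metrics catalog that a host could advertise and a check of incoming requests against it. The class names, the `trips_started` metric, and the dimension and filter names are hypothetical illustrations, not the schema proposed in PR #486.

```python
from dataclasses import dataclass, field
from datetime import datetime


# Hypothetical shapes for illustration only; not the schema proposed in PR #486.
@dataclass(frozen=True)
class MetricDescriptor:
    name: str
    description: str
    dimensions: tuple  # fields a caller may group results by
    filters: tuple     # fields a caller may filter on


@dataclass
class MetricRequest:
    metric: str
    start: datetime
    end: datetime
    dimensions: list = field(default_factory=list)
    filters: dict = field(default_factory=dict)


# Goal 1: a host advertises the metrics it can serve (hypothetical catalog entry).
CATALOG = {
    "trips_started": MetricDescriptor(
        name="trips_started",
        description="Count of trips whose start time falls in the requested interval",
        dimensions=("provider_id", "vehicle_type", "geography_id"),
        filters=("provider_id", "vehicle_type"),
    ),
}


def validate(request, catalog):
    """Goal 2: check a request against the catalog; an empty list means it is acceptable."""
    descriptor = catalog.get(request.metric)
    if descriptor is None:
        return [f"unknown metric: {request.metric}"]
    errors = []
    if request.end <= request.start:
        errors.append("end must be after start")
    errors += [f"unsupported dimension: {d}" for d in request.dimensions if d not in descriptor.dimensions]
    errors += [f"unsupported filter: {f}" for f in request.filters if f not in descriptor.filters]
    return errors
```

Keeping the catalog machine-readable lets clients discover which dimensions and filters each metric supports before asking for it, rather than learning that from an error.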

The goal is to be able to define “Metric X” and then ensure that when “X” is calculated by the city, authorized parties, or transportation providers, the result will be identical. For example, while n different methods may exist to calculate the utilization of a vehicle or a fleet for a given time range, the Metrics API is intended to ensure that for a given method k, the same result will be produced regardless of who conducts the calculation, and that there is a standard interface for authorized users to receive this data without requiring access to the underlying raw data.
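As a sketch of what pinning down "method k" could look like, the function below fixes one hypothetical utilization definition: trip time divided by available vehicle-time, both clipped to the reporting window and rounded to a fixed precision. The function names and the clipping and rounding rules are assumptions made for illustration; the point is that every such choice must be written down for two parties to get identical results.

```python
from datetime import datetime


def overlap_seconds(start, end, window_start, window_end):
    """Seconds of [start, end) that fall inside [window_start, window_end)."""
    overlap = (min(end, window_end) - max(start, window_start)).total_seconds()
    return max(overlap, 0.0)


def utilization_trip_time_share(trips, available, window_start, window_end):
    """Hypothetical "method k": share of available vehicle-time spent on trips.

    trips and available are lists of (start, end) datetime pairs. Both are clipped
    to the window and the ratio is rounded to 4 decimals, so anyone applying this
    exact method to the same records gets the same number. Assumes trip intervals
    lie within available intervals and that available intervals do not overlap.
    """
    available_s = sum(overlap_seconds(s, e, window_start, window_end) for s, e in available)
    if available_s == 0:
        return None  # undefined when no vehicle-time was available in the window
    trip_s = sum(overlap_seconds(s, e, window_start, window_end) for s, e in trips)
    return round(trip_s / available_s, 4)
```

For a vehicle available from 9:00 to 11:00 with trips at 9:50 to 10:10 and 10:30 to 10:45, the 10:00 to 11:00 window counts 600 + 900 trip seconds out of 3600 available seconds, giving 0.4167. A method that did not clip the first trip to the window would report a different value from the same data.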

The Metrics API is intended to be useful for future MDS use cases, best practices and requirements.  Particularly notable is that it provides the foundation to implement data anonymization best practices, such as k-anonymity.  It also represents an important component needed to enable new MDS policy types and compliance evaluation as well as operations management use cases that can only be achieved by linking MDS metrics and MDS policy.
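Strictly, k-anonymity is a property of record-level data with respect to quasi-identifiers. For aggregate metrics, a common simplified stand-in is small-count suppression, where any cell with fewer than k trips or vehicles is withheld. A minimal sketch, assuming counts arrive as a dictionary keyed by cell and that zero counts may be published (an assumption here, not something the proposal specifies):

```python
def suppress_small_counts(counts, k):
    """Withhold any non-zero aggregate count below k before publishing.

    counts maps a cell (e.g. a (geography, hour) pair) to a number of trips or
    vehicles. Cells with 0 < n < k are replaced by None so that small groups,
    which could point to individual riders, are not published.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return {cell: (None if 0 < n < k else n) for cell, n in counts.items()}
```

With k = 5, a cell of 12 trips is published, a cell of 3 is withheld, and an empty cell stays 0.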

This proposed specification is not intended to represent a complete data pipeline or analytics service. It is also not meant to define the complete set of MDS metrics, only a useful starting point.

Is this a breaking change

Impacted Spec

Describe alternatives you've considered

It is hoped that this work can be complementary to other projects working to define, develop, and implement metrics services or metrics processing pipelines for MDS data.  Much of this proposal was inspired by excellent work done by OMF member cities and SharedStreets with their SharedStreets Mobility Metrics.

This proposal represents work done without full visibility into the efforts of the Mobility Data Collaborative (MDC). We hope to bring the metrics defined in the Metrics Definitions PR into alignment with those MDC describes, once they become public.

Additional context

This specification received initial input from a variety of OMF contributors, representing city transportation departments (LADOT), ecosystem services stakeholders (Blue Systems, Lacuna, Ellis & Associates), and mobility service providers (Bird).  We hope it encourages discussion and creation in the OMF on this important subject.  A reference implementation of this API is not included at this time, but hopefully will be developed and contributed following additional community feedback.

Specific thanks to @bhandzo and @HenriJ.

The proposal consists of the following PRs: Metrics API PR #486 and Metrics Definitions PR #487.

Retzoh commented 4 years ago

Charles Noling, please explain here the use cases you had in mind for the API during the city-services working group call.

Retzoh commented 4 years ago

@whereissean, it would be great if you could be there during the next city-services working group session on May 14th, since we plan to discuss the metrics API in detail.

whereissean commented 4 years ago

@Retzoh absolutely will be there. Apologies that I couldn't join today.

schnuerle commented 4 years ago

I wanted to add some examples of what cities are asking for in their reporting requirements from providers. Some of it could be derived from MDS (if we can agree on how to calculate things) and some is outside of MDS (and should be).

Here is Louisville, KY's example from their dockless policy (page 16):

The operator shall provide a monthly report by the end of the first full week of the following month that is in a format acceptable to Metro that includes, but is not be limited to, the following:

  1. [* trips] Total number of rides for the previous month and total miles ridden.
  2. [* status_changes] Total number of vehicles in service for the previous month.
  3. [* trips] Number of rides per vehicle per day.
  4. [* status_changes/trips] Location and performance of all preferred and designated parking areas.
  5. [* status_changes] Number of vehicles removed from service
  6. Operator staffing levels
  7. Customer Service Cases, including complaints registered
  8. Vandalism Incidents
  9. Crash reports (to include injury/fatalities)
  10. If available to the Operator, an aggregated breakdown of customers by gender and age monthly. Gender must be reported as male, female, and non‐binary. Age must be reported using these eight age groups: under 5, 5‐17, 18‐24, 25‐34, 35‐ 44, 45‐54, 55‐64, 65 and over.

Items marked [* api] (my annotation) can be derived from the MDS feed, but the methodology is not always agreed upon. The other items cannot be derived from MDS.
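As an illustration of why the methodology matters, here is one possible way to derive items 1 and 3 from MDS trips. It assumes trip records with the MDS `device_id`, `start_time` (milliseconds since epoch), and `trip_distance` (meters) fields, buckets days in UTC rather than local time, and defines "rides per vehicle per day" as rides divided by vehicle-days with at least one trip. The helper name is hypothetical, and that last definition is only one of several reasonable choices: dividing by all deployed vehicles, for example, would need status_changes and would give a lower number.

```python
from datetime import datetime, timezone

METERS_PER_MILE = 1609.344


def monthly_summary(trips):
    """Hypothetical helper for items 1 and 3 of the Louisville list.

    trips: dicts with MDS-style device_id, start_time (ms since epoch, UTC) and
    trip_distance (meters). A "vehicle-day" is a (device, UTC date) pair with at
    least one trip start, so idle vehicles are not counted.
    """
    total_rides = len(trips)
    total_miles = sum(t["trip_distance"] for t in trips) / METERS_PER_MILE
    vehicle_days = {
        (t["device_id"], datetime.fromtimestamp(t["start_time"] / 1000, tz=timezone.utc).date())
        for t in trips
    }
    rides_per_vehicle_day = total_rides / len(vehicle_days) if vehicle_days else 0.0
    return {
        "total_rides": total_rides,
        "total_miles": round(total_miles, 2),
        "rides_per_vehicle_per_day": round(rides_per_vehicle_day, 2),
    }
```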

Interesting items here that could be part of Metrics but are not in MDS:

  1. Trips with complaints/customer service calls.
  2. Trips with vandalism incidents.
  3. Trips with crash reports to operator.
  4. Trip counts broken down by sex (though categories may need to be aligned with Fenway Institute or Williams Institute recommendations or current best practices).
  5. Trip counts broken down by age brackets (though I'd recommend 5-year buckets like the US Census, or 10-year buckets).
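For item 5, a small sketch of fixed-width age bracketing. The function names and the single open-ended top bracket (85 and over by default) are assumptions for illustration; `top` should be a multiple of `width` so the last closed bracket is not mislabeled.

```python
from collections import Counter


def age_bracket(age, width=5, top=85):
    """Map an age in years to a fixed-width bracket label.

    width=5 gives 0-4, 5-9, ...; ages >= top share one open bracket such as "85+".
    top is assumed to be a multiple of width.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= top:
        return f"{top}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def trips_by_age_bracket(rider_ages, width=5, top=85):
    """Count trips per bracket, given one rider age per trip."""
    return Counter(age_bracket(a, width, top) for a in rider_ages)
```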

It would be good to collect other examples of this from current city policy documents.

thekaveman commented 4 years ago

Here are Santa Monica's examples from the Shared Mobility Device Pilot Program Administrative Regulations pages 14-15 (last updated April 2019):

3.16.2 Reporting

Operators must provide accurate weekly summaries to the City describing customer and staff incidents, injuries, system operation, system use, reported complaints, customer service responses, system maintenance, and education and outreach efforts. Reports will be provided to the City in the format defined by the City.

A monthly dynamic cap report must be submitted to the City on the second business day of each month following the program launch to allow the City to assess and potentially adjust fleet deployment quantities.

...

3.16.3 System Reports

Anonymized data reports to the City are required weekly for the following municipal-level data:

(a) Total users in system by month
(b) Trip number by day, week and month
(c) Detailed, aggregate trip origin/destination information
(d) Trip length and time
(e) Hourly fleet utilization with trip origin or destination in Santa Monica and within the Downtown area
(f) Hourly device quantities within Santa Monica and within the Downtown area
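Item (b) is a straightforward MDS aggregation. A minimal sketch with a hypothetical function name, assuming trip start times in milliseconds since epoch, calendar days in UTC (a real Santa Monica report would likely use local Pacific time), and ISO 8601 week numbering:

```python
from collections import Counter
from datetime import datetime, timezone


def trip_counts(start_times_ms):
    """Count trips by UTC calendar day, ISO week, and calendar month."""
    by_day, by_week, by_month = Counter(), Counter(), Counter()
    for ms in start_times_ms:
        dt = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
        # ISO year can differ from the calendar year around January 1.
        iso_year, iso_week, _ = dt.isocalendar()
        by_day[dt.date().isoformat()] += 1
        by_week[f"{iso_year}-W{iso_week:02d}"] += 1
        by_month[f"{dt.year}-{dt.month:02d}"] += 1
    return by_day, by_week, by_month
```

Which week convention applies (ISO weeks starting Monday, or US weeks starting Sunday) is exactly the kind of detail a shared metric definition would need to fix.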

joshuaandrewjohnson1 commented 4 years ago

The Mobility Data Collaborative recently published its Data Sharing Glossary and Metrics document, referenced above, which OMF reviewed and contributed to; it should be used here.

MDCGlossaryMetrics02202004.pdf

sharades commented 4 years ago

DC requires 7 additional monthly reports within 10 days of the end of the month. I've included the overarching concepts below; for specific fields, please see the attached document: 2020.2.24 Attatchment C 2020 Dockless Permit Reporting V2 .pdf

These include:

  - Aggregated user data
  - Aggregated vehicle data
  - Summary report
  - Customer Service report (interactions with customers)
  - Customer summary report (low-income customer plan ridership)
  - Staging areas
  - Unmet needs, identifying by census block the first location where a user opened the application when searching for a vehicle and did not unlock one
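The unmet-needs report relies on provider app data that MDS does not carry. A minimal sketch of the aggregation, assuming hypothetical app events with `session_id`, `timestamp`, `event` (`app_open` or `unlock`), and an already-resolved `census_block`; how a search session is defined is itself an assumption here:

```python
from collections import Counter


def unmet_demand_by_block(events):
    """Count sessions with no unlock, keyed by the census block of the session's first app open."""
    first_open = {}  # session_id -> census block of the first app_open
    unlocked = set()  # session_ids that ended in an unlock
    for e in sorted(events, key=lambda e: e["timestamp"]):
        if e["event"] == "app_open" and e["session_id"] not in first_open:
            first_open[e["session_id"]] = e["census_block"]
        elif e["event"] == "unlock":
            unlocked.add(e["session_id"])
    return Counter(block for sid, block in first_open.items() if sid not in unlocked)
```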

Retzoh commented 4 years ago

Subjects discussed during 2020-05-14 city-services working group call:

Presentation by @whereissean: https://docs.google.com/presentation/d/1bg36oyQhZlBCQb07JCUFyVe97WsCAeRCDeXjaFMXYM8/edit?usp=sharing

Presentations of reports by @dirkdk (Spin): https://docs.google.com/document/d/1qZvmJzoWrnOVZeaubqxOLNVYKzueWQEaU3C7H3kQ1dw/edit?pli=1#

Short-term actions:

schnuerle commented 4 years ago

For discussion around how to request the time period.

Should it be a start date with a count of intervals and no end date, or a start and end date with only the interval length? The first is more machine readable; the latter is more human readable and is what a data analyst would use.

I think start and end dates are more consistent with MDS and other kinds of APIs where you request data over a time range. And you can specify the interval (minute, hour, day, week, month) over that time range and those values would be returned in an array.
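A sketch of the start/end-plus-interval option: the host splits the requested range into consecutive buckets and returns one value per bucket, in order. The interval names follow the list above; the function names are hypothetical, and time zones and DST are ignored for simplicity.

```python
from datetime import datetime, timedelta

# Fixed-length intervals; "month" is handled separately because its length varies.
FIXED_STEPS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def _advance(dt, interval):
    """Return the end of the bucket that begins at dt (month buckets snap to the next 1st)."""
    if interval == "month":
        year, month = (dt.year + 1, 1) if dt.month == 12 else (dt.year, dt.month + 1)
        return dt.replace(year=year, month=month, day=1, hour=0, minute=0, second=0, microsecond=0)
    return dt + FIXED_STEPS[interval]


def time_buckets(start, end, interval):
    """Split [start, end) into consecutive (bucket_start, bucket_end) pairs.

    The last bucket is clipped to end, so a partial final interval is still reported.
    """
    if end <= start:
        raise ValueError("end must be after start")
    if interval != "month" and interval not in FIXED_STEPS:
        raise ValueError(f"unsupported interval: {interval}")
    buckets = []
    cursor = start
    while cursor < end:
        nxt = min(_advance(cursor, interval), end)
        buckets.append((cursor, nxt))
        cursor = nxt
    return buckets
```

Requesting January 1 to March 15 by month yields three buckets, the last one covering only March 1 to March 15; a response array would carry one value for each.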

johnclary commented 4 years ago

@whereissean would you mind making your slide deck publicly viewable?

schnuerle commented 4 years ago

Some notes from our WG call yesterday:

See also new issue #569 from folks at Spin to cross reference use cases.

whereissean commented 4 years ago

@whereissean would you mind making your slide deck publicly viewable? @johnclary

Sorry, appears that permissions were changed on the document. Until I can resolve, here is a new public version: https://docs.google.com/presentation/d/1rVwGSYb4d8myGSN9VJrDl1AOGtmdbqvAbXL8-a5VA-o/edit?usp=sharing

johnclary commented 4 years ago

thanks @whereissean. @schnuerle would you mind adding the Privacy label to this?

johnclary commented 4 years ago

r.e. this bit from the doc:

For the fields that involve special_users, we propose an x number of subcategories like low_income, student or unbanked

😵 This is the first time I'm seeing mention of this in MDS. Is this information that is held by providers?

update: will move this discussion to #569

johnclary commented 4 years ago

It looks like this spec might support operational use cases in a way that would avoid the need for agencies and providers to exchange telemetry data, i.e., it might be a drop-in replacement for /status_changes or /trips.

For example, as an agency, I'd like to query for the number of vehicles in service in x geography during the last hour.

Are there limitations that would prohibit such a use case as the spec is currently proposed? The use case above requires fairly high temporal and spatial resolution, and minimal latency.
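To make the use case concrete, here is a sketch of the calculation a host might run to answer "vehicles in service in geography G during the last hour." The function name and state names are simplified, hypothetical stand-ins rather than MDS's exact vocabulary, and events are assumed to already carry a resolved `geography_id`:

```python
from datetime import datetime

# Simplified, hypothetical state names; MDS's own state/event vocabulary differs by version.
IN_SERVICE = {"available", "reserved", "on_trip"}


def vehicles_in_service(events, geography_id, window_start, window_end):
    """Count distinct vehicles in service inside geography_id at any moment of the window.

    events: dicts with device_id, timestamp (datetime), state, and geography_id.
    A vehicle counts if its last event before the window, or any event inside it,
    puts it in service in the requested geography.
    """
    def in_scope(e):
        return e["state"] in IN_SERVICE and e["geography_id"] == geography_id

    counted = set()
    last_before = {}
    for e in sorted(events, key=lambda e: e["timestamp"]):
        if e["timestamp"] < window_start:
            last_before[e["device_id"]] = e  # keeps the latest pre-window event per vehicle
        elif e["timestamp"] < window_end and in_scope(e):
            counted.add(e["device_id"])
    counted.update(device_id for device_id, e in last_before.items() if in_scope(e))
    return len(counted)
```

Answering this with low latency means the host must hold near-real-time events, which is the resolution and latency concern raised above.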

whereissean commented 4 years ago

I've reviewed the MDC Glossary, and the good news is that its methodology looks consistent with the proposed MDS metrics. There were a couple of metrics in the glossary that were not proposed, and a number of proposed metrics that are not in the MDC Glossary. I've added the ones (maximum/minimum average) that were not in the MDS dockless metrics. I also renamed a number of the proposed metrics to try to align them.

I also attached a proposed metrics methodology document that discusses how to compute the metrics and compatibility with MDC Glossary definitions. Thanks @joanathan for putting this document together.

@joshuaandrewjohnson1 @jfh01 @schnuerle Please have a look in #487.

schnuerle commented 4 years ago

We reviewed this issue as part of the second OMF Working Group Steering Committee release Checkpoint. Both WGSCs had some feedback and I'm documenting it here for discussion.

1) Is this statement true to what you are proposing? The entire proposed Metrics API is meant to be published by cities to providers, after cities have ingested MDS data from providers. So the city is doing the data processing.

If so, how much value is this to cities, and will they be able to justify the heavy lift of implementing an API for this? Why not just pull CSV reports from a city database and share those with providers as they do now? Does an API provide enough benefit?

If not, can you clarify how a city can use it and how a provider can use it, both in the issue description and the PR details?

2) One use case mentioned in the original description is that a city could make this endpoint public. Making this endpoint public does not seem like a good idea; instead, the city could publish data derived from the API.

3) Maybe just creating a defined methodology that cities (and providers) can use to calculate reports from MDS is enough, vs creating an endpoint?

These questions could be explored with a city survey to gauge interest if needed.

johnclary commented 4 years ago

Ah, I completely missed that was being proposed as a city endpoint. As such, it cannot serve as an alternative to consuming raw trip data, and in fact this proposal necessitates adding more attributes to trip records.

That answers my own question.

marie-x commented 4 years ago

@johnclary @schnuerle @jfh01 I think there's a misunderstanding here.

The Metrics API is not just for Agencies; it could be implemented by Providers. And the consumers of an Agency implementation of metrics are not necessarily (only) Providers, in fact the main use cases are for city-internal consumption by analytics and visualization tools.

Yes, this could be an alternative to consuming raw trip data, although in the absence of such data, it makes the metrics essentially impossible to verify.
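Where an agency does hold the raw data, one way to spot-check a published metric is to recompute it and compare bucket by bucket. A minimal sketch with a hypothetical `verify_metric` helper and an optional tolerance:

```python
def verify_metric(published, recomputed, tolerance=0.0):
    """Compare published metric values with values recomputed from raw data.

    Both arguments map a bucket key (e.g. a date string) to a number. Returns the
    buckets that are missing on either side or differ by more than tolerance,
    as {bucket: (published_value, recomputed_value)}.
    """
    mismatches = {}
    for key in published.keys() | recomputed.keys():
        p, r = published.get(key), recomputed.get(key)
        if p is None or r is None or abs(p - r) > tolerance:
            mismatches[key] = (p, r)
    return mismatches
```

With no raw data, the recomputed side is empty and every bucket comes back unverifiable, which is the limitation noted above.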

dirkdk commented 4 years ago

That is how I saw the Metrics API as well: a standard that can be implemented by an Agency, a Provider, or even a third-party data aggregator, using input data either from other MDS endpoints or from different sources (like Special Groups data that would only be available to the Provider).

schnuerle commented 4 years ago

Note that for 1.1.0 we have merged the new Geography API (#582) into the 'dev' branch. Please update this pull request with the latest code, resolve any conflicts, and make references to the Geography API where appropriate, e.g. with UUIDs.

We will be discussing Metrics at this week's Working Group meeting, so if available please come prepared to talk about your latest updates and ideas.

schnuerle commented 4 years ago

The content of the 2 Metrics pull requests #486 and #487 has been merged into the new [feature-metrics](https://github.com/openmobilityfoundation/mobility-data-specification/tree/feature-metrics/metrics) feature branch, so everyone can review it in context with MDS and the new Geography API and make PRs against it.

We will leave this issue open until that branch is ready to be merged to dev, so please continue to leave feedback/ideas here or on the new feature branch PR #587.