ministryofjustice / find-moj-data

Find MOJ data service • This repository is defined and managed in Terraform
https://find-moj-data.service.justice.gov.uk/
MIT License

Ingest CJS dashboard metrics #834

Closed: jemnery closed this issue 3 weeks ago

jemnery commented 2 months ago

https://criminal-justice-delivery-data-dashboards.justice.gov.uk/all-metrics

After speaking with Laura and Edwin, there's no API. The only way to ingest their metrics would be to read the Python / JSON files from their repo.

E.g. https://github.com/ministryofjustice/cjs-dashboard/blob/develop/cjs_test_app/content/improving_timeliness_police.py

https://criminal-justice-delivery-data-dashboards.justice.gov.uk/improving-timeliness/police#time_to_success
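Since there's no API, a crawler would have to pull these files straight from the repo. Here is a minimal sketch of that step. It assumes the files stay publicly readable on GitHub and that the branch name contains no slashes. `to_raw_url` and `fetch_source` are hypothetical helper names.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def to_raw_url(blob_url: str) -> str:
    """Convert a github.com 'blob' URL into the matching raw.githubusercontent.com URL.

    Assumes the branch/ref is a single path segment (no slashes in the name).
    """
    parsed = urlparse(blob_url)
    if parsed.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {blob_url}")
    # Path looks like /<owner>/<repo>/blob/<ref>/<path/to/file>
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner, repo, _, ref, *file_path = parts
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{'/'.join(file_path)}"


def fetch_source(blob_url: str, timeout: float = 10.0) -> str:
    """Download the file behind a GitHub blob URL as text (needs network access)."""
    with urlopen(to_raw_url(blob_url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For the example content file linked above, `to_raw_url` gives `https://raw.githubusercontent.com/ministryofjustice/cjs-dashboard/develop/cjs_test_app/content/improving_timeliness_police.py`.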

```python
{
    "key": "improving_timeliness_police",
    "heading": "Crime recorded to police decision",
    "intro_text": "The charts on this page show the time it takes from a crime being officially recorded by the police, to the police recording an outcome. [Find out more about this stage of the justice system](../about). You can choose to see the charts for all crime data or for only adult rape cases. Charts display national data by default. Use the page data options to view data for local criminal justice boards.",
    "offence_types": ["All crime", "Adult rape"],
    "graphs": [
        {
            "key": "time_to_success",
            "heading": "Average days taken for police to record a successful outcome",
            "intro_text": "This chart shows the average (median) number of days it takes for the police to record a formal or informal outcome, such as a caution, warning or charge.",
            "axis_label": "Median days",
            "national_comparison": True,
            "extra_options": ["victimbased"],
            "offence_types": ["All crime", "Adult rape"],
            "metrics": [
                {
                    "metric_name_ref": {
                        "option_key": "offence_type",
                        "adult_rape": "police_days_to_successful_outcome",
                        "all_crime": {
                            "option_key": "victimbased",
                            "victim": "police_days_to_successful_outcome_victim_based",
                            "state": "police_days_to_successful_outcome_state_based",
                        },
                    }
                }
            ],
        },
        # ... remaining graphs on this page omitted from the excerpt
    ],
}
```

Note that improving_timeliness_police becomes /improving-timeliness/police in the URL.

They have no objections to us doing this.

They do have data owner details for each dataset. Perhaps we could ask them to add these to the files above, or add them ourselves in a PR?

jemnery commented 4 weeks ago

This can be treated like a spike - if it's not currently possible without changes to their codebase, let's document what's needed, discuss with the CJS dashboard team and raise a new ticket for a later sprint.

It's OK if it's only partially possible, e.g. if we can catalogue the metrics but can't easily reconstruct URLs to them, as with Justice Data. The dashboard container description can link to the home page.

jemnery commented 4 weeks ago

More on constructing URLs to the app.

The general rule seems to be

Some pages have multiple metrics, for example rape review (repo link).

Metrics don't seem to have dedicated pages, but the metric keys are used as element IDs, so they can serve as URL anchors. For example, on the rape_review page the metric key rape_review_receipts can be used to build root-domain/page_key#metric_key.
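Here is a sketch of this URL rule. It assumes the anchor is the graph or metric key. In the police example linked at the top of the issue, the anchor is the graph key `time_to_success`. `metric_url` and `PAGE_PATHS` are hypothetical names. This thread confirms only one page-key-to-path mapping, so the sketch uses an explicit lookup table rather than guessing a general conversion rule.

```python
# Root of the CJS dashboard service, taken from the links in this issue.
ROOT = "https://criminal-justice-delivery-data-dashboards.justice.gov.uk"

# Hypothetical lookup from page key to URL path. Only this mapping is confirmed
# in the thread, so no general key-to-path rule is assumed.
PAGE_PATHS = {
    "improving_timeliness_police": "improving-timeliness/police",
}


def metric_url(page_key: str, anchor: str) -> str:
    """Build root-domain/page-path#anchor, where anchor is a graph or metric key.

    Falls back to the dashboard home page when the page path isn't known.
    """
    path = PAGE_PATHS.get(page_key)
    if path is None:
        return ROOT + "/"
    return f"{ROOT}/{path}#{anchor}"
```

Falling back to the home page mirrors the earlier suggestion that the dashboard container can simply link there when a metric URL can't be built.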

MatMoore commented 3 weeks ago

I've committed a proof of concept script here https://github.com/ministryofjustice/data-catalogue-metadata/pull/32

This approach is very tightly coupled to the CJS dashboard code and is likely to break if anything changes, so I definitely do not recommend running it outside of dev.

A better way would be to ask the team to extract the metadata to a standardised format, rather than defining it directly in their Python code. We could then extend this solution to any chart, dashboard or dataset that is defined in a public repo.
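The following sketch illustrates what a standardised format could mean in practice: metadata read from a JSON file instead of Python source. Every field name (`name`, `description`, `url`, `data_owner`, `subject_area`) and the `parse_metadata` function are hypothetical. None of this has been agreed with the CJS dashboard team.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical required fields; a real format would need to be agreed first.
REQUIRED_FIELDS = ("name", "description", "url")


@dataclass(frozen=True)
class MetricMetadata:
    name: str
    description: str
    url: str
    data_owner: Optional[str] = None
    subject_area: Optional[str] = None


def parse_metadata(text: str) -> List[MetricMetadata]:
    """Parse a JSON array of metric records, failing loudly on missing fields."""
    parsed = []
    for index, record in enumerate(json.loads(text)):
        missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
        if missing:
            raise ValueError(f"record {index} is missing: {', '.join(missing)}")
        parsed.append(
            MetricMetadata(
                name=record["name"],
                description=record["description"],
                url=record["url"],
                data_owner=record.get("data_owner"),
                subject_area=record.get("subject_area"),
            )
        )
    return parsed
```

With a format like this, a crawler only needs `json.loads` and a list of required fields, rather than knowledge of how a particular codebase structures its content.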

This is what it looks like now in dev: https://dev.find-moj-data.service.justice.gov.uk/search?query=criminal+justice+dashboard&domain=&entity_types=DASHBOARD

This has the following issues:

I think it's probably worth having a team discussion about the "crawler" ingestion strategy to make sure we're in agreement before we start asking other teams to do things. We already have a ticket for this here: https://github.com/ministryofjustice/data-catalogue-metadata/issues/14

I'm slightly wary about doing this too early, just because the subject area taxonomy is not stable yet. If we're asking people to spend time filling out missing data, it will be very annoying if we then ask them to redo it because we changed how we categorise things.

jemnery commented 3 weeks ago

All good points, but despite the challenges it looks pretty good and does make these charts discoverable.

Re subject area: could anything with "court" in the title be assigned to the court domain, and everything else to "general"? We're putting non-MOJ metrics from Justice Data into "general".
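The subject area rule suggested here is simple enough to express directly. This is a minimal sketch; the function name and the returned labels are hypothetical stand-ins for whatever domain identifiers the catalogue actually uses.

```python
def subject_area(title: str) -> str:
    """Suggested heuristic: titles mentioning "court" go to the court domain,
    everything else to "general". The labels are placeholders."""
    return "court" if "court" in title.lower() else "general"
```

A plain substring match will also catch words such as "courtesy", so a word-boundary regular expression may be safer if that ever matters.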

Contact info - we could do one or both of these?

MatMoore commented 3 weeks ago

@jemnery do you want me to roll this out to pre-prod then? I can add in the missing subject area / contact info bits.

MatMoore commented 3 weeks ago

Regarding next steps there are two options really:

Option 1 is to design a general purpose process for ingesting metadata from github (outlined by @tom-webber in https://github.com/ministryofjustice/data-catalogue-metadata/issues/14) - then make a pull request to their repo to either push the metadata or define it in a format that can be pulled.

Option 2 is to continue with a bespoke solution, making this the 3rd or 4th custom integration. In this case I would suggest collaborating with the CJS dashboard team to make the metadata more easily scrapeable, i.e.

Option 1 would be reusable and easier to support as we scale up to multiple data sources in production, and I think it will be easier to hand off responsibility for maintaining the metadata.

Option 2 is quicker/less work in the short term but more code to maintain long term.

jemnery commented 3 weeks ago

Yes, let's roll this out to preprod with those additions.

Those options are sensible, but, as I think you pointed out, we'd want a more holistic view of what our metadata format is.

Let's see if the CJS dashboard team are happy with us cataloguing their service before we do anything else.