guardian / typerighter

Even if you’re the right typer, couldn’t hurt to use Typerighter!
Apache License 2.0

Introduce `/api/rules/csv-import` endpoint #477

Closed simonbyford closed 3 months ago

simonbyford commented 3 months ago

What does this change?

Introduces a new endpoint, /api/rules/csv-import, which enables the bulk import of regex rules via a CSV file. In particular, this will provide a quick way to populate Typerighter with the names of incoming MPs after an election.

The endpoint accepts three parameters encoded as form-data: file (the CSV file), tag, and category.

For example:

# No explicit content-type header: --form makes curl send multipart/form-data with the correct boundary.
curl --location 'https://manager.typerighter.gutools.co.uk/api/rules/csv-import' \
--header 'accept: */*' \
--form 'file=@"/Users/Simon_Byford/Downloads/mps.csv"' \
--form 'tag="MP"' \
--form 'category="Style guide and names"'

Note: this won't work as written, because the request also needs an authentication cookie (see "How to test").

The CSV file should not contain headers, and the columns must appear in the following order: pattern,replacement,description

For example, the following CSV file is valid:

Diann?e Abbott?,Diane Abbott,"MP last elected in 2024: Labour, Hackney North and Stoke Newington"
Debbie Abrahams,Debbie Abrahams,"MP last elected in 2024: Labour, Oldham East and Saddleworth"
Rebecca Long-? ?Bailey,Rebecca Long-Bailey,"MP last elected in 2024: Labour, Salford"
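
To sanity-check a file before uploading it, the format described above can be sketched as a small parser. This is a minimal sketch in Python, not the endpoint's actual implementation: the names `CsvRule` and `parse_rules` are hypothetical, and Python's `re` module may differ in places from the regex engine Typerighter uses, so a pattern that compiles here is only a rough pre-flight check.

```python
import csv
import io
import re
from typing import NamedTuple


class CsvRule(NamedTuple):
    """One row of the import file (hypothetical name, for illustration)."""
    pattern: str
    replacement: str
    description: str


def parse_rules(csv_text):
    """Parse a headerless CSV with columns pattern,replacement,description."""
    rules = []
    for row_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        if len(row) != 3:
            raise ValueError(f"row {row_no}: expected 3 columns, got {len(row)}")
        pattern, replacement, description = row
        try:
            re.compile(pattern)  # rough check only; engines can differ
        except re.error as err:
            raise ValueError(f"row {row_no}: invalid pattern: {err}") from err
        rules.append(CsvRule(pattern, replacement, description))
    return rules


SAMPLE = '''Diann?e Abbott?,Diane Abbott,"MP last elected in 2024: Labour, Hackney North and Stoke Newington"
Debbie Abrahams,Debbie Abrahams,"MP last elected in 2024: Labour, Oldham East and Saddleworth"
Rebecca Long-? ?Bailey,Rebecca Long-Bailey,"MP last elected in 2024: Labour, Salford"
'''

rules = parse_rules(SAMPLE)
```

Note that the description column contains commas, so it has to be quoted, as in the example rows; the `csv` module handles that quoting.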

If the operation is successful, the API will return the number of rules added.

How to test

For the first iteration, we decided not to expose this in the UI; instead, the endpoint must be queried manually. This is a bit fiddly because it requires a cookie for authentication. The best way to get around this is to interact with the manager interface (manager.typerighter.gutools.co.uk), for example by starting to create a rule, and inspect any network request to /api/rules. You can then copy the request as cURL:

Screenshot 2024-07-23 at 16 11 56

You can then either make the necessary edits and run it directly on the command line (scary), or import it into a client like Postman:

Screenshot 2024-07-23 at 16 30 28

Screenshot 2024-07-24 at 14 28 14
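
If you'd rather script the request than edit a copied cURL command, something like the following might work. It is a minimal sketch using only the Python standard library: the function name `build_csv_import_request` is hypothetical, the cookie string is a placeholder for whatever `Cookie` header your browser sends to /api/rules, and the request is only built here, not sent.

```python
import uuid
from urllib.request import Request

URL = "https://manager.typerighter.gutools.co.uk/api/rules/csv-import"


def build_csv_import_request(csv_bytes, filename, tag, category, cookie):
    """Build (but do not send) a multipart/form-data POST for the endpoint."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields: tag and category
    for name, value in (("tag", tag), ("category", category)):
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # File field: the CSV itself
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: text/csv\r\n\r\n".encode("utf-8")
        + csv_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Accept": "*/*",
        "Cookie": cookie,  # placeholder: copy the real value from the browser
    }
    return Request(URL, data=b"".join(parts), headers=headers, method="POST")


req = build_csv_import_request(
    b"Debbie Abrahams,Debbie Abrahams,Example description\n",
    "mps.csv",
    tag="MP",
    category="Style guide and names",
    cookie="PLACEHOLDER_COOKIE",
)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out so the sketch can be run safely without touching a live environment.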

Images

Using the above method, I tested this on CODE with Max's spreadsheet of 2024 MPs. I used the tag "CSV import" and category "Imported from CSV".

Screenshot 2024-07-23 at 14 52 01

jonathonherbert commented 3 months ago

One other thing to mention while I'm here – there's a good testing story for DB operations in RuleManagerSpec that should let us test that CSV data is written correctly to the DB. Could we add a test? (See CAPIFixtures for an example of reading a file from our resources folder. We could add a proper CSV file to our test/resources folder, and be really sure this is doing the right thing ✨)

Very happy to pair!

simonbyford commented 3 months ago

Hey @jonathonherbert, I've added a test (with your help 😄), does it look okay?

simonbyford commented 3 months ago

One broader point – it looks like we can enter identical rules, e.g. spamming the endpoint with repeated requests results in duplicates, which might be an easy mistake to make. We could make this safer with a smoke test on import that checks that no existing rule exactly matches the description or pattern of an incoming rule. Perhaps one to follow up with.

Good point, thank you for raising it. Some further work is needed before this can actually be used, namely the introduction of another endpoint to bulk-archive existing rules (based on a tag), so I'll leave this for now.
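
For the follow-up, the duplicate check suggested above could look roughly like this. It is a sketch under assumptions, not code from this PR: rules are modelled as plain (pattern, replacement, description) tuples, `find_duplicates` is a hypothetical name, and "duplicate" is taken to mean an exact match on either pattern or description, against existing rules or earlier rows in the same batch.

```python
def find_duplicates(incoming, existing):
    """Return incoming rules whose pattern or description is already known.

    Each rule is a (pattern, replacement, description) tuple.
    """
    seen_patterns = {pattern for pattern, _, _ in existing}
    seen_descriptions = {description for _, _, description in existing}
    duplicates = []
    for rule in incoming:
        pattern, _, description = rule
        if pattern in seen_patterns or description in seen_descriptions:
            duplicates.append(rule)
        else:
            # Remember accepted rows so repeats within the batch are caught too
            seen_patterns.add(pattern)
            seen_descriptions.add(description)
    return duplicates


existing_rules = [("Debbie Abrahams", "Debbie Abrahams", "desc A")]
incoming_rules = [
    ("Debbie Abrahams", "Debbie Abrahams", "desc B"),  # pattern already exists
    ("Diann?e Abbott?", "Diane Abbott", "desc C"),     # new
    ("Diann?e Abbott?", "Diane Abbott", "desc D"),     # repeats the row above
]
dupes = find_duplicates(incoming_rules, existing_rules)
```

An import could then refuse the whole file when `dupes` is non-empty, or skip those rows and report them alongside the count of rules added.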

prout-bot commented 3 months ago

Seen on Rule Manager (merged by @simonbyford 9 minutes and 30 seconds ago) Please check your changes!

prout-bot commented 3 months ago

Overdue on Checker (merged by @simonbyford 15 minutes and 3 seconds ago) What's gone wrong?

jonathonherbert commented 3 months ago

This is expected until https://github.com/guardian/typerighter/pull/468 is merged – any reviews v. welcome 🙏
