Friendly UI for POD issuance via CSV pipeline

robknight commented 3 weeks ago

[x] Handle row deletion in CSV
[x] Add row button

artwyman commented 3 weeks ago

Am I interpreting correctly that this PR now supplants #1813? The POD-issuing code here looks good to me. Just a few suggestions:

The sample data should maybe contain an example of setting an arbitrary extra field (not one of the well-known zupass fields) to show how that can be done.
pod_type in particular would also be good to include in the example, though it's optional
Supporting non-string values would be good to think about in advance, at least to ensure it can be added in a way which won't break compatibility. Treating the values as JSON might be a good start for distinguishing numbers from strings, but full expressibility in all cases is going to need some explicit type info. Using the full PODValue format "{ type: t, value: v}" would allow for that unambigously, but it's annoying to users. We have the start of that in the "simplified JSON" which accepts strings of the form "type:value", so you could use that for now. I plan to change that to something more flexible and user-friendly in future based on JSON objects (maybe "{type: stringifiedValue}"). To ensure compatibility of that you could disallow strings which parse as JSON objects.
I don't see any tests of the new POD-issuing code. Seems like a good idea to fill those in before merging.

robknight commented 3 weeks ago

pod_type in particular would also be good to include in the example, though it's optional

What should it be set to?

Supporting non-string values would be good to think about in advance

I have thought about it! Entries containing Semaphore IDs are cryptographic, for instance. For values coming from the CSV data, I would need to do some kind of conversion, with invalid data being flagged as an error. There are two main ways I could imagine doing this and I'm still feeling my way towards an intuition on which is best:

Specify up-front which columns are expected in the CSV input, and what their types are. Where a column is mapped to a POD entry, it must be of a compatible type.
Allow any CSV column to be mapped to any POD entry, and do pipeline.load()-time checking that each value in the CSV file converts cleanly to the POD entry type. Number -> string conversions will always succeed, but string -> int or string -> cryptographic might fail (e.g. if (BigInt(inputValue) throws an exception).

We might be able to move the second check away from load() time, and verify during editing that the CSV data is appropriate for the POD entries configured. There's a further question about whether we can fully check everything in the UI, or whether we check it when saving the pipeline definition.

I don't see any tests of the new POD-issuing code. Seems like a good idea to fill those in before merging.

We don't have any tests for the CSV pipeline at all right now, beyond the basic tests that the pipeline class can be instantiated. I'm leaning towards breaking the generic POD issuance code out into a new pipeline that can be more conservatively maintained than the CSV pipeline.

artwyman commented 3 weeks ago

What should it be set to?

Should be user-configurable. In theory on every row separately, though maybe there's a usability benefit to having it be a pipeline configuration. That begs the question of what the default value would be. Maybe something generic like "zupass_generic_pod"? Honestly it could be left off by default if it's not giving any info, but I want to encourage users to think about types to filter and distinguish different PODs. At minimum, I think PODTickets should have a different types from generic PODs coming from this pipeline. (They have no type now, I think they should have "zupass_ticket" or somesuch.)

I have thought about it! Entries containing Semaphore IDs are cryptographic, for instance. For values coming from the CSV data, I would need to do some kind of conversion, with invalid data being flagged as an error. There are two main ways I could imagine doing this and I'm still feeling my way towards an intuition on which is best:

Interesting. The core distinction seems to be whether you want to require (or at least encourage) each row to follow the same "schema", or allow each row to contain arbitrary values of any type. PODs can support the latter, but it seems like the former is probably the most common use case. I might lean toward the former for the purpose of avoiding footguns. In that case specifying types as part of the pipeline configuration sidesteps the need to encode it in the value.

That being said, I think I'd appreciate your thoughts as a sounding-board for different ways of encoding POD values in JSON and/or strings. I'll see if we can grab some time to discuss in the office.

We don't have any tests for the CSV pipeline at all right now, beyond the basic tests that the pipeline class can be instantiated. I'm leaning towards breaking the generic POD issuance code out into a new pipeline that can be more conservatively maintained than the CSV pipeline.

Separating them out sounds like it may be a good idea, given different featuresets and different sensitivities to risk. I also am all for adding tests to the new code even if the old code doesn't have it.

robknight commented 3 weeks ago

I'm struggling to think of how I would explain the pod_type field to someone with no background understanding. zupass_display is easy because it's grounded in the Zupass pod_pcd_ui package implementation, but pod_type feels quite a bit more nebulous.

For instance, should pod_types be unique for a given set of entry types (a "schema")? If I have a zupass_ticket and add another entry, should the pod_type change? What if I remove an entry? Is a POD with pod_type of zupass_ticket still a ticket if it has none of the entries one would expect, and if it isn't then what role is the pod_type entry playing here? If it's a kind of type hint, who is this hint for and what assumptions are they entitled to make based on the hint?

artwyman commented 3 weeks ago

It's absolutely nebulous, because I think we're going to work out what it means by using it in various use cases. Nothing is enforced yet, so the detailed answers to some of your questions will also emerge as best-practice over time. What I'm pushing for a this stage is we should start putting something in that field so we can start exploring what to do with it.

My current thinking on what a pod_type should do:

Express the issuer's intent about what this POD should be used for. (Signing key is an implicit categorization, but multiple issuers might want to interact with common PODs.)
Set expectations about the entries a consumer will expect to find in the POD. (This is tied to intent, but gets into the realm of a schema, though not an enforced one.)

Neither one is strictly enforced, and there will be open questions (like what to do when expectations change over time), but it can still be useful. When populating the dropdown for a ticket proof you might want to display only PODs with pod_type=zupass_ticket. When creating a POD app I want to have a way to easily consume only the PODs which belong in my app. My bank-balance POD and my sci-fi character-sheet POD might both have an entry named credits but with very different meanings which probably shouldn't be mixed together.

I was assuming a rough sort of namespacing with pod and zupass prefixes. If we wanted to be really careful about uniqueness we could use a format like org.zupass.ticket, which is a technique used to make unique URIs elsewhere. I've been intentionally veering away from making things too formalized in the framework because I want devs to know they can keep things light.

robknight commented 2 weeks ago

This might be a call for @rrrliu, but my understanding is that we have to trade off getting the long-term design right, and the short-term need to get user feedback on a POD issuance UI from testers including the ops team.

I've created a design document on Notion where we can discuss the best practices that we want to nudge uses toward, as well as the long-term aim for maintenance and testing of self-serve issuance features. I agree that this needs proper thought, but this PR is probably not the place to make those decisions.

robknight commented 1 week ago

Closing this in preference of a longer-term solution.

proofcarryingdata / zupass

Friendly UI for POD issuance via CSV pipeline #1819