RydalWater commented 1 month ago

About:

The ability to review products, services and/or other consumables is a critical way by which companies and product developers receive feedback informing them of their success and/or failure. While social signals (e.g., follows, likes, comments etc.) do provide some real-time feedback, these are inherently ephemeral in nature and do not always reflect the formal experience of an individual with regard to a given product.

Existing product review processes focus on the aggregation of data in the hope that bad actors will not have an outsized impact on the collective review scoring for a product. They rely on metrics such as total number of reviews, average ratings and ratios (good vs. bad). Nostr provides a unique method of leveraging social graphs to surface spheres of influence relevant to users, which allows for a fine-tuned user experience when judging the appropriateness of a product for their needs.

While it is true that with the current Nostr tools we can already achieve some targeted opinions by surfacing social signals, this isn't perfect as it doesn't express an explicit, quantifiable opinion across users. One user's like may be another's love. Instead, we should use more formal grading scales to explicitly state the reviewer's opinion (in a way that can be aggregated easily), and we should ideally also have a method by which opinions can be changed or replaced over time. Maybe a company or product starts off good but then takes a nosedive; users of these products should reserve the right to change their opinion to reflect their current experience.

Proposal:

This proposal leverages a number of existing building blocks, including a range of tags specified in other NIPs, and introduces two new kinds: 2020 for reviews themselves and 32020 for parameterized replaceable events, which would represent user/client-defined review sets.

Structure:

- A new review event object (type, identifying tag (informed by type), a denominator tag, a numerator tag)
- New tag objects:
  - type: helps categorize the content and tells clients which identifying tag will follow (see below)
  - denom: the denominator for the rating
  - rating: the rating value (numerator)
  - unit: optional tag that allows clients to determine the rating unit
- A new parameterized replaceable list (32020 - Reviews Set) of reviews, where e tags refer back to valid/current review events (maintainable like follow/mute lists).

Example Details:

// Details of Kind 2020 event
{
    "kind" : 2020,
    "tags" : [
        ["type", "<type integer> (e.g., '4' for 'hospitality')"],

        // Example review identifiers (just **ONE** of the below items is provided informed by type tag)
            ["location", "<name and address of location> (e.g., Joe's Diner, 123 Main St., Anytown, USA)"],
            ["web", "<url of webpage>"],
            ["relay", "<url of relay>"],
            ["e", "<nostr event id (hex)>", "<other parameters>"],
            ["p", "<nostr pubkey (hex)>", "<other parameters>"],
            ["i", "<External Content IDs>", "<other parameters>"],
        // End of Example review identifiers

        ["denom", "<integer or floating point number (max 999999.99) denoting the maximum value> (e.g., 5)"],
        ["rating", "<integer or floating point number (max 999999.99) denoting the user value relative to the denom> (e.g., 4.5)"],
        ["unit", "<optional unit of measurement> (e.g., stars)"]
    ],
    "content" : "<optional comment or context to go with the review (e.g., Great place to eat!!)>"    
}
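
A minimal sketch of how a client might assemble and sanity-check the tags above before signing. The helper name `build_review_tags` is hypothetical, and the checks (positive `denom`, `rating` between 0 and `denom`) are assumptions on my part; the proposal only states the 999999.99 upper bound.

```python
from typing import List, Optional

MAX_VALUE = 999999.99  # upper bound stated in the example event above


def build_review_tags(
    type_id: int,
    identifier: List[str],
    rating: float,
    denom: float,
    unit: Optional[str] = None,
) -> List[List[str]]:
    """Assemble the tag list for a kind 2020 review (hypothetical helper)."""
    # Assumption: the scale maximum must be positive and within the stated bound.
    if not 0 < denom <= MAX_VALUE:
        raise ValueError("denom must be positive and at most 999999.99")
    # Assumption: a rating cannot be negative or exceed its own scale.
    if not 0 <= rating <= denom:
        raise ValueError("rating must lie between 0 and denom")
    tags = [
        ["type", str(type_id)],
        identifier,  # exactly one identifying tag, chosen according to type
        ["denom", str(denom)],
        ["rating", str(rating)],
    ]
    if unit is not None:
        tags.append(["unit", unit])
    return tags


event = {
    "kind": 2020,
    "tags": build_review_tags(
        4, ["location", "Joe's Diner, 123 Main St., Anytown, USA"], 4.5, 5, "stars"
    ),
    "content": "Great place to eat!!",
}
```

The unsigned `event` dict still needs the usual id, pubkey, created_at and sig fields before it can be published.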

// Details of Kind 32020, a parameterized replaceable list of review events.
{
    "kind": 32020,
    "tags": [
      ["d", "FoodIsGood123"],
      ["title", "Food Reviews"],
      ["description", "All of my favorite places to eat, or never again in some cases!!"],
      ["e", "d78ba0d5dce22bfff9db0a9e996c9ef27e2c91051de0c4e1da340e0326b4941a"],
      ["e", "d78ba0d5dce22bfff9db0a9e996c9ef27e2c91051de0c4e1da340e0326b4941b"],
      ["e", "d78ba0d5dce22bfff9db0a9e996c9ef27e2c91051de0c4e1da340e0326b4941c"],
      ["e", "d78ba0d5dce22bfff9db0a9e996c9ef27e2c91051de0c4e1da340e0326b4941d"]
    ],
    "content": ""
}
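
As a rough sketch of how a client could use such a set: only reviews whose ids are referenced by the set's e tags are treated as current. The function names are hypothetical, and requiring the review author to match the set author is an assumption of mine, not something the proposal states.

```python
from typing import Dict, List


def referenced_ids(review_set: Dict) -> List[str]:
    """Collect the event ids listed in the set's e tags."""
    return [
        tag[1]
        for tag in review_set.get("tags", [])
        if len(tag) >= 2 and tag[0] == "e"
    ]


def current_reviews(review_set: Dict, reviews: List[Dict]) -> List[Dict]:
    """Keep only kind 2020 reviews that the set still vouches for."""
    wanted = set(referenced_ids(review_set))
    return [
        review
        for review in reviews
        if review.get("kind") == 2020
        and review.get("id") in wanted
        # Assumption: a set only vouches for reviews by the same author.
        and review.get("pubkey") == review_set.get("pubkey")
    ]
```

Reviews that were published but later dropped from the set are simply filtered out, which is the "maintainable like a follow/mute list" behaviour described above.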

Type-Tag Mapping:

This table describes a range of type values which would inform the valid/expected tag, helping the client to identify the item under review.

| Number | Type | Expected Identifying Tag(s) |
| --- | --- | --- |
| 0 | nostr post | `e` or `a` |
| 1 | nostr user | `p` |
| 2 | entertainment | `i` |
| 3 | relay | `relay` |
| 4 | hospitality | `location` or `g` |
| 5 | website | `web` |
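
One way a client might apply this mapping, sketched under the assumption that a valid review carries exactly one type tag and exactly one matching identifying tag. `EXPECTED_TAGS` and `identifying_tag` are hypothetical names.

```python
from typing import Dict, List, Optional

# Type number -> tag names that may identify the item under review (from the table).
EXPECTED_TAGS = {
    0: ("e", "a"),
    1: ("p",),
    2: ("i",),
    3: ("relay",),
    4: ("location", "g"),
    5: ("web",),
}


def identifying_tag(event: Dict) -> Optional[List[str]]:
    """Return the identifying tag implied by the type tag, or None if invalid."""
    tags = event.get("tags", [])
    type_values = [t[1] for t in tags if len(t) >= 2 and t[0] == "type"]
    if len(type_values) != 1:
        return None
    try:
        allowed = EXPECTED_TAGS.get(int(type_values[0]))
    except ValueError:
        return None  # type value was not an integer
    if allowed is None:
        return None  # unknown type number
    matches = [t for t in tags if len(t) >= 2 and t[0] in allowed]
    # Assumption: exactly one identifying tag is expected per review.
    return matches[0] if len(matches) == 1 else None
```

Returning None for anything unexpected lets a client skip malformed reviews instead of guessing what they refer to.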

Possible future tag ideas:

{
    "tags": [
        ["product", "<name of product>", "<other parameters>"],
        ["service", "<name of service>", "<other parameters>"],
        ["organization", "<name of organization>", "<other parameters>"],
        ["event", "<name of event>", "<other parameters>"]
    ]
}

Comments:

The proposal allows individual review events to exist and be created on the fly, and lets the user maintain their own list of valid events, thereby giving them the ability to explicitly state that a review still reflects their current opinion on the subject. The downside of this method is that clients will need to retrieve all valid reviews and parse them in order to determine which may be relevant to display.

Comments/feedback etc. very welcome. I've probably not done things the most 'nostr' way so very open to suggestions here.

The suggested new tags would also be useful outside of this specific case; for example, unit would be handy for step tracking and other health app clients.

AsaiToshiya commented 1 month ago

RydalWater commented 1 month ago

> Is this similar to #879?

Yeah, definitely some substantial overlap. I was thinking about the problem more from the client side, trying to make it easy to find and maintain reviews about given topics, and how to leverage existing tags to provide a variety of data types when providing the review (so they can be attached to those objects more directly).

The proposal for NIP-85 looks like it wants to support more granular, structured reviews (customer service, cleanliness etc.), but that very general model makes it difficult to standardize a review across many clients, as the number of rating values may not be consistent with a simple overall rating (as I've proposed here). I am not in any way saying one method is better, though, because a review with a breakdown of ratings can also be extremely valuable.

@staab realizing there is crossover, I'd like to get your thoughts on this proposal. I know your goal with the solution you provided was simplicity, but I wonder if it is maybe too simple for some cases. Or perhaps both proposals have value?

staab commented 1 month ago

879 needs to be updated in some important ways, but it would be great if we could settle on reviews that are backwards-compatible with that PR. I do think that "less is more" with NIPs, and there is a lot of unnecessary divergence here:

- type is the same as the l tag, except it's numeric, which makes it harder to read and the NIP harder to maintain. Nested namespaces are only useful if the sub-namespace is user-generated. So we should either add a new top-level kind for each review type, or use arbitrary l tags.
- This NIP has some extra tags, which are useful for sure. There's no reason #879 couldn't support other tags. These tags are likely to be type-specific though, which probably means different kinds per review type is the way to go.
- unit and denom unnecessarily partition review data sets. In other words, how do you average a rating of 3/5 stars and 2/10 chef's-kiss-hands? You would normalize them to percents, but then some semantics have been lost. It's probably best to just normalize to a decimal/percent like #879 does, and clients can show whatever granularity or emojis they want to.
- Kind 32020 would fit neatly into NIP 51, I have no problem with these events.
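
To make the normalization point concrete, here is a small sketch (names are hypothetical) that maps ratings with different denominators onto a 0 to 1 scale before averaging. The unit is discarded along the way, which is exactly the loss of semantics mentioned above.

```python
from statistics import mean
from typing import List, Tuple


def normalize(rating: float, denom: float) -> float:
    """Map a rating onto a 0-1 scale, dropping its unit and denominator."""
    if denom <= 0:
        raise ValueError("denom must be positive")
    return rating / denom


def average_score(ratings: List[Tuple[float, float]]) -> float:
    """Average (rating, denom) pairs after normalizing each one."""
    return mean(normalize(rating, denom) for rating, denom in ratings)


# 3/5 stars and 2/10 chef's-kiss-hands become 0.6 and 0.2, averaging to about 0.4.
combined = average_score([(3, 5), (2, 10)])
```

A client is then free to render `combined` however it likes, for example as two out of five stars.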

staab commented 1 month ago

Just updated #879 to what my preference would be for reviews. I don't mention the list or additional tags per review type, but those could be added.

Also, you can ignore my comment about backwards compatibility, I don't think it's actually important. There are kind 1985 and 1986 reviews with different formats, both published by coracle, and a few kind 1986 published under a qts namespace, looks like maybe for a freelancing product. I only found 86 events of either kind, so we should just pick a new kind and move on.

RydalWater commented 1 month ago

I see now. Yes, you're right: I was trying to use the type tag as a pointer for the client as to what the subsequent tag will look like (though, as you say, that basically replicates the kind behavior and is likely not the cleanest way to handle it). The general idea here was to attach the review to real things (not just nostr events); as in your example, we want to be able to connect a review to a company, product, URL, podcast, movie, venue etc. In your proposal would we just put all of these related items into an i tag?

For general reviews I think there are a few problem statements we need to resolve:

1. I need a way to attach explicit reviews to specific digital or real-world objects, which can be used along with WoT to surface relevant information to users.
2. I must have a way to change or replace a review in case my opinion of an object changes.
3. I should have a way to normalize the rating data across multiple platforms (therefore the maximum value for a rating scale SHOULD/COULD be provided to facilitate this).

Your solution I think solves the first, though I think it could still be extended to cover the others without much of an issue. The list part of this proposal is independent but helps solve the second problem which is extremely valuable as it can help clients avoid presenting outdated reviews without needing to go looking for this information (it puts the user in control of what they are putting their name next to). For problem 3 there is almost no reason to not extend the rating kind to include the denominator since we'd be adding a tag anyway and the additional tag can be optional.

I've added a comment/suggestion to #879 and I'll work on a PR for the list proposal, changing the kind to 31987 for clear linkage with your proposed rating kind.

MerryOscar commented 1 month ago

I think having a review kind is a great addition to nostr. I like the simpler version from @staab in https://github.com/nostr-protocol/nips/pull/879 but also think a list of reviews is a great addition.

staab commented 1 month ago

> In your proposal would we just put all of these related items into an i tag?

Yep, i is the right way to handle things external to nostr. I think for reviews it would be worthwhile to use i even to review things within nostr, e.g. e:<event-id> or p:<pubkey> to avoid collisions on regular mentions/parents/quotes.

> I must have a way to change or replace a review in case my opinion of an object changes

We could use replaceables instead, why not. The d tag could then be the target of the review, clearing up my previous comment.

> I should have a way to normalize the rating data across multiple platforms (therefore the maximum value for a rating scale SHOULD/COULD be provided to facilitate this).

Using a value from zero to one would do this, with unlimited granularity, right?

I've gone and updated #879 to include a NIP 51 list, and use replaceable events.
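
For illustration, a sketch of what "replaceable" buys a client here: for each (kind, pubkey, d tag) only the newest review is kept, so a changed opinion supersedes the old one. The function names are hypothetical, and the tie-break on the lowest id reflects my reading of the replaceable-event rule in NIP-01.

```python
from typing import Dict, List, Tuple


def d_tag(event: Dict) -> str:
    """Return the value of the first d tag, or an empty string if absent."""
    return next(
        (t[1] for t in event.get("tags", []) if len(t) >= 2 and t[0] == "d"), ""
    )


def supersedes(new: Dict, old: Dict) -> bool:
    """Newest created_at wins; on a tie the lowest id is retained."""
    if new["created_at"] != old["created_at"]:
        return new["created_at"] > old["created_at"]
    return new["id"] < old["id"]


def latest_reviews(events: List[Dict]) -> List[Dict]:
    """Collapse replaceable reviews to one per (kind, pubkey, d tag)."""
    latest: Dict[Tuple[int, str, str], Dict] = {}
    for event in events:
        key = (event["kind"], event["pubkey"], d_tag(event))
        if key not in latest or supersedes(event, latest[key]):
            latest[key] = event
    return list(latest.values())
```

With the d tag set to the target of the review, each author ends up with at most one current review per item.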

RydalWater commented 1 month ago

Great, I think it is coming together nicely.

One thought is, why restrict kind 31987 to relays specifically? Shouldn't we just go with the following structure:

- A general kind which allows for the creation of replaceable review events
- A kind which allows for the combination and categorization of review events (set). Your proposal for this looks good, though I will go through it again to be sure.

Review event structure:

- identifier tag d (can be effectively anything, used for replacing/coordination)
- external content tag i (object under review)
- rating tags rating 1-n
- denominator tag denominator *

*I realise I keep coming back to this but it is valuable I think to keep away from the protocol requiring that all clients conform to a 0-100 scale for ratings. I agree you can get back to 0-100 from rating/denom but you shouldn't have to. You should be able to capture the value as it is entered not as the protocol wants to see it. It makes the protocol too opinionated I think about what a rating looks like.

staab commented 1 month ago

> One thought is, why restrict kind 31987 to relays specifically?

Additional tags might make sense in some circumstances and not others. We could potentially use a single kind, and show a different UI based on d tag. But in other NIPs (like #1043) it has seemed to make more sense to use a different kind for each different thing.

> You should be able to capture the value as it is entered not as the protocol wants to see it.

I'm unconvinced, but it would be fine to add a denominator (or maybe scale) tag to your client if you prefer. If enough people adopt it, we can add it to the NIP. I understand what you're going for, I just think it would break interoperability if clients had to use the original denominator when displaying reviews. In reality, clients will just normalize all reviews, which is harder when using different denominators.

Maybe a different way to solve this would be something like a user_rating tag, which would contain the rating "text", for example some emojis or the text "3/5" or something. That way clients can normalize, but always show the original intent.

RydalWater commented 1 month ago

> Additional tags might make sense in some circumstances and not others. We could potentially use a single kind, and show a different UI based on d tag. But in other NIPs (like #1043) it has seemed to make more sense to use a different kind for each different thing.

Fair enough, it just feels redundant to start with multiple kinds when a general kind may give us a broad use case and then if specialized use cases emerge those can then deviate from the general to create bespoke needs. I am not against specific kinds, I just think it is more work to create new kinds for each unique use case, right?

> I'm unconvinced, but it would be fine to add a denominator (or maybe scale) tag to your client if you prefer. If enough people adopt it, we can add it to the NIP. I understand what you're going for, I just think it would break interoperability if clients had to use the original denominator when displaying reviews. In reality, clients will just normalize all reviews, which is harder when using different denominators.
>
> Maybe a different way to solve this would be something like a user_rating tag, which would contain the rating "text", for example some emojis or the text "3/5" or something. That way clients can normalize, but always show the original intent.

Happy to concede this ground, I am not so tied to it. I do like your suggestion of adding an optional general text field which could be used to convey ratings verbatim. Perhaps raw_rating instead of user. A client could look for it or ignore it as they see fit.

Finally, I was thinking more broadly and generally about this problem, and whether even "ratings" is too specific. I wondered if perhaps this could be more simply defined as a "Score Kind". That is to say, the kind is used to convey a score for something. Ratings, for example, are a type of score. But for questionnaires or other similar things where you ask a user to grade something, you wouldn't call them ratings per se.

I realize this last thought really looks back at the concept of ratings and leans into my suggestion above of a single general purpose kind rather than specific kinds for specific use cases. What do you think?

Example:

{
    "kind": 2020,
    "tags": [
      ["d", "<some unique id>"],
      ["i", "<external ID for the item being scored>"],
      ["score", "0.8"],
      ["raw_score", "8/10"]
      //.. repeat score/raw_score
    ],
    "content": "<Some optional comments about the score provided>"
}
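
A rough sketch of how a client could treat the score/raw_score pair from the example above: score stays the normalized value, while raw_score is free text that is only parsed when it happens to look like "8/10". The helper names and the tolerance are assumptions for illustration.

```python
import re
from typing import Optional

# Matches simple fractions such as "8/10" or "4.5 / 5"; anything else is opaque text.
_FRACTION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*/\s*(\d+(?:\.\d+)?)\s*$")


def parse_raw_score(raw: str) -> Optional[float]:
    """Return the 0-1 value of a fraction-style raw score, or None otherwise."""
    match = _FRACTION.match(raw)
    if match is None:
        return None  # e.g. emojis: display verbatim, do not interpret
    value, scale = float(match.group(1)), float(match.group(2))
    if scale == 0 or value > scale:
        return None
    return value / scale


def is_consistent(score: str, raw_score: str, tolerance: float = 1e-6) -> bool:
    """Check that a parseable raw_score agrees with the normalized score."""
    parsed = parse_raw_score(raw_score)
    return parsed is None or abs(float(score) - parsed) <= tolerance
```

A client that does not care about the original wording can ignore raw_score entirely and rely on score alone.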

staab commented 1 month ago

> I wondered if perhaps this could be more simply defined as a "Score Kind".

See here for a draft of a more generic version of ratings I did a while back. This approach has mostly been rejected by the community. In nostr, more concrete is generally better.

> Perhaps raw_rating instead of user.

I'll leave this out for now since I'm not sure how such a thing would best be designed, but feel free to add such a tag and we can spec it then.

RydalWater commented 1 month ago

> I wondered if perhaps this could be more simply defined as a "Score Kind".
>
> See here for a draft of a more generic version of ratings I did a while back. This approach has mostly been rejected by the community. In nostr, more concrete is generally better.

Thanks for this context, it helps a lot.

Just to make sure I am clear on the next steps (trying to avoid overlap):

- You're pushing ahead with the relay-specific rating kind? If yes, I think you should probably remove the NIP51 updates you proposed, or update them to be "Relay Ratings Set", because the current proposed NIP51 kind is generic and NIP85 is specific.
- Pushing forward with changes to NIP73 (no comments here; these don't affect my use case).
- Given that I don't need relay ratings for my client (I want ratings with an i tag and for them to be replaceable via d, though these could be one and the same), I guess I'll make a separate proposal for another NIP which has a specific use case (and kind) for the ratings I need, along with an appropriate update to NIP51 to include a Ratings set?

staab commented 1 month ago

> the current proposed NIP51 kind is generic

Now, this I think is probably ok. But I don't know what use cases you have in mind.

> I guess I'll make a separate proposal for another NIP which has a specific use case (and kind) for the ratings I need, along with an appropriate update to NIP51 to include a Ratings set?

Sure, nothing wrong with a competing PR. I'm not really working on this right now, so for the foreseeable future coracle will be on the old reviews draft spec. My suggestion would be to go ahead and build your client, and publish your NIP afterwards, now that we've built some consensus on what reviews should look like. NIPs are meaningless except as a nexus for discussion until they've been implemented.

RydalWater commented 1 month ago

> the current proposed NIP51 kind is generic
>
> Now, this I think is probably ok. But I don't know what use cases you have in mind.

I'm looking to create lists of reviews for books (hence the urge to include the i tag). I was hoping to get this added before my first deployment, but I've pulled the trigger and will add the reviews in the next release (https://github.com/RydalWater/OpenLibrarian). This is currently up in test mode, so everything is cached locally and no events are published to relays.

The concern I had with your NIP51 proposal was that it currently ties the list type specifically to NIP85 events (which are specific). That said, if you think it just needs an update to include other NIPs, that is fine. It could always be updated to something like this: A list of rating events (e.g., [NIP 85](./85.md)). That would make it so we don't need to immediately list other rating types.

> Sure, nothing wrong with a competing PR. I'm not really working on this right now, so for the foreseeable future coracle will be on the old reviews draft spec. My suggestion would be to go ahead and build your client, and publish your NIP afterwards, now that we've built some consensus on what reviews should look like. NIPs are meaningless except as a nexus for discussion until they've been implemented.

Really do appreciate the back and forth on this discussion, it has been extremely useful. I'll mull over my implementation for the next couple of weeks and then keep you posted on how it goes.

Will go ahead and close this thread later today.

nostr-protocol / nips

NIP-XX Proposal for Reviews Kind #1515
