utu-protocol / market

🧜‍♀️ THE Data Market
https://market.oceanprotocol.com
Apache License 2.0

Look through suggestions from user interview again and check what more of those can be applied #7

Open databu opened 2 years ago

databu commented 2 years ago

We conducted an extensive interview with a frequent Ocean Market user, which yielded these notes (copied from the ocean-utu Slack channel):

On whether to trust a data asset:

  1. How many people/shares are provided by the publisher vs. investors
  2. Track record of publisher address — other data sets and their liquidity, maybe it was involved in some DeFi scam/fraud?
  3. Only buyers (consumers) can provide real feedback (but a “pure” liquidity provider might not care about actual quality, just about the price)
  4. Liquidity: did publisher withdraw (all) liquidity? — if a publisher does this repeatedly, they’ll be put in purgatory. But there are legitimate reasons for a publisher to withdraw liquidity, e.g. to free liquidity for a new data asset. If a data asset becomes out of date, it can (and should?) be removed from the market completely.
  5. Track record of liquidity providers — like early liquidity providers involved in pump + dump
  6. Cannot see how many people purchase/provide liquidity, only total liquidity
  7. Dataset static or dynamic — latter worth more, but personal opinion
  8. Curation: how often is a publisher updating? — stale static data assets also lose relevance
  9. Dynamic data assets don’t have to be manually updated, but require integration, therefore less attractive to fakers/cheaters
  10. References in description? — links to external sources? Is it attached to a platform/community?
  11. Sample data set provided?
  12. Certain publishers offer extensions, e.g. Chrome plugin like Swash, which collects data about users. Having contributed to a dataset might slightly increase trust, but there’s still a possibility that the rest of the data is trash/made up/whatever.

Other things:

  1. datawhale.online has an app which provides some user feedback and a “trust score”; Simon used it and finds it attractive
  2. Early on, a lot of useless/fake datasets were uploaded
  3. There are Ocean Market forks, e.g. Big Data Protocol

Other reasons someone would trust a dataset:

  1. the description provided should be well curated, and at the very least relate to the data (obviously; but this is sometimes not the case)
  2. a staker will look for indicators such as high liquidity

More information that would be useful:

  1. “how many people have purchased a dataset”
  2. “track record of the address that publishes an asset”
  3. “track record of the people involved with an asset”

So there are definitely a lot of possible ways to improve in future iterations. The general track record of addresses, also across other protocols/dApps, is maybe one of the most interesting.

databu commented 2 years ago

Relating to the points above, here is my overall take on each point:

On whether to trust a data asset:

  1. How many people/shares are provided by the publisher vs. investors

This can be another signal: the ratio (or absolute numbers) of liquidity provided by the publisher address vs. other addresses. I suppose it's good to have a greater share of independent investors. But in that case it's quite easily gameable by the publisher, because they can just provide liquidity from other addresses. This could be mitigated by requiring IDs from LPs, but I guess that's not on the roadmap, and maybe against Ocean's philosophy? 😕
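
A minimal sketch of such a signal, assuming we can already fetch the per-address pool shares from somewhere; the `PoolShare` shape and `publisherLiquidityRatio` helper are illustrative, not an existing API:

```typescript
// Sketch only: PoolShare is an assumed shape, not the actual subgraph schema;
// plug in however pool shares are really fetched.
type PoolShare = { userAddress: string; shares: number };

// Fraction of all pool shares held by the publisher address.
// Close to 1: publisher provides nearly all liquidity; close to 0: mostly independent LPs.
function publisherLiquidityRatio(
  shares: PoolShare[],
  publisherAddress: string
): number {
  const total = shares.reduce((sum, s) => sum + s.shares, 0);
  if (total === 0) return 0;
  const fromPublisher = shares
    .filter((s) => s.userAddress.toLowerCase() === publisherAddress.toLowerCase())
    .reduce((sum, s) => sum + s.shares, 0);
  return fromPublisher / total;
}

// Example: publisher holds 800 of 1000 shares -> ratio 0.8.
console.log(
  publisherLiquidityRatio(
    [
      { userAddress: "0xpublisher", shares: 800 },
      { userAddress: "0xalice", shares: 150 },
      { userAddress: "0xbob", shares: 50 },
    ],
    "0xPublisher"
  )
);
```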

  2. Track record of publisher address — other data sets and their liquidity, maybe it was involved in some DeFi scam/fraud?

Already doing this ✔︎

  3. Only buyers (consumers) can provide real feedback (but a “pure” liquidity provider might not care about actual quality, just about the price)

Active UTU feedback will be added later. We will need to make sure that the connected user indeed transacted. 🙂
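
One rough way to do that check is to query the market's subgraph for orders by the connected address before accepting feedback. The endpoint, entity name (`tokenOrders`) and field names below are assumptions for illustration, not the confirmed Ocean subgraph schema:

```typescript
// Sketch only: endpoint and query shape are hypothetical placeholders.
const SUBGRAPH_URL = "https://example.org/ocean-subgraph";

// True if the address has at least one recorded order (purchase) of the
// given datatoken, according to the assumed subgraph schema.
async function hasPurchased(
  userAddress: string,
  datatokenAddress: string
): Promise<boolean> {
  const query = `
    query Orders($user: String!, $token: String!) {
      tokenOrders(first: 1, where: { consumer: $user, datatoken: $token }) {
        id
      }
    }`;
  const res = await fetch(SUBGRAPH_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query,
      variables: {
        user: userAddress.toLowerCase(),
        token: datatokenAddress.toLowerCase(),
      },
    }),
  });
  const { data } = await res.json();
  return (data?.tokenOrders?.length ?? 0) > 0;
}

// Feedback submission would then be gated on this check, e.g.:
// if (await hasPurchased(connectedWallet, asset.datatokenAddress)) { /* allow feedback */ }
```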

  4. Liquidity: did publisher withdraw (all) liquidity? — if a publisher does this repeatedly, they’ll be put in purgatory. But there are legitimate reasons for a publisher to withdraw liquidity, e.g. to free liquidity for a new data asset. If a data asset becomes out of date, it can (and should?) be removed from the market completely.

This concerns mostly the market app (purgatory, asset removal) itself. Not sure if UTU can add anything meaningful here. 😐

  5. Track record of liquidity providers — like early liquidity providers involved in pump + dump

Could add a signal "X LPs in this pool were previously in pump + dump schemes." A challenge might be how to identify those -- an overly simple heuristic might produce false positives, but maybe this doesn't matter? In general, this could become a new UTU signal type of "X known bad actors involved here", which might also be relevant in other use cases. 😐
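
Once a flagged-address set exists (building it is exactly the open heuristic question), the signal itself is just an overlap count. A minimal sketch, with hypothetical names:

```typescript
// Sketch only: FLAGGED_ADDRESSES would need to be curated or imported from
// prior analyses / external lists; that part is the hard, unsolved bit.
const FLAGGED_ADDRESSES: Set<string> = new Set<string>([
  // "0x...", lowercase addresses previously flagged for pump-and-dump behaviour
]);

// Count how many of a pool's LP addresses appear in the flagged set.
function knownBadActorCount(lpAddresses: string[]): number {
  return lpAddresses.filter((a) => FLAGGED_ADDRESSES.has(a.toLowerCase())).length;
}

// Rendered as e.g. "2 LPs in this pool were previously in pump + dump schemes."
```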

  6. Cannot see how many people purchase/provide liquidity, only total liquidity

Partially solved by the "X in your network provided liquidity" signal, but that includes only "in your network" addresses. And in general, different addresses don't mean different people, so this is difficult. This could be tackled in a few ways (see the sketch after the list):

  1. a signal "X different addresses provided liquidity" -- easy enough 🙂
  2. "X different people provided liquidity" -- hard, requires IDing LPs 😕
  3. "At least X different people provided liquidity" -- consider only those LPs who voluntarily IDed themselves. 😐
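
Options 1 and 3 are simple to express once the LP list is available. A minimal sketch; the `LP` shape and the `verifiedIdentityId` field are purely hypothetical stand-ins for whatever an opt-in ID scheme would provide:

```typescript
// Sketch only: the LP shape is an illustrative assumption.
type LP = { address: string; verifiedIdentityId?: string };

// Option 1: number of distinct addresses that provided liquidity.
function distinctLpAddresses(lps: LP[]): number {
  return new Set(lps.map((lp) => lp.address.toLowerCase())).size;
}

// Option 3: lower bound on distinct people, counting only LPs who
// voluntarily identified themselves (hypothetical opt-in ID field).
function minDistinctPeople(lps: LP[]): number {
  const ids = lps
    .filter((lp) => lp.verifiedIdentityId !== undefined)
    .map((lp) => lp.verifiedIdentityId as string);
  return new Set(ids).size;
}
```
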

  7. Dataset static or dynamic — latter worth more, but personal opinion
  8. Curation: how often is a publisher updating? — stale static data assets also lose relevance
  9. Dynamic data assets don’t have to be manually updated, but require integration, therefore less attractive to fakers/cheaters

These sound more like things to add to the market itself? Not sure whether things like updates to the asset itself are visible from the subgraph or Aquarius APIs ❓

  10. References in description? — links to external sources? Is it attached to a platform/community?
  11. Sample data set provided?

While probably worthwhile signals, parsing and validating links or data sets isn't quite in UTU's scope. We should leave this one to other signal providers or the user.

  12. Certain publishers offer extensions, e.g. Chrome plugin like Swash, which collects data about users. Having contributed to a dataset might slightly increase trust, but there’s still a possibility that the rest of the data is trash/made up/whatever.

Could we access that data, i.e. how many of the LP addresses took part in such data collection? What value would a signal like "X LPs in this pool helped collect data for [this|another] publisher" have? ❓

databu commented 2 years ago

More information that would be useful:

  1. “how many people have purchased a dataset”

done ✔︎

  2. “track record of the address that publishes an asset”

done ✔︎

  3. “track record of the people involved with an asset”

See above point 5.