[EPIC] Mitigate rug pulls

trentmc commented 4 years ago

Multi-prong strategy, as below.

This epic can be changed from 'High' to 'Medium' priority once all high-priority issues are resolved.

Done

[x] Update staking blog post to further clarify rug pulls [Trent]
[x] In oceanprotocol.com/earn, make the risks more clear [Trent]
[x] In oceanprotocol.com “marketplaces” page, make the risks more clear [Trent]
[x] Link to Ocean Market’s terms from oceanprotocol.com [Trent]
[x] Change the % weight of Ocean that initial publishers must put in, from {90% DT, 10% OCEAN} to {50% DT, 50% OCEAN}. This better aligns the incentives of the publisher with the rest of the community (that owns OCEAN). Where to change: just Market, not libraries or contracts. Side benefit: keeps prices sane.
[x] In Ocean Market, display the publisher’s Eth address. Market#171.
[x] When people go to add liquidity, have a warning popup/banner: “Use at your own risk. Here are the risks. Terms of Use.” Market#172.
[x] In Ocean Market pools page, display: cap, total supply, % pool ownership distribution (highlight the % that publisher owns). These all help stakers assess the risk of a rug pull, and the effects. Market#173.
[x] Link Ocean Market’s terms visible at bottom of market.oceanprotocol.com. Market#170 [Matthias]
[x] Publisher/owner search. Market-187. Why: drives publisher reputation. Note: also in "IP Violation" Epic.
[x] Datapartners list. Market-189
[x] All new pools created from the GUI are 70-30 OCEAN-DT (or even higher). Market#207
[x] Purgatory list (IP violation, sensitive data). Market-229. Note: also in "IP Violation" Epic.
[x] Statistics to help assess rug pull risk
- [x] Simple graph of liquidity history for a single data asset. Market issue: FIXME. Aquarius#307.
- [x] In pool stats, show publisher's initial investment of OCEAN into the pool. Market issue: FIXME. Aquarius#308.
- [x] In pool stats, show how much publisher has pulled out of pool. Market issue: FIXME. Aquarius#309.
[x] Help publisher credibility by linking Eth address to 3Box (which links to other profiles). Market-240. Note: also in "IP Violation" epic [Mihai]

In Progress

Agreed-upon backlog [empty]

Possible tactics, to discuss more, only add to backlog if agreed

Help users' mental model via better text when staking. Market-210
Custom badges for accounts in Ocean Market. Market-252
Retire datapartners list (when the time is right)
Quick-and-dirty timelock feature. What would that look like? ..
Tweet published dataset. Market#174. Bonus: helps virality further. But doesn't help on rug pulls beyond what 3box does
Help publisher credibility by linking Eth address to ENS. Market-188. Note: also in "IP Violation" Epic.
Cap the staking amount, for a given Eth address, to 10% of a pool. (Only from GUI) (Can only do after initial publish of course). Market#175.
Limit the amount of OCEAN staked to, let's say, 10x the original OCEAN staked by the publisher. Once that is reached, adding liquidity should be disabled.
Data Whale's idea on conditional liquidity withdrawals
Migrate existing publishers from 10-90 OCEAN-DT pools to 70-30 (or higher) pools. PrivateIssues#1
Positive-label lists (Made pledge, Data launch partner, etc). Market-228. Note: also in "IP Violation" Epic.
Integrate Balancer v2, and leverage its time-lock feature. What: if you're the publisher, it's a 2 day delay between when you initiate removing liquidity, and when it exits. In the meantime, other LPs can exit.
Better IDOs, e.g. vest tokens to publisher over time. This gives the publisher less to dump at once. Zenhub epic
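The two cap tactics in this list (10% of a pool per address, and 10x the publisher's original OCEAN in total) could be combined in a GUI-side check roughly like the sketch below. The function name, its parameters, and the reading of "10% of a pool" as a share measured after the deposit are all assumptions for illustration; nothing here is taken from Ocean Market code.

```python
def max_additional_stake(pool_ocean, address_ocean, publisher_initial_ocean,
                         max_share=0.10, max_multiple=10.0):
    """Largest OCEAN amount an address may still add under both caps (hypothetical rule)."""
    # Cap 1: after adding `a`, (address_ocean + a) / (pool_ocean + a) <= max_share.
    # Solving that inequality for `a` gives the bound below.
    share_cap = (max_share * pool_ocean - address_ocean) / (1.0 - max_share)
    # Cap 2: total OCEAN in the pool may not exceed max_multiple x the publisher's initial stake.
    size_cap = max_multiple * publisher_initial_ocean - pool_ocean
    # Never return a negative allowance.
    return max(0.0, min(share_cap, size_cap))


# Example: 1000 OCEAN in the pool, this address holds 10, publisher started with 200.
print(max_additional_stake(1000.0, 10.0, 200.0))
```

The sketch does not exempt the publisher, who necessarily holds the whole pool right after the initial publish.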

kremalicious commented 4 years ago

did we look into timelock contracts? So that we could have a lock period for creator's added liquidity

trentmc commented 4 years ago

Great point. Timelocks are a great solution. They require smart contract changes.

Timelocks are at the heart of Balancer V2, which comes out in Q1 2021. I suggest we put in timelocks as part of the Ocean upgrade to Balancer V2.

I updated the description to include this, for Balancer V2.

brucepon commented 4 years ago

Can we give the community access to these issues?

trentmc commented 4 years ago

It was meant to be open already. Rectified.

keeno12 commented 4 years ago

Problem 1: Another issue I see is verifying publishers, which could bring more trust or at least give greater transparency to good actors in the market.

Solutions: 1) Twitter and/or Keybase verification linking to profiles 2) Website link out to the publisher website

Problem 2: Determining verified partners / data providers and giving them more attention than users jumping pools to lose their funds. Currently, users must do a lot of research on their own and look for signals from the core team on Twitter such as likes, comments, follows etc

Solutions: 1) Update info page to show verified data provider by the core team 2) Add another section on the main page "verified pools"

trentmc commented 4 years ago

@keeno12 thanks for the thoughts. Both ideas were listed in the IP rights violation epic but not here.

They are:

Data partners whitelist market#189
Help publisher credibility by linking Eth address to other online profiles. market#188

For thoroughness, I've now listed them here as well.

TimDaub commented 4 years ago

👋 Saw this while browsing around.

I'm interested, what exactly is a "rug pull"? Can you maybe link me to a description? Thanks!

brucepon commented 4 years ago

> 👋 Saw this while browsing around.
>
> I'm interested, what exactly is a "rug pull"? Can you maybe link me to a description? Thanks!

Explanation Links:

https://medium.com/@unishield/how-uniswap-scams-work-ba847275a49f
https://coingape.com/solving-the-rug-pull-liquidity-problem-on-uniswap-dex-after-the-sushi-debacle/
https://www.reddit.com/r/ethereum/comments/j3ba7g/rug_pulls_on_uniswap_how_they_work/

TimDaub commented 4 years ago

Thanks for the links. Interesting problem. If I had to describe it in my own words, "pulling a rug" is when a malicious user creates their own liquidity pool with an automated market maker (AMM) and then lures other liquidity providers or buyers/sellers into using the pool. To formulate the act of rug-pulling explicitly: it's the act of removing a significant portion of liquidity from an AMM pool with the goal of profiting from the subsequently skewed values of the pool. Examples of "rug-pullable" pools:

1. An ETH-MALICIOUS trading pair on Uniswap where the MALICIOUS token is practically of zero value, as the malicious user just created it out of thin air to deceive buyers/sellers or liquidity providers (e.g. if someone created ETH-0CEAN, with a "0" instead of an "O", and then put 0CEAN in a Uniswap pool at e.g. a 1 ETH : 1 0CEAN price).
2. A pool that has bad liquidity, where bad liquidity can be: (a) only a few whales provide liquidity (which means they can collude), or (b) the pool's overall liquidity and/or the number of liquidity providers is so low that a single whale can easily influence the pool's price to their advantage. That attack would allow the whale to drain the other liquidity providers' pool stake by buying at an "advantageous price".

IMO, there are differences between examples 1 and 2:

1. is impersonating/faking
2. is essentially the "empty network" problem
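To make the first example concrete, here is a minimal numeric sketch of a worthless token being sold into a pool. It assumes a fee-less constant-product pool (x * y = k, the Uniswap-style rule) and made-up reserves of 10 ETH against 10 0CEAN; Ocean's Balancer pools use weighted math, so the numbers are only illustrative.

```python
def swap_out(reserve_in, reserve_out, amount_in):
    """Amount received from a fee-less constant-product pool (x * y = k)."""
    return reserve_out * amount_in / (reserve_in + amount_in)


# Assumed pool state: 10 ETH against 10 0CEAN, i.e. the 1:1 price from the example.
eth_reserve, fake_reserve = 10.0, 10.0

# The attacker minted 0CEAN at no cost and sells 990 of it into the pool.
eth_drained = swap_out(fake_reserve, eth_reserve, 990.0)
eth_left = eth_reserve - eth_drained

print(f"attacker receives {eth_drained:.2f} ETH, pool keeps {eth_left:.2f} ETH")
```

Because the attacker's supply of the fake token is effectively unlimited, almost all of the ETH that others contributed leaves the pool in a single trade.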

I think for both a sufficient solution could be found by introducing a decentralized scoring mechanism representing the "trustworthiness" of the pool. For this to work, however, a resilient but publicly observable metric would have to be found.

An example of such a metric is the concept of "bitcoin days destroyed" of the Bitcoin network. So why not use something similar? Let's start with a criterion: a trustworthy pool is one that has many individual holders (e.g. relative to other pools) who ideally all hold an equally sized portion of the pool.

A few further ideas to create this score without diving too deep:

We could use the Gini coefficient to measure the overall (in)equality of liquidity providers and their stake in the pool
Given that there's no practical solution to sock puppets on blockchain, we could factor in the "amount of ether days destroyed" as a "weight" within the above-stated equality measurement. I'm not exactly sure how to measure the ether days destroyed yet. It could either be the number of days the ether has been pooled (imagine you have to pool $3k of ETH for 30 days before scamming someone for $100. That's too annoying! That money could have been put to much better use elsewhere), or the number of days the ether hasn't been spent before pooling, etc. I'm sure, however, we'd find a good enough "constraint" or PoW that would make it too time-consuming/annoying/capital-inefficient for a malicious user to perform.

Ultimately, having this measure would allow pools to be ranked on the market website making it very unlikely for a user to interact with a malicious pool (e.g. similar to how it's unlikely to be scammed by AirBNB/UBER/Ebay providers).

Anyways, just a few ideas from my side :D Curious to see where this goes!

brucepon commented 4 years ago

Always appreciate your thinking and insights Tim.

You have captured the essence of the problem.

Some other solutions include:

Staking by the publisher which is forfeited in case of malicious action
Reputation ratings on publishers based on past behaviour
Pre-release of tokens at a fixed price to increase the token holders, prior to release on an AMM (a part of the Gini metric). This gives the publisher an initial treasury, while reducing the incentive to rug pull and concurrently reducing the impact of a rug pull if it does happen. Essentially an IPO for a dataset.

TimDaub commented 4 years ago

Cool, I did a small MVP: https://docs.google.com/spreadsheets/d/1AONZoxfiXXm16bdn1q_GrrhwfZTERuh_nLjP2AqpKAw/edit?usp=sharing

EDIT: Sorry link sharing is fixed now I think

Comments:

I noticed that for most token contracts there's one large token holder. That's something system-specific I'm not sure how to interpret. Should I remove that address before calculating the Gini index? Why is it like that?
I used this formula for the Gini coefficient calc: https://en.wikipedia.org/w/index.php?title=Gini_coefficient&oldid=985147212#Definition
I calculated the Gini coefficient over all token holders and their balances and NOT over the pool's liquidity providers (that data is more difficult to get). I don't think that it's an issue just yet.
I didn't factor in any capital-efficiency metric to stop sock puppets from being sock puppets yet
By checking some data sets, it seems to go in the right direction. Tho I'm absolutely not sold on the validity of the calculation yet
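For readers who want to reproduce the calculation, the sketch below computes a Gini coefficient as half the relative mean absolute difference, which is assumed here to be the form in the linked Wikipedia definition. The balances are invented, and the spreadsheet's actual implementation is not shown in this thread.

```python
def gini(balances):
    """Gini coefficient via the mean absolute difference over all ordered pairs."""
    n = len(balances)
    total = sum(balances)
    if n == 0 or total <= 0:
        raise ValueError("need at least one holder and a positive total balance")
    abs_diff_sum = sum(abs(a - b) for a in balances for b in balances)
    # G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean), and n^2 * mean == n * total.
    return abs_diff_sum / (2 * n * total)


# Invented balances: one large holder plus three small, equal ones.
holders = [97.0, 1.0, 1.0, 1.0]
print(gini(holders))      # with the large holder included
print(gini(holders[1:]))  # same set with that address removed
```

The double loop is quadratic in the number of holders, which is fine for a spreadsheet-sized MVP. The two calls show how strongly a single dominant address moves the result, which is the question raised in the first comment above.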

Feedback appreciated :)

Few more thoughts:

creating a resilient score would allow building an Ocean Protocol data set index
furthermore, this data set itself could be priced and sold on the Ocean Market too

TimDaub commented 4 years ago

Hey,

so actually I hacked as fast as I could to get something rolled out. Proud to announce: https://rugpullindex.com/ It's using ideas out of the above-mentioned comments (gini coefficient etc).

Of course, I'm hoping that people are gonna use it, as this will keep me motivated to continue working on it. Stats are here: https://plausible.io/rugpullindex.com

brucepon commented 4 years ago

Really nice. Suggestions:

Use TASLOB-45 as the primary key
Would be good to cross-reference this with the @RealDataWhale dataset. If this index is indicative and @RealDataWhale's dataset gives the actual outcomes, then a combination could give a "predictive" aspect.

brucepon commented 4 years ago

By TASLOB-45, I meant the 6-2 character shortname.

TimDaub commented 4 years ago

> 1. Use TASLOB-45 as the primary key

Hey, good point. The scoring currently doesn't factor in the number of unique holders. Checking for its score:

SELECT score FROM sets WHERE symbol = 'TASLOB-45';
> 0.841469360480562

But actually, TASLOB-45 has tons of seemingly legit users: https://etherscan.io/token/0x2655b8a7357f4bb4a8cb2170e196096ac8f0cdf9#balances So next week, I'll have to think of how to improve the index such that TASLOB-45 gets listed. Thanks for the feedback

Happy to share notes with @realdatawhale

realdatawhale commented 4 years ago

Hi everyone!

Hope you are well and great effort on the index!

If it helps, we'd be happy to provide you with access to the Directory, analyzing each dataset and its legitimacy. You may cross-reference your approach with the actualization of rug-pulls etc. Let us know if there's anything else you need to make this work even better.

TimDaub commented 4 years ago

Small update from my side:

rugpullindex.com's data is now crawled once a day, meaning there are daily updates
It now uses the purgatory list for assets. Purgatory accounts are still used when calculating the score
I'm logging all changes here: https://rugpullindex.com/changelog.txt
Site now scales nicely on mobile

On the biz dev side:

Got in touch with @realdatawhale and got access to his data sheet. Planning to use it for improving the scoring algorithm soon.

Edit: Also, I'm now only showing data sets that have at least 35 liquidity providers, which already produces quite interesting results if you ask me (e.g. TASLOB-45 at rank 6). All decentralization scores are still quite bad, though. But maybe this will even out over time.

TimDaub commented 4 years ago

Small update today:

For each data set, a piechart of the pool LPs is now available by clicking the "Chart" link.

kremalicious commented 4 years ago

oh look, it's @TimDaub casually dropping an amazing project in some GitHub comments 👋 Love the idea of somehow rationalizing a "rug pull".

I can see how this could be used on the Pool widget in market, represented and mapped to some color-graded indicator. Some tooltip or help text could then link to your site for further explanations.

For that, an API endpoint would be handy, like:

https://rugpullindex.com/api/<DATATOKENADDRESS>

which could return:

{
  "score": 0.73
}
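On the Market side, mapping such a response to a colour-graded indicator could look roughly like this. The thresholds, the colour names, and the reading that a higher score means a safer pool are all assumptions for illustration; the response shape is the one proposed above.

```python
import json


def score_to_grade(score):
    """Map a score in [0, 1] to a coarse colour grade (thresholds are made up)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be within [0, 1]")
    if score >= 0.66:
        return "green"
    if score >= 0.33:
        return "yellow"
    return "red"


response_body = '{"score": 0.73}'  # response shape proposed above
grade = score_to_grade(json.loads(response_body)["score"])
print(grade)
```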

Also worth noting that, technically from the contracts perspective, one datatoken could be in multiple pools since there could be multiple pools for one data set. This is of course crazy confusing, which is why we default in all flows enforced through the UI to the first pool created with a respective datatoken, effectively creating the connection data set === datatoken === poolAddress, making it way easier to handle in terms of user's cognitive load. While this is something to keep in mind for the future, it might imply right now to rather use the pool address for a possible API parameter instead of datatoken address.

If we decide on using this score somehow, we can also move your app or only the API behind our infrastructure if needed. Cause unless you want to test your DDoS capabilities, you do not want to receive API requests from the live market right now

TimDaub commented 4 years ago

> oh look, it's @TimDaub casually dropping an amazing project in some GitHub comments 👋 Love the idea of somehow rationalizing a "rug pull".

Thanks 😊

> I can see how this could be used on the Pool widget in market, represented and mapped to some color-graded indicator. Some tooltip or help text could then link to your site for further explanations.

Yeah, that was my idea too.

> For that, an API endpoint would be handy, like https://rugpullindex.com/api/

Makes total sense. That's definitely something I can deliver over the next few weeks.

> Also worth noting that, technically from the perspective of the contract, one data token could be in multiple pools since there could be multiple pools for one data set.

Mhh 🤔 Not sure I'm following. I'm familiar with the fact that for a balancer AMM, more than two tokens can be in one pool when for each token "the pool weight" is less than 50%. Is that what you mean?

> This is of course crazy confusing, which is why we default in all flows enforced through the UI to the first pool created with a respective datatoken, effectively creating the connection data set === datatoken === poolAddress, making it way easier to handle in terms of user's cognitive load. While this is something to keep in mind for the future, it might imply right now to rather use the pool address for a possible API parameter instead of datatoken address.

What'd help is a specification/document of how Ocean currently works. Can you link me to something like that?

> If we decide on using this score somehow, we can also move your app or only the API behind our infrastructure if needed. Cause unless you want to test your DDoS capabilities, you do not want to receive API requests from the live market right now

Hahah, challenge accepted! I'll probably implement caching soon.

Thanks for the feedback @kremalicious!

Edit: Unfortunately, rugpullindex.com was down last night. I'm still fighting with the stability of my cronjob, but I think I'm close to fixing it and improving the site's reliability.

TimDaub commented 3 years ago

Hey everyone,

today I've added a 1-day delta row to the list. You can now see how a data set's rank changed in comparison to yesterday. I've sent around a few emails asking for more user feedback to rank which features are desired the most. I'll try to find time to implement all of those features in the upcoming weeks.

Best, Tim

TimDaub commented 3 years ago

Hey 👋

just a heads up that I'm still thinking on how to improve the index. Today, I put down some thoughts into my super minimalistic rugpullindex blog (it's a .txt file lol). You can read them here: https://rugpullindex.com/changelog.txt

Best, Tim

TimDaub commented 3 years ago

> Nonetheless, the pool's publisher stake should experience a higher "weighting" as the lower the publisher's share, the better - meaning that there's a greater distribution of shares.

Agreed. Actually, I'm not totally aware of how data sets are sold initially as of now. Does the publisher upload their data set and put in an amount of OCEAN that ultimately determines the initial price of the data set? If so, maybe that's not an ideal strategy to price a data set in the beginning.

Martin Köppelmann once wrote about "Initial Uniswap Offerings": https://twitter.com/koeppelmann/status/1256201034046885890 and I believe Gnosis has done lots of work on auctions. I know that the DutchX can be used to price and sell off assets. In any case: making sure that a publisher's shares are spread more evenly should be implemented in the Ocean Protocol. A data set that achieves fair pricing with many participants will be treated with privilege by rugpullindex.com's rating algorithm.

> We are starting to work on an application with a group of React Native developers, it would be great if you could send us the link to rugpullindex APIs again. We will implement your scores on our application (if you dont mind).

Oh cool! Sounds amazing. I'll send you an extra email for that.

TimDaub commented 3 years ago

I've added a Cache-Control header and my reverse proxy is allowed to cache now too. This means page speeds should now be significantly improved. According to my non-scientific measurements, response times went down from 900ms to roughly 300ms. And since the actual server is only asked for data once a day (to put the latest crawl into the cache), the site should be pretty scalable now too. At least as much as a 2€ Hetzner instance with nginx is scalable :)
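One way to line a cache lifetime up with a once-a-day crawl is to let the response expire at the next crawl time. The sketch below is only an illustration of that idea: the 23:00 UTC crawl time, the function name, and the `public` directive are assumptions, and the header value the site actually sends is not stated in this thread.

```python
from datetime import datetime, time, timedelta, timezone


def cache_control_until_next_crawl(now, crawl_at=time(23, 0)):
    """Build a Cache-Control value that expires at the next daily crawl (UTC).

    `now` must be a timezone-aware UTC datetime; `crawl_at` is an assumed crawl time.
    """
    next_crawl = datetime.combine(now.date(), crawl_at, tzinfo=timezone.utc)
    if next_crawl <= now:
        # Today's crawl already happened, so cache until tomorrow's.
        next_crawl += timedelta(days=1)
    max_age = int((next_crawl - now).total_seconds())
    return f"public, max-age={max_age}"


print(cache_control_until_next_crawl(datetime.now(timezone.utc)))
```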

Fingers crossed that the cache-invalidation for tonight works as expected.

TimDaub commented 3 years ago

Regarding my last update about Cache-Control: it ended up working well, so rugpullindex.com should now be able to scale easily, as its reverse proxy is delivering a single HTML file to all its users.

But regarding my main update: For the past week, I've been thinking a lot about an improvement to rugpullindex.com's current scoring method. I think that today I've made a key discovery about including liquidity into the ranking. I wrote about it in my minimal blog over here: https://rugpullindex.com/changelog.txt

Additionally, and this might be interesting to you @kremalicious and @realdatawhale, I'll soon be starting on implementing several API endpoints. I'll keep you posted about the progress.

TimDaub commented 3 years ago

Update:

New scoring method is online. It's a mix between liquidity and the gini coefficient. I'll write a detailed summary of it soon.

Also: I've changed the page's copywriting a bit.

trentmc commented 3 years ago

This is really great, Tim, keep it coming:)

There's also funding available from OceanDAO and more, we encourage you to go for it:) www.oceanprotocol.com/fund

TimDaub commented 3 years ago

Another update (maybe interesting for @kremalicious):

Added a REST API for authorized customers (there's one currently)
Not cached currently (slow)

GET https://rugpullindex.com/api/v1/indices/OP-COMPOSITE-V1/assets
> returns all assets sorted (like website)

GET https://rugpullindex.com/api/v1/indices/OP-COMPOSITE-V1/assets/did:op:0c3d9e5Df48F2917EE3eB452791740A96cB382A6?date=ISO8601DateString
> {"rank":35,"symbol":"MARCUT-0","score":0.044855145528774946,"gini":0.9119733464150174,"lastCrawl":"2020-12-08T23:01:03.313Z","price":71.54826650977905,"address":"0x31369EA0a323903493f715d4e44081a64D3b77dA","did":"did:op:0c3d9e5Df48F2917EE3eB452791740A96cB382A6","liquidity":1170.9040416102937,"banned":0}

To request API access, write me to tim@daubenschuetz.de or comment here.
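A consumer of the per-asset endpoint might build the URL and read the response roughly as follows. This is a sketch under assumptions: the helper names are invented, the field names are taken from the sample response above, and since the thread does not say how authorized customers authenticate, no request is actually sent.

```python
import json
from urllib.parse import quote, urlencode

BASE = "https://rugpullindex.com/api/v1/indices/OP-COMPOSITE-V1/assets"


def asset_url(did, date=None):
    """Build the per-asset URL; `date` is an optional ISO 8601 string."""
    url = f"{BASE}/{quote(did, safe=':')}"
    return f"{url}?{urlencode({'date': date})}" if date else url


def summarize(body):
    """Pick a few fields out of a response body shaped like the sample above."""
    asset = json.loads(body)
    return {
        "symbol": asset["symbol"],
        "rank": asset["rank"],
        "score": asset["score"],
        "banned": bool(asset["banned"]),
    }


print(asset_url("did:op:0c3d9e5Df48F2917EE3eB452791740A96cB382A6"))
```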

> There's also funding available from OceanDAO and more, we encourage you to go for it:) www.oceanprotocol.com/fund

Thanks, I'll give it a look.

TimDaub commented 3 years ago

More updates:

Launch blog post from last week: https://timdaub.github.io/2020/12/11/rugpullindex/

Also an update from today that riffs on @kremalicious's idea of a rugpullindex.com score on the official Ocean Marketplace: Introducing rugpullindex.com badges for Markdown:

[![rugpullindex.com rank](https://img.shields.io/badge/dynamic/json?url=https://rugpullindex.com/api/v1/indices/OP-COMPOSITE-V1/ranks/did:op:7Bce67697eD2858d0683c631DdE7Af823b7eea38&label=rugpullindex.com&query=rank&color=blue&prefix=%23)](https://rugpullindex.com)

For more information, visit: https://rugpullindex.com/#faq
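To generate such a badge for an arbitrary data set, the Markdown can be assembled from its DID. The sketch below reuses the endpoint and shields.io query parameters visible in the badge above; the function name is hypothetical, and the parameters are percent-encoded here, which is assumed to be equivalent to the unencoded form shown.

```python
from urllib.parse import urlencode


def badge_markdown(did):
    """Return Markdown for a shields.io dynamic JSON badge showing the asset's rank."""
    api = f"https://rugpullindex.com/api/v1/indices/OP-COMPOSITE-V1/ranks/{did}"
    params = urlencode({
        "url": api,
        "label": "rugpullindex.com",
        "query": "rank",
        "color": "blue",
        "prefix": "#",  # encoded as %23, as in the badge above
    })
    image = f"https://img.shields.io/badge/dynamic/json?{params}"
    return f"[![rugpullindex.com rank]({image})](https://rugpullindex.com)"


print(badge_markdown("did:op:7Bce67697eD2858d0683c631DdE7Af823b7eea38"))
```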

oceanprotocol / pm

[EPIC] Mitigate rug pulls #30