launchzone / chain-backend

0 stars 0 forks source link

SwapX: improve the `route finding` algorithm #5

Closed Zergity closed 1 year ago

Zergity commented 3 years ago

Some research directions

kevin-leptons commented 3 years ago

This is a table of contents for convenient tracking.

Zergity commented 3 years ago
  • Build a function to find the best way to swap token A to token B y = f(S, R). That mean swaping return maximum amount of token B from fixed amount of token A.

there's also transaction fee, so the goal is not only return max amount but return max (amount - fee)

Zergity commented 3 years ago

Using Longest Path Problem on graph in progress.

Longest Path of Shortest Path?

kevin-leptons commented 3 years ago
  • Build a function to find the best way to swap token A to token B y = f(S, R). That mean swaping return maximum amount of token B from fixed amount of token A.

there's also transaction fee, so the goal is not only return max amount but return max (amount - fee)

I updated to give more details about amount of target token. It is final amount which is received by user. Details of swapping include exchange rate and exchange fee will be describe in specific solution.

Using Longest Path Problem on graph in progress.

Longest Path of Shortest Path?

It is longest path, not shortest path. That way seem not reach good enough time-complexity to go as above descriptions. I do work on second solution. If there are any suggestions then please post here, I will investigate on it.

kevin-leptons commented 3 years ago

Problem Statement

kevin-leptons commented 3 years ago

The First Solution

This solution base on article Longest Path Problem from graph theory. By this way, it build a graph G = {V, E} where V is set of tokens as vertices, E is set of edges as the way to swap between two tokens. Each edges has it's weight, including but not limit: exchange rate, exhange fee. Then apply a Finding Longest Path Algorithm on G return a path to exchange tokens which give maximum amount of target token.

First step, build a graph G = {V, E} from set of tokens S as descriptions above. Each vertices has it's own identity as token address. Each edges from vertex A to B is the way to swap token A to token B, it is represent by function y = g(a) where a is amount of token A and y is received amount of token B. That function can be build from S = {s1... s_n} where s_i has ratio of token A and token B that represent for exchange rate. There are exhange fee should be include to function g(). This graph is directed graph.

Second step, peform a Finding Longest Path Algorithm on G to find out a exchange path. Follow this path swapping give maximum amount of received token.

Unfortunately, Longest Path Algorithms is not trivial, it can not reach linear, logarithmic or polylogarithmic time complexity. Although algorithms has good enough time complexity on few special graphs such as block graphs, split graphs... it's not possible on this case because G is general directed graph. And this solution give only one path, mean y = {p_1} that can not optimize for multi swap providers. It is impossible to pre-compute edge's weight because it change overtime, so there are no potentials to sort and query effectively.

In fact, Longest Path Problem is not similar to Shortest Path Problem which is can be solve with good enough time complexity. In simple, Longest Path Algorithms must find out all paths then select longest one, so this is not a way to go.

kevin-leptons commented 3 years ago

The Second Solution

This is an improvement of current swapx solution. Current swapx define fixed swap paths that go through common tokens such as USDT, BUSD WBNB. For example, swap request token A to B then there are paths A -> USDT -> B, A -> WBNB -> BUSD -> B and so on. Then it evaluates and pick up the best path that returns maximum amount of token B. For more details, see swapx interface and swapx contract.

First step, define set of common tokens K, those tokens should be token which is pair with many other tokens. At the time this document is write K = {USDT, BUSD, WBNB, BSCX}. The set K can be modify depend on implementation.

Second step, generate raw set of swap paths T = {p_1... p_n}; p_i = [t_1... t_m], where p_i is a path, t_i is a token. That can be done easily from swap request R = {s, d, a} and set of common token K. For example, p_1 = [s, USDT, D]; p_2 = [s, USDT, WBNB, d] and so on.

Third step, make final set of swap paths P = {p_1... p_h} by filter not existed path from T = {p1... p_n}; p_i = [t_1... t_m]. A path is not existed if and only if token pair {t_i, t_(i+1)} does not exists from set of token pair states S.

Four step, create a set for amount of received target token M = {m_1... m_h}, where m_i = g(p_i), p_i is a element of P. m_i = {p_i, a} where a is amount of received target token from path p_i. The function g() should return value that depend on exchange rate of two tokens and swap fee.

Fifth step, pick a path that receive the most amount of target token from M = {m_1... m_h},called m_i. Return [m_i] as result.

By this way, time complexity is (P(1, k) + P(2, k) + ... + P(k, k)).log_2(N) where P is permutation, k is number of common tokens K, N is number of token pair states S.

However, the result has one swap path at most. And it can not find other paths than the paths that go though common tokens.

Zergity commented 3 years ago

The First Solution

This solution base on article Longest Path Problem

Why would you want to find the longest path in this case? Shouldn't you want to find the shortest path?

Zergity commented 3 years ago

One more comment from me is that the results could be multiple paths, not just single path. And the weight of each path is dynamic (depend on the input amount), not static.

kevin-leptons commented 3 years ago

Why would you want to find the longest path in this case? Shouldn't you want to find the shortest path?

The main reason is weight of edge X -> Y represents for amount of token Y after swapping of token X to token Y. Since we expect to get maximum amount of target token, we can do it by find a longest path from source token to target token. If we find a shortest path then it mean amount of target token is minimum.

One more comment from me is that the results could be multiple paths, not just single path. And the weight of each path is dynamic (depend on the input amount), not static.

I am aware the result could be more than a path. Unfortunately, Longest Path Algorithms return one path at most, can be zero if there is no path. However, final result can be a set contains one path to match with Problem Statement easily. I did describe few limitations in 4rd paragraph.

Edge's weight is dynamic and it is calculate by function g() from 2nd paragraph. I also point few limitations in 4th paragraph.

Zergity commented 3 years ago

Why would you want to find the longest path in this case? Shouldn't you want to find the shortest path?

The main reason is weight of edge X -> Y represents for amount of token Y after swapping of token X to token Y. Since we expect to get maximum amount of target token, we can do it by find a longest path from source token to target token. If we find a shortest path then it mean amount of target token is minimum.

I think you made a critical mistake identify the problem, please read again about the shortest and longest path problems.

kevin-leptons commented 3 years ago

These notes come from while working with Heuristic to find out a solution.

Note 1: Simplify problem

Since original problem statement is y = f(S, R), there are too many routing cases to build and evaluate. Especially, it requires heavy data loading that causes overloading, for more details see Note 2.

To coop this issue, simplify problem is a way to go. The given result could not be the best but excecution time is acceptable. There are two rules that simplify problem:

First rule, work with set of exchange providers V = {v_1... v_n}, where v_i is popular provider such as PancakeSwap, BakerySwap, ApeSwap... Of course, there are other providers but common ones does: 1. Support almost tokens; 2. The liquidity take up the majority of the network; 3. Public their contract or ABIs, that mean related data can be collect. In other hands, exchange on well-know providers reduces risks which is hard to detect and eliminate by machine.

Second rule, work with set of intermediate tokens K = {k_1... k_n}, where k_i is popular token such as USDT, BUSD, WBNB. The reasons to go this way is similar like the first rule.

Both rules above is use by swapx currently and it is easy to do. This note define it and it's benefits to apply in specific solutions.

Note 2: Heavy data loading

For now, any solutions requires heavy data loading to resolve problem. This note shows that overloading occurs on data loading instead of data processing.

From problem statement y = f(S, R), split it to two sequential problems y = f(S, R) <=> U = g(S, R) then y = h(U, R), where g() is function to load token states, U is related token states from S, h() is function to find out exchange paths. By this way, there are many actions to do in internal function h(): 1. Combination between popular providers and tokens give several dozen states. 2. Each states is index by it's address; 3. Each states is load one by one; 4. Time-complexity to load a state is log_2(N), this is common with almost database.

If there is a solution to build suitable data structure S to reduce time-complexity of token states loading then it is helpful. For example, all data is loaded with time-complexity log_2(N) then evaluation all exchange paths is trivial, no matter algorithms is.

Note 3: Blind searching

This situation occurs while apply partial data loading, search for exchange paths step by step, each step pick the best way to go. It can be describe by following statements:

kevin-leptons commented 3 years ago

Why would you want to find the longest path in this case? Shouldn't you want to find the shortest path?

The main reason is weight of edge X -> Y represents for amount of token Y after swapping of token X to token Y. Since we expect to get maximum amount of target token, we can do it by find a longest path from source token to target token. If we find a shortest path then it mean amount of target token is minimum.

I think you made a critical mistake identify the problem, please read again about the shortest and longest path problems.

Right, that solution is incorrect.

Zergity commented 3 years ago

Note 2: Heavy data loading

For now, any solutions requires heavy data loading to resolve problem. This note shows that overloading occurs on data loading instead of data processing.

Which data is load, and from where?