renepickhardt / pickhardtpayments

python package to showcase, test and build your own version of Pickhardt Payments
Apache License 2.0

Remove magic numbers #7

Open renepickhardt opened 2 years ago

renepickhardt commented 2 years ago

Currently we have magic numbers in the code. For example:

https://github.com/renepickhardt/pickhardtpayments/blob/main/pickhardtpayments/UncertaintyChannel.py#L32 specifies MAX_CHANNEL_SIZE = 15_000_000_000 # 150 BTC

this is completely arbitrary and fits the current Lightning Network, but changing it may be necessary in the future, and doing so has an impact on the learnt weights in feature engineering.
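
As a rough illustration (the names here are hypothetical and not the actual pickhardtpayments API), the constant could be derived from the observed gossip data instead of being hardcoded:

DEFAULT_MAX_CHANNEL_SIZE = 15_000_000_000  # 150 BTC, the current hard-coded fallback

def max_channel_size(channels, fallback=DEFAULT_MAX_CHANNEL_SIZE):
    # "channels" is assumed to be an iterable of objects with a .capacity
    # attribute in satoshi. Taking the observed maximum keeps the constant in
    # sync with the network instead of freezing an arbitrary value into the code.
    return max((c.capacity for c in channels), default=fallback)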

I guess this is also related to feature engineering in general

joshfix commented 2 years ago

Hi Rene! Speaking of magic numbers, can you explain where the value of 1_400_000_000 comes from in this equation?

uncertaint_cost_feature = log(1_400_000_000/channel_size)

https://github.com/renepickhardt/mpp-splitter/issues/12#issuecomment-1105196895

renepickhardt commented 2 years ago

I hope the Glossary in the notebook at https://github.com/renepickhardt/mpp-splitter/blob/pickhardt-payments-simulation-dev/Pickhardt-Payments-Simulation.ipynb helps answer this (I plan to migrate it to a glossary file in this repo). The reason behind that number is explained in this notebook: https://github.com/renepickhardt/mpp-splitter/blob/master/Minimal%20Linearized%20min%20cost%20flow%20example%20for%20MPP.ipynb

TL;DR:

The uncertainty cost uc(a) to send an amount a over a channel with capacity c is defined as -log((c+1-a)/(c+1)). This can be linearized as a/c, which can be written as (1/c)*a. From that we derive the unit_uncertainty_cost to be 1/c. The problem is that min cost flow solvers use techniques from integer programming and thus need integer costs. Therefore I multiply all unit_uncertainty_cost values with the largest capacity that I observed on the network to get the linearized_uncertainty_integer_unit_cost (or some other permutation of those words) as 1_400_000_000/channel_size.
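
A small Python sketch of that derivation (the function names follow the terms above but are not the library's API; 1_400_000_000 is the largest observed capacity mentioned here):

from math import log

def uncertainty_cost(a, c):
    # Negative log of the success probability P(X >= a) = (c + 1 - a)/(c + 1)
    # for a uniform prior on the liquidity of a channel with capacity c.
    return -log((c + 1 - a) / (c + 1))

def unit_uncertainty_cost(c):
    # First-order linearization uc(a) ~= a/c, i.e. a cost of 1/c per unit of flow.
    return 1 / c

def linearized_uncertainty_integer_unit_cost(c, max_capacity=1_400_000_000):
    # Scale 1/c by the largest observed capacity so that the min cost flow
    # solver can work with integer costs.
    return max_capacity // c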

Note that the additional log has no meaning and was just there to put the feature on a logarithmic scale for better visual representation (as I did with the ppm, which is the linearized_routing_integer_unit_cost).

The problem behind this issue

Of course I could always just take the currently largest observed channel size, but the problem is that my choice of \mu in feature engineering to include the linearized_routing_integer_unit_cost might not work next time, when that channel is closed or when we have a much larger one. Thus I tried to use 21M BTC. However, if that amount in satoshi is my scaling factor, I get integer overflows with reasonably sized payment amounts. Thus I will need something smarter.
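
As a back-of-the-envelope illustration of the overflow (the quantization, smallest channel size, node count and the roughly node-count-sized internal factor used by cost-scaling min cost flow implementations are all assumptions for the sake of the example, not values from the code base):

INT64_MAX = 2**63 - 1                  # typical cost type of integer solvers

SCALE = 21_000_000 * 100_000_000       # 21M BTC in sats, ~2.1e15
QUANTIZATION = 10_000                  # assumed payment quantization in sats
smallest_channel_units = 1             # a ~10k sat channel after quantization
num_nodes = 20_000                     # rough size of the Lightning Network

unit_cost = SCALE // smallest_channel_units   # ~2.1e15 per unit of flow
# Cost-scaling solvers internally work with arc costs multiplied by roughly
# the number of nodes, so this is the magnitude that has to fit into int64:
scaled_cost = unit_cost * (num_nodes + 1)     # ~4.2e19

print(f"scaled cost: {scaled_cost:.1e}  vs  int64 max: {INT64_MAX:.1e}")
print("overflow:", scaled_cost > INT64_MAX)   # True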