kadena-io / pact

The Pact Smart Contract Language
https://pact-language.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

add keccak256 native #1354

Closed · chessai closed this 4 months ago

chessai commented 4 months ago

# Tests

## Unit tests, Haskell

```
Keccak256Spec
  keccak256 produces correct results [✔]
```

## Repl tests, Pact

- `tests/pact/keccak256.repl`

```
❯ cabal run test:hspec -- --match="pact tests"

PactTestsSpec
  pact tests
    ...
    tests/pact/keccak256.repl [✔]
```
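For reference, the expected digest can be reproduced outside of Pact. The following is a minimal, self-contained sketch, not the PR's code: it assumes the `cryptonite` package (which may or may not be what this PR uses) and an illustrative helper name, hashing the concatenation of the already-decoded chunks and re-encoding the digest as base64.

```haskell
-- Sketch only: reproduces the empty-input digest used in the tests,
-- assuming cryptonite's Keccak_256 (the pre-SHA-3 Keccak variant).
import Crypto.Hash (Digest, Keccak_256, hash)
import qualified Data.ByteArray.Encoding as BA
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8

-- Hash the concatenation of already-decoded chunks, base64-encode the digest.
keccak256OfChunks :: [BS.ByteString] -> BS.ByteString
keccak256OfChunks chunks =
  BA.convertToBase BA.Base64 (hash (BS.concat chunks) :: Digest Keccak_256)

main :: IO ()
main =
  -- Prints "xdJGAYb3IzySfn2y3McDwOUAtlPKgic7e/rYBF2FpHA=", the value
  -- returned by (keccak256 [""]) in the REPL transcript below.
  BS8.putStrLn (keccak256OfChunks [""])
```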

## Blocked on Pact 4.12 fork

```
❯ cabal run pact
pact> (keccak256 [""])
"xdJGAYb3IzySfn2y3McDwOUAtlPKgic7e/rYBF2FpHA="
pact> (env-exec-config ['DisablePact412])
["DisablePact412"]
pact> (keccak256 [""])
:0:0:Error: Cannot resolve keccak256
pact>
```

# Gas Benchmark

## Gas Benchmark Code

```haskell
-- (Imports of the Pact term types, vector, bytestring, base64, and deepseq
-- helpers are omitted here, as in the original comment.)
import Test.Tasty.Bench

main :: IO ()
main = do
  defaultMain
    [ bgroup "Keccak256"
        [ benchHash 10
        , benchHash 100
        , benchHash 1_000
        , benchHash 10_000
        , benchHash 100_000
        , benchHash 1_000_000
        ]
    ]

benchHash :: Int -> Benchmark
benchHash n =
  env (evaluate (force (inputsN n))) $ \inputs ->
    bgroup ("Keccak256 - " ++ show n ++ " bytes")
      [ bench "single blob" $ nf (keccak256 . V.singleton) (input inputs)
      , bench "chunked" $ nf keccak256 (chunked inputs)
      ]

stringN :: Int -> Term Name
stringN n = TLiteral
  { _tLiteral = LString (Text.decodeUtf8 (Base64.encode (BS.replicate n 97)))
  , _tInfo = def
  }

data Inputs = Inputs
  { input :: !(Term Name)
  , chunked :: {-# UNPACK #-} !(Vector (Term Name))
  }
  deriving stock (Generic)
  deriving anyclass (NFData)

inputsN :: Int -> Inputs
inputsN n =
  let !string = stringN n
      !chunk = stringN (n `div` chunkSize)
  in Inputs
       { input = string
       , chunked = V.replicate chunkSize chunk
       }

chunkSize :: Int
chunkSize = 10
```

## Gas Benchmark Results

```
All
  Keccak256
    Keccak256 - 10 bytes
      single blob: OK
        2.33 μs ± 223 ns
      chunked: OK
        4.91 μs ± 208 ns
    Keccak256 - 100 bytes
      single blob: OK
        2.36 μs ± 173 ns
      chunked: OK
        5.18 μs ± 188 ns
    Keccak256 - 1000 bytes
      single blob: OK
        5.70 μs ± 189 ns
      chunked: OK
        8.67 μs ± 359 ns
    Keccak256 - 10000 bytes
      single blob: OK
        37.8 μs ± 3.3 μs
      chunked: OK
        42.4 μs ± 3.6 μs
    Keccak256 - 100000 bytes
      single blob: OK
        369 μs ± 20 μs
      chunked: OK
        367 μs ± 25 μs
    Keccak256 - 1000000 bytes
      single blob: OK
        3.66 ms ± 242 μs
      chunked: OK
        3.62 ms ± 357 μs
```

## Findings

Linear regression of single-blob points only:

![single_blob](https://github.com/kadena-io/pact/assets/18648043/d589822c-4b3a-4038-a751-f332ee4360da)

Linear regression of chunked points only:

![chunks](https://github.com/kadena-io/pact/assets/18648043/e1f7c008-04ce-4bd5-8c43-91aefbee6e6f)

From the single-blob regression, the cost of `keccak256` could be computed (in milligas) roughly as:

```
852.4 + 1.4632 * num_bytes(input)
```

From the chunked regression, the cost of `keccak256` could be computed (in milligas) roughly as:

```
2119.6 + 1.446 * num_bytes(input)
```

This suggests that for smaller inputs, the number of chunks matters more. The benchmarks tell the same story: the difference between chunked and non-chunked input shrinks as the total number of bytes grows. In practice, this convergence takes so long that we should charge per chunk. I propose the following cost, _per chunk_ (the chunked regression's intercept combined with the single-blob slope):

```
2119.6 + 1.4632 * num_bytes(chunk)
```

To avoid rounding errors due to the small per-byte cost, we can charge per 100 bytes:

```
2120 + 146 * (num_bytes(chunk) `div` 100)
```

With a ceiling, so that chunks of fewer than 100 bytes don't go ungassed:

```
2120 + 146 * ceiling (num_bytes(chunk) / 100)
```

And because `defaultGasTable` contains the base cost for a single call to `keccak256`, I believe that entry should be `1 gas`: any call does *some* work (even `keccak256 []`), and the rest is gassed per chunk.
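To make the proposal concrete, here is a minimal sketch of the charging rule described above. It is not the PR's actual gas-model code: the function names are illustrative, and it assumes the 1 gas = 1000 milligas conversion.

```haskell
-- Sketch of the proposed costing rule (illustrative names, not Pact's
-- gas-model API). All quantities are in milligas, assuming 1 gas = 1000.

-- Per-chunk cost: 2120 milligas plus 146 per started 100 bytes.
keccak256ChunkMilliGas :: Int -> Int
keccak256ChunkMilliGas chunkBytes =
  2120 + 146 * ((chunkBytes + 99) `div` 100)  -- ceiling (chunkBytes / 100)

-- Whole-call cost: the flat 1-gas defaultGasTable entry plus the sum of
-- the per-chunk charges over all chunks.
keccak256CallMilliGas :: [Int] -> Int
keccak256CallMilliGas chunkSizes =
  1000 + sum (map keccak256ChunkMilliGas chunkSizes)
```

Under this rule, hashing a single 1,000-byte chunk would cost 1000 + 2120 + 146 * 10 = 4,580 milligas.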
chessai commented 4 months ago

This cannot be merged until we remove the unsafe lens access.

I thought the lens access was safe because I was under the impression that natives were typechecked, but I was wrong.