senior-design-crowd / crowdify

A Distributed Backup Solution

Store files in IMP network #39

Open mffigueroa opened 8 years ago

mffigueroa commented 8 years ago

The first step in storing files in the network is finding a suitable function to map hash values (currently produced by SHA-256) into a 2D coordinate space, specifically [0, 1] x [0, 1]. Since SHA-256 hashes are 256 bits wide, I need a good bignum library. I settled on the tried-and-tested GNU MP (GMP) library, which is known for its speed.

I'm currently experimenting with different functions to map the hash values. My initial idea was to take the high and low 128 bits of the hash and make them into x and y coordinates in [0, 1] like so:

x = low / (low + high)
y = high / (low + high)
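
For illustration, here is a minimal Python sketch of that split and mapping. It is not the GMP-based C++ code described in this issue; the function names `split_hash` and `ratio_coords` are hypothetical, and treating the digest as a big-endian integer is an assumption.

```python
import hashlib
from typing import Tuple

MASK_128 = (1 << 128) - 1  # low 128 bits set


def split_hash(digest: bytes) -> Tuple[int, int]:
    """Split a 32-byte SHA-256 digest into (high, low) 128-bit integers.

    Assumption: the digest is read as one big-endian 256-bit integer.
    """
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    value = int.from_bytes(digest, "big")
    return value >> 128, value & MASK_128


def ratio_coords(digest: bytes) -> Tuple[float, float]:
    """Initial idea: x = low / (low + high), y = high / (low + high)."""
    high, low = split_hash(digest)
    total = low + high
    if total == 0:
        # Only an all-zero digest gets here; chosen arbitrarily for the sketch.
        return 0.0, 0.0
    return low / total, high / total


if __name__ == "__main__":
    print(ratio_coords(hashlib.sha256(b"example block").digest()))
```

Python integers are arbitrary precision, so no separate bignum library is needed for a quick experiment like this.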

To test this idea, I wrote a short code snippet to parse the hashes into big nums and generate these coordinates. Then I wrote a Python script to generate random 4k blocks, hash them, and output a long text file of the C-string representations of the hashes. I then wrote C++ code to generate a Python script that would plot the results using Matplotlib.
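
A rough sketch of what the block-generation step could look like in Python follows. The original script is not shown in this issue, so everything here is an assumption: the names `random_block_hashes` and `write_hashes`, reading "4k" as 4096 bytes, and writing one hex digest per line.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # assumption: "4k" means 4096 bytes


def random_block_hashes(count, block_size=BLOCK_SIZE):
    """Yield the hex SHA-256 digest of `count` random blocks."""
    for _ in range(count):
        block = os.urandom(block_size)
        yield hashlib.sha256(block).hexdigest()


def write_hashes(path, count):
    """Write one hex digest per line to a text file (assumed format)."""
    with open(path, "w") as out:
        for hex_digest in random_block_hashes(count):
            out.write(hex_digest + "\n")


if __name__ == "__main__":
    write_hashes("hashes.txt", 10000)
```

Each line is a 64-character hex string, which a bignum parser can read directly in base 16.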

I'm quite embarrassed to say that I'm a terrible mathematician :P, because this is the result of plotting the x and y coordinates representing 10,000 random 4k blocks:

figure_2

As you can see, I've taken nicely random data and made it highly structured, which is the complete opposite of my goal haha.

mffigueroa commented 8 years ago

I just realized that x + y = (low / (low + high)) + (high / (low + high)) = 1, so y = 1 - x and every point falls on a single line. Whoops. I need a function where the x and y coordinates are completely uncorrelated. When I get up tomorrow I'm going to try the very simple: (low / max-128bit-int, high / max-128bit-int), where max-128bit-int is 2^128 - 1.
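
The identity can be checked with exact rational arithmetic. This is a small Python sketch; the helper name `ratio_sum` is hypothetical and the big-endian split of the digest is an assumption.

```python
import hashlib
from fractions import Fraction

MASK_128 = (1 << 128) - 1


def ratio_sum(digest):
    """Return x + y for x = low/(low+high), y = high/(low+high), exactly."""
    value = int.from_bytes(digest, "big")
    high, low = value >> 128, value & MASK_128
    total = low + high  # nonzero unless the digest is all zero bytes
    x = Fraction(low, total)
    y = Fraction(high, total)
    return x + y


if __name__ == "__main__":
    for i in range(5):
        digest = hashlib.sha256(str(i).encode()).digest()
        print(ratio_sum(digest) == 1)
```

Because the sum is exactly 1 for every input, the mapping uses only one degree of freedom and the points cannot fill the unit square.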

mffigueroa commented 8 years ago

Ok, I tried the above formula:

x = low / max-128bit-int
y = high / max-128bit-int
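
As a sanity check, this mapping can be sketched in Python along with a simple correlation estimate. The names `scaled_coords` and `pearson` are hypothetical, the big-endian split is an assumption, and this is not the GMP code used to produce the plot.

```python
import hashlib
import os

MAX_128 = (1 << 128) - 1  # largest unsigned 128-bit integer


def scaled_coords(digest):
    """Map a 32-byte digest to (low / MAX_128, high / MAX_128)."""
    value = int.from_bytes(digest, "big")
    high, low = value >> 128, value & MAX_128
    return low / MAX_128, high / MAX_128


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    points = [scaled_coords(hashlib.sha256(os.urandom(4096)).digest())
              for _ in range(10000)]
    xs, ys = zip(*points)
    print("correlation:", pearson(list(xs), list(ys)))
```

A correlation near zero does not prove independence, but it is a quick signal that the straight-line structure from the first attempt is gone.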

Result: figure_1

Wins :)