internet-sicherheit / ethereum-cache-creator

GNU General Public License v3.0

Serialize transaction data (JSON) and write to a file #3

Closed beeshot closed 4 years ago

beeshot commented 4 years ago

These are potentially interesting data to collect (inspired by a Chinese paper about networks):

However, for now it is enough to serialize the transaction data with the two address fields, to and from.

kiview commented 4 years ago

The easiest way to do this is by using Jackson. I think this Baeldung post is a pretty good intro.

You can even write directly to file from Jackson:

writer.writeValue(new File("D:/cp/dataTwo.json"), jsonDataObject);

So I recommend writing an array of JSON objects.
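To make the target shape concrete, here is a minimal sketch that produces an array of JSON objects using only the Java standard library. It is a stand-in for illustration, not the Jackson approach recommended above; with Jackson, passing a List of simple objects to ObjectMapper.writeValue should yield the same array-of-objects shape. The class name TransactionJsonWriter, the String[] pair representation, and the field names fromAddress and toAddress are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the repository: writes (from, to) pairs
// as a JSON array of objects without any JSON library.
class TransactionJsonWriter {

    // Renders an address as a JSON string, or the literal null when absent.
    // Assumes hex addresses, so no string escaping is performed.
    static String jsonValue(String address) {
        return address == null ? "null" : "\"" + address + "\"";
    }

    // Builds the JSON text: one object per transaction, all inside one array.
    static String toJson(List<String[]> transactions) {
        List<String> objects = new ArrayList<>();
        for (String[] tx : transactions) {
            objects.add("{\"fromAddress\":" + jsonValue(tx[0])
                    + ",\"toAddress\":" + jsonValue(tx[1]) + "}");
        }
        return "[" + String.join(",", objects) + "]";
    }

    // Writes the array to the given file as UTF-8.
    static void writeTo(Path file, List<String[]> transactions) throws IOException {
        Files.write(file, toJson(transactions).getBytes(StandardCharsets.UTF_8));
    }
}
```

Each transaction stays one self-contained object, so a consumer never has to match values across separate lists.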

Mschnuff commented 4 years ago

OK, very nice. It already works for my test object:

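// Imports needed by this method (web3j and Jackson must be on the classpath).
import java.io.File;
import java.io.IOException;
import java.math.BigInteger;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.web3j.protocol.core.methods.response.EthBlock;

// Fetches one block and writes its number and transaction count to a JSON file.
public void extrahiereJsonObject(int blockNumber) {
    BigInteger blockBigInteger = BigInteger.valueOf(blockNumber);
    try {
        EthBlock.Block ethBlock = client.getEthBlock(blockBigInteger).getBlock();
        BlockWithData bwd = new BlockWithData(ethBlock.getNumber(), ethBlock.getTransactions().size());
        // Jackson serializes the test object straight to the output file.
        ObjectMapper objectMapper = new ObjectMapper();
        objectMapper.writeValue(new File(OUTPUTDIRECTORY + "extractedData6.json"), bwd);
        System.out.println("something else: " + ethBlock.getTransactions());
    } catch (IOException e) {
        System.out.println("couldnt extract data: " + e);
    }
}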

However, I took a look at the EthBlock class: https://github.com/web3j/web3j/blob/master/core/src/main/java/org/web3j/protocol/core/methods/response/EthBlock.java I don't know much about Jackson, but it looks to me like EthBlock converts a JSON object into a Java class structure, and I then reverse that by using Jackson again.

Mschnuff commented 4 years ago

OK, so looking into EthBlock was kind of disappointing. It seems a lot of the items mentioned in the list above aren't even there. I will just extract everything I can get for now.

kiview commented 4 years ago

I think you need to infer indicators to determine what a block represents. You should be able to find out how to do this by looking at the implementations of other Ethereum explorers.

Regarding the JSON question: EthBlock probably deserializes itself from the JSON response of the JSON-RPC request. So yes, you then serialize it again if you have no way to obtain the original JSON response, but I don't see any problem with that.

kiview commented 4 years ago

As a first step, it is sufficient to serialize the Transaction objects in order to create a graph that contains generic addresses as nodes and tuples of two addresses t(a1, a2) as edges.

So serializing a list of transactions is necessary here.
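As a rough sketch of the graph described here, the following Java class collects addresses as nodes and (from, to) tuples as edges. The class name TransactionGraph and the String[] edge representation are hypothetical and chosen only for this illustration. Skipping the edge when the to address is null (which in Ethereum marks a contract-creation transaction) is an assumption of this sketch, not something decided in the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: addresses become nodes, (from, to) pairs become edges.
class TransactionGraph {
    // Insertion-ordered set, so each address appears exactly once.
    final Set<String> nodes = new LinkedHashSet<>();
    // Directed edges as two-element arrays {from, to}; duplicates are kept.
    final List<String[]> edges = new ArrayList<>();

    // Adds one transaction. Assumption: a null "to" address registers the
    // sender as a node but adds no edge.
    void addTransaction(String from, String to) {
        nodes.add(from);
        if (to != null) {
            nodes.add(to);
            edges.add(new String[] {from, to});
        }
    }
}
```

Keeping duplicate edges preserves the number of transactions between two addresses, which could later serve as an edge weight.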

Mschnuff commented 4 years ago

This approach didn't work (we couldn't import it properly):

[{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":null},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":null},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9850711951a84ef8a2a31a7868d0dca34b0661ca"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":null},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x03274b235c4a9207db1c852ea145fbe4d05e0e89"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":null},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":null},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x03274b235c4a9207db1c852ea145fbe4d05e0e89"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":null},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":null},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x03274b235c4a9207db1c852ea145fbe4d05e0e89"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0xd748bf41264b906093460923169643f45bdbc32e"},{"fromAddress":"0xd748bf41264b906093460923169643f45bdbc32e","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xd748bf41264b906093460923169643f45bdbc32e","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"},{"fromAddress":"0xd748bf41264b906093460923169643f45bdbc32e","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xd748bf41264b906093460923169643f45bdbc32e","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"},{"fromAddress":"0xd748bf41264b906093460923169643f45bdbc32e","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"}]

kiview commented 4 years ago

This format looks totally fine and would be the way to go.

In R, since you are probably getting a list of data frames when importing this, you would have to merge all the data frames in the list into a single big data frame (this is what we would call a flatMap operation in functional programming). I can give you the R command for this if you need it.
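For readers more at home in Java than R, the flatMap idea mentioned above can be sketched with the Streams API: a list of lists is merged into one flat list. The class name FlattenExample is hypothetical, and plain lists stand in for the R data frames here.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of a flatMap: many small lists become one list.
class FlattenExample {
    // Merges a list of per-chunk lists into one flat list, preserving order.
    static <T> List<T> flatten(List<List<T>> chunks) {
        return chunks.stream()
                .flatMap(List::stream)          // replace each inner list by its elements
                .collect(Collectors.toList());  // gather everything into a single list
    }
}
```

The same pattern applies whether the chunks are lists of addresses, rows, or whole transaction objects.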

I would recommend collaborating on the R file by also adding it to this repo (together with an example JSON file that can be used for testing), then I can do a PR to make R work with it.

Mschnuff commented 4 years ago

Yeah, we had several lists showing up in RStudio, so it is good to know that something like this flatMap operation exists. But I think I have the right structure now (first 100 blocks):

{"fromAddress":["0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","0xab59a1ea1ac9af9f77518b9b4ad80942ade35088"],"toAddress":[null,null,"0x9850711951a84ef8a2a31a7868d0dca34b0661ca"]}

Actually, everything is easier than expected, since Jackson automatically converts my ArrayLists into JSON arrays.

kiview commented 4 years ago

This looks like the wrong format now; you are tearing the tuples apart with this. Please create a PR from a branch with your current working code, then we can discuss it there and do a code review.

The prior format was correct:

[
    {
        "fromAddress": "0xab59a1ea1ac9af9f77518b9b4ad80942ade35088",
        "toAddress": null
    },
    {
        "fromAddress": "0xab59a1ea1ac9af9f77518b9b4ad80942ade35088",
        "toAddress": null
    },
    {
        "fromAddress": "0xab59a1ea1ac9af9f77518b9b4ad80942ade35088",
        "toAddress": "0x9850711951a84ef8a2a31a7868d0dca34b0661ca"
    }
]

Flattening in R can be done with dplyr's bind_rows() function.
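To show why the column-oriented layout tears the tuples apart, here is a minimal Java sketch that converts two parallel address lists back into per-transaction rows. The class name ColumnsToRows and the String[] row representation are hypothetical. The point is that the pairing then depends purely on index position, which the array-of-objects format makes explicit instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: rebuilds (from, to) rows from two parallel columns.
class ColumnsToRows {
    // Pairs fromAddresses[i] with toAddresses[i]; nothing but the index links them.
    static List<String[]> zip(List<String> fromAddresses, List<String> toAddresses) {
        if (fromAddresses.size() != toAddresses.size()) {
            // With separate columns, a length mismatch silently breaks the tuples.
            throw new IllegalArgumentException("columns differ in length");
        }
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < fromAddresses.size(); i++) {
            rows.add(new String[] {fromAddresses.get(i), toAddresses.get(i)});
        }
        return rows;
    }
}
```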

Mschnuff commented 4 years ago

I built it according to this tutorial: https://www.tutorialspoint.com/r/r_json_files.htm. It worked yesterday, so I don't know. I am extracting transactions from the first 100,000 blocks at the moment. When I am done, I will try to import the new file into RStudio.

kiview commented 4 years ago

@moekappels Don't use the format described in the blog post; pre-shaping of data can always be done in data science tools. The proposed format is not a semantically structured and self-describing JSON format.

Use the format as specified in https://github.com/internet-sicherheit/ethereum-cache-creator/issues/3#issuecomment-621079395.

Also, please don't commit directly to master, instead work on a branch and create a PR. I will make master a protected branch in this repo in order to enforce and teach this workflow :wink:

kiview commented 4 years ago

@moekappels This is basically already done and on master, isn't it? If yes, we can close the issue.

Mschnuff commented 4 years ago

Yes, but we still focus on the addresses. I am fine with closing so there is less clutter in the issues.

Mschnuff commented 4 years ago

We took a long detour with the JSON files and ended up where we started.

kiview commented 4 years ago

Alright, we'll close once we have created issues for the outstanding items mentioned in the top post.

kiview commented 4 years ago

I'll close this for now to clean up our issue list. New issues should be created for specific fields, e.g. #29.