beeshot closed this issue 4 years ago
The easiest way to do this is by using Jackson. I think this Baeldung post is a pretty good intro.
You can even write directly to file from Jackson:
writer.writeValue(new File("D:/cp/dataTwo.json"), jsonDataObject);
So I recommend writing an array of JSON objects.
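To illustrate the "array of JSON objects" idea, here is a minimal sketch. It assumes Jackson 2.x (`jackson-databind`) is on the classpath; `AddressPair` and `JsonArrayDemo` are hypothetical names made up for this example and are not part of the project.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical POJO for illustration; Jackson picks up the public getters.
class AddressPair {
    private final String fromAddress;
    private final String toAddress;

    AddressPair(String fromAddress, String toAddress) {
        this.fromAddress = fromAddress;
        this.toAddress = toAddress;
    }

    public String getFromAddress() { return fromAddress; }
    public String getToAddress() { return toAddress; }
}

class JsonArrayDemo {
    // A List of POJOs is serialized as a JSON array with one object per element.
    static String toJson(List<AddressPair> pairs) throws IOException {
        return new ObjectMapper().writeValueAsString(pairs);
    }

    public static void main(String[] args) throws IOException {
        List<AddressPair> pairs = Arrays.asList(
            new AddressPair("0xab59a1ea1ac9af9f77518b9b4ad80942ade35088", null),
            new AddressPair("0xab59a1ea1ac9af9f77518b9b4ad80942ade35088",
                            "0x9850711951a84ef8a2a31a7868d0dca34b0661ca"));
        System.out.println(toJson(pairs));
    }
}
```

To write to a file instead of a string, `writeValue(new File(...), pairs)` can be used in the same way as in the line above.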
OK, very nice. It already works for my test object:
// client and OUTPUTDIRECTORY are fields defined elsewhere in the class.
public void extrahiereJsonObject(int blockNumber) {
    BigInteger blockBigInteger = BigInteger.valueOf(blockNumber);
    try {
        EthBlock.Block ethBlock = client.getEthBlock(blockBigInteger).getBlock();
        // Test object holding only the block number and the transaction count.
        BlockWithData bwd = new BlockWithData(ethBlock.getNumber(), ethBlock.getTransactions().size());
        // Serialize the test object to a JSON file with Jackson.
        ObjectMapper objectMapper = new ObjectMapper();
        objectMapper.writeValue(new File(OUTPUTDIRECTORY + "extractedData6.json"), bwd);
        System.out.println("something else: " + ethBlock.getTransactions());
    } catch (IOException e) {
        System.out.println("couldn't extract data: " + e);
    }
}
However, I took a look at the EthBlock class: https://github.com/web3j/web3j/blob/master/core/src/main/java/org/web3j/protocol/core/methods/response/EthBlock.java I don't know much about Jackson, but to me it looks like EthBlock converts a JSON object into a Java class structure, and then I revert that by using Jackson again.
OK, so looking into EthBlock was kind of disappointing. It seems a lot of the stuff that is mentioned in the list above isn't even there. I will just extract everything I can get for now.
I think you need to infer indicators to determine what a block represents. You should be able to find out how to do this by looking at the implementation of other Ethereum explorers.
With regard to the JSON question:
EthBlock probably deserializes itself from the JSON response of the JSON-RPC request. So yes, you then serialize again if you have no way to obtain the original JSON response, but I don't see any problem with that.
As a first step, it is sufficient to serialize the Transaction objects in order to create a graph that contains generic addresses as nodes and tuples of two addresses t(a1, a2) as edges.
So serializing a list of transactions is necessary here.
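The graph described above could be built from the serialized address pairs roughly as follows. This is only a sketch: `TransactionGraph` and `Edge` are hypothetical names, and skipping pairs with a null address is an assumption made here, not something decided in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TransactionGraph {
    // Directed edge t(a1, a2): address a1 sent a transaction to address a2.
    static final class Edge {
        final String from;
        final String to;

        Edge(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    // Nodes are the distinct addresses; edges keep one entry per transaction.
    final Set<String> nodes = new LinkedHashSet<>();
    final List<Edge> edges = new ArrayList<>();

    // Adds one transaction. Pairs with a null address are skipped (assumption).
    void addTransaction(String from, String to) {
        if (from == null || to == null) {
            return;
        }
        nodes.add(from);
        nodes.add(to);
        edges.add(new Edge(from, to));
    }

    public static void main(String[] args) {
        TransactionGraph graph = new TransactionGraph();
        graph.addTransaction("0xab59a1ea1ac9af9f77518b9b4ad80942ade35088", null);
        graph.addTransaction("0xab59a1ea1ac9af9f77518b9b4ad80942ade35088",
                             "0x9850711951a84ef8a2a31a7868d0dca34b0661ca");
        System.out.println(graph.nodes.size() + " nodes, " + graph.edges.size() + " edges");
    }
}
```

Keeping the edges in a list rather than a set preserves repeated transactions between the same two addresses, which may or may not be what the later analysis needs.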
This approach didn't work (we couldn't import it properly):
[{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":null},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":null},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9850711951a84ef8a2a31a7868d0dca34b0661ca"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":null},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x03274b235c4a9207db1c852ea145fbe4d05e0e89"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":null},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":null},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x03274b235c4a9207db1c852ea145fbe4d05e0e89"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":null},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":null},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x03274b235c4a9207db1c852ea145fbe4d05e0e89"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0xd748bf41264b906093460923169643f45bdbc32e"},{"fromAddress":"0xd748bf41264b906093460923169643f45bdbc32e","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xd748bf41264b906093460923169643f45bdbc32e","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"},{"fromAddress":"0xd748bf41264b906093460923169643f45bdbc32e","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xd748bf41264b906093460923169643f45bdbc32e","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"},{"fromAddress":"0xd748bf41264b906093460923169643f45bdbc32e","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0x9849379b89ab24c18c8871d56d1ad41e00d9eaae"},{"fromAddress":"0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","toAddress":"0xe8958c0556a005cc10b3dac4144b2358b28e2aaa"}]
This format looks totally fine and would be the way to go.
In R, since you are probably getting a list of data frames when importing this, what you have to do is merge all the data frames in the list into a single big data frame (this is what we would call a flatMap operation in functional programming). I can give you the R command for this if you need it.
I would recommend collaborating on the R file by also adding it to this repo (together with an example JSON file that can be used for testing), then I can do a PR to make R work with it.
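For anyone more at home in Java than in R, the flatMap idea mentioned above can be sketched with Java streams: a list of lists is merged into one flat list. `FlatMapDemo` and `flatten` are hypothetical names used only for this illustration, and the strings in `main` are placeholder rows.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlatMapDemo {
    // Merges a list of lists into one flat list, preserving element order.
    static <T> List<T> flatten(List<List<T>> nested) {
        return nested.stream()
                     .flatMap(List::stream)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Each inner list stands in for one imported chunk of rows.
        List<List<String>> chunks = Arrays.asList(
            Arrays.asList("row1", "row2"),
            Arrays.asList("row3"));
        System.out.println(flatten(chunks));
    }
}
```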
Yeah, we had several lists showing up in RStudio, so it is good to know that something like this flatMap operation exists. But I think I have the right structure now (first 100 blocks):
{"fromAddress":["0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","0xab59a1ea1ac9af9f77518b9b4ad80942ade35088","0xab59a1ea1ac9af9f77518b9b4ad80942ade35088"],"toAddress":[null,null,"0x9850711951a84ef8a2a31a7868d0dca34b0661ca"]}
Actually, everything is easier than expected, since Jackson automatically converts my ArrayLists into JSON arrays.
This looks like the wrong format now; you are tearing the tuples apart with it. Please create a PR from a branch with your current working code, then we can discuss it there and do a code review.
The prior format was correct:
[
  {
    "fromAddress": "0xab59a1ea1ac9af9f77518b9b4ad80942ade35088",
    "toAddress": null
  },
  {
    "fromAddress": "0xab59a1ea1ac9af9f77518b9b4ad80942ade35088",
    "toAddress": null
  },
  {
    "fromAddress": "0xab59a1ea1ac9af9f77518b9b4ad80942ade35088",
    "toAddress": "0x9850711951a84ef8a2a31a7868d0dca34b0661ca"
  }
]
Flattening in R can be done with the bind_rows() function.
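To show what "tearing the tuples apart" means in practice: with the column-wise format, the link between a sender and a receiver exists only through matching array indices, so a consumer has to re-pair them by hand. A minimal sketch follows; `ColumnsToRows` and `zip` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnsToRows {
    // Re-pairs from.get(i) with to.get(i); each row is {fromAddress, toAddress}.
    static List<String[]> zip(List<String> from, List<String> to) {
        if (from.size() != to.size()) {
            // Arrays of different length mean the pairing can no longer be trusted.
            throw new IllegalArgumentException("column lengths differ");
        }
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < from.size(); i++) {
            rows.add(new String[] { from.get(i), to.get(i) });
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = zip(
            Arrays.asList("0xab59a1ea1ac9af9f77518b9b4ad80942ade35088",
                          "0xab59a1ea1ac9af9f77518b9b4ad80942ade35088"),
            Arrays.asList(null, "0x9850711951a84ef8a2a31a7868d0dca34b0661ca"));
        for (String[] row : rows) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

With the array-of-objects format, this step is unnecessary because each object already carries its own pair.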
I built it according to this tutorial: https://www.tutorialspoint.com/r/r_json_files.htm. It worked yesterday, dunno. I am extracting transactions from the first 100 000 blocks at the moment. When I am done, I will try to import the new file into RStudio.
@moekappels don't use the format described in the blog post; pre-shaping of data can always be done in data science tools. The proposed format is not a semantically structured, self-describing JSON format.
Use the format as specified in https://github.com/internet-sicherheit/ethereum-cache-creator/issues/3#issuecomment-621079395.
Also, please don't commit directly to master, instead work on a branch and create a PR. I will make master a protected branch in this repo in order to enforce and teach this workflow :wink:
@moekappels This is basically already done and on master, isn't it? If yes, we can close the issue.
Yes, but we still focus on the addresses. I am fine with closing so there is less clutter in the issues.
We took a long detour with the JSON files and ended up where we started.
Alright, we will close once we have created issues for the outstanding items mentioned in the top post.
I'll close for now to clean up our issue list. New issues should be created for specific fields, e.g. like #29.
These are potentially interesting data to get (inspired by a Chinese paper about networks):
However, for now it is enough to serialize the transaction data with the two address fields to and from.