Closed: eduohe closed this issue 3 months ago
Should we just query BigQuery instead and stop using the files in S3?
Hi @gabehamilton, I think @pkudinov mentioned that we will need to use file-based approaches for QueryAPI data/metadata.
We don't want to have a dependency on BigQuery for that – it's actually going to be quite expensive. Ultimately we can have an indexer built in QueryAPI to store this data. @eduohe, this might be another indexer you can prioritize building. This way QueryAPI will be self-sufficient.
Here is a sample indexer that does the job. It should run on the `*` filter, though, and it currently breaks the V2 runner.
```typescript
import { Block } from "@near-lake/primitives";

async function getBlock(block: Block) {
  // Set the bits at `bitsIndexes` in `bitmap` (growing it if needed) and
  // return the result base64-encoded. Bit i lives in byte floor(i / 8),
  // at position i % 8 counting from the least significant bit.
  const createBitmap = function (bitsIndexes, bitmap) {
    const bytes = bitsIndexes.reduce(
      (res, b) => {
        const newBit = { byte: Math.floor(b / 8), bit: b % 8 };
        const maxByte = res.maxByte < newBit.byte ? newBit.byte : res.maxByte;
        return {
          maxByte,
          bits: [...res.bits, newBit],
        };
      },
      { bits: [], maxByte: 0 }
    );
    // Math.max guards against a negative length when the existing bitmap
    // is already longer than maxByte + 1.
    const buffer = bitmap
      ? [
          ...bitmap,
          ...new Uint8Array(Math.max(0, bytes.maxByte + 1 - bitmap.length)),
        ]
      : new Uint8Array(bytes.maxByte + 1);
    const result = bytes.bits.reduce((result, b) => {
      result[b.byte] = result[b.byte] | (1 << b.bit);
      return result;
    }, buffer);
    return Buffer.from(result).toString("base64");
  };

  // Decode a base64 bitmap back into raw bytes.
  const decodeBitmap = function (bitmapBase64) {
    const bytes = Uint8Array.from(Buffer.from(bitmapBase64, "base64"));
    return bytes;
  };

  // Debug helper: render the bitmap as a string of 0s and 1s.
  // Each byte is printed most significant bit first.
  const bitmapToString = function (bitmapBase64) {
    return decodeBitmap(bitmapBase64).reduce(
      (r, b) => r + b.toString(2).padStart(8, "0"),
      ""
    );
  };

  // Block timestamps are in nanoseconds; truncate to the UTC calendar day.
  const blockDate = new Date(
    new Date(block.streamerMessage.block.header.timestamp / 1000000)
      .toISOString()
      .substring(0, 10)
  );

  // Group the block's actions by receiver account.
  const actionsByReceiver = block.actions().reduce((groups, action) => {
    (groups[action.receiverId] ||= []).push(action);
    return groups;
  }, {});
  const allReceivers = Object.keys(actionsByReceiver);
  console.log(`There are ${allReceivers.length} receivers in this block.`);

  // One row per (block_date, receiver_id): set this block's bit in an
  // existing row, or insert a new row whose bit 0 is this block.
  await Promise.all(
    allReceivers.map(async (receiverId) => {
      const currentIndex = await context.db.ActionsIndex.select({
        block_date: blockDate,
        receiver_id: receiverId,
      });
      if (currentIndex && currentIndex[0]) {
        // Bit position is the offset from the first block seen that day.
        const blockDiff =
          block.blockHeight - currentIndex[0].first_block_height;
        const newBitmap = createBitmap(
          [blockDiff],
          decodeBitmap(currentIndex[0].bitmap)
        );
        return context.db.ActionsIndex.update(
          { block_date: blockDate, receiver_id: receiverId },
          { block_date: blockDate, receiver_id: receiverId, bitmap: newBitmap }
        );
      } else {
        return context.db.ActionsIndex.insert({
          first_block_height: block.blockHeight,
          block_date: blockDate,
          receiver_id: receiverId,
          bitmap: createBitmap([0], []),
        });
      }
    })
  );
}
```
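For anyone consuming the index, the reverse direction may be useful: turning one row's bitmap back into block heights. Below is a minimal sketch, assuming the bit layout the sample's `createBitmap` uses (offset `i` from `first_block_height` is stored in byte `floor(i / 8)`, bit `i % 8`, least significant bit first). `bitmapToBlockHeights` is a hypothetical helper, not part of QueryAPI, and it expects the raw bytes that `decodeBitmap` returns.

```typescript
// Hypothetical helper (not part of the original indexer): lists the block
// heights recorded in one ActionsIndex row.
// Assumes offset i from first_block_height is stored in byte floor(i / 8),
// at bit (i % 8), counting from the least significant bit.
function bitmapToBlockHeights(
  bytes: Uint8Array,
  firstBlockHeight: number
): number[] {
  const heights: number[] = [];
  bytes.forEach((byte, byteIndex) => {
    for (let bit = 0; bit < 8; bit++) {
      if (byte & (1 << bit)) {
        heights.push(firstBlockHeight + byteIndex * 8 + bit);
      }
    }
  });
  return heights;
}

// Example row (made-up values): offsets 0, 3 and 9 are set.
const exampleBytes = Uint8Array.from([0b00001001, 0b00000010]);
console.log(bitmapToBlockHeights(exampleBytes, 100));
// → [ 100, 103, 109 ]
```

Note that `bitmapToString` in the sample prints each byte most significant bit first, so character positions in its output do not map one-to-one to block offsets under this layout.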
The indexer is working in development. Next steps:
FYI, we've been checking the dataplatform indexers needed for near.org into https://github.com/near/near-discovery-components/tree/develop/indexers. There's no auto-deployment yet; it's just a reference. Perhaps the bitmap indexer would fit there too.
https://near.org/dev-queryapi.dataplatform.near/widget/QueryApi.App?selectedIndexerPath=nearpavel.near/bitmap_v2