Backfill DB with legacy permits #109

Open Keyrxng opened 1 month ago

Keyrxng commented 1 month ago

Required by https://github.com/ubiquity/work.ubq.fi/pull/46

Using the CLI as-is or with required changes:

a) Collect permit data across the ubiquity and ubiquibot orgs
b) Insert this data in the permits table (idempotently)
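For reference, the idempotent insert in (b) can be a plain upsert keyed on a unique column. A minimal sketch, assuming supabase-js and a unique constraint on the permit signature (the row shape and column names are illustrative, not the actual schema):

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// Illustrative row shape; the real permits table schema may differ.
interface PermitRow {
  signature: string; // assumed unique key used for deduplication
  nonce: string;
  amount: string;
  beneficiary: string;
  tx_hash: string | null;
}

// Re-running the backfill never duplicates rows: existing rows
// (matched on `signature`) are updated in place instead of inserted.
async function backfillPermits(rows: PermitRow[]): Promise<void> {
  const { error } = await supabase.from("permits").upsert(rows, { onConflict: "signature" });
  if (error) throw error;
}
```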

original comment https://github.com/ubiquity/work.ubq.fi/pull/46#issuecomment-2115014677

time estimate: < 4 hrs
Keyrxng commented 1 month ago

I was able to collect about 500 confirmed txHashes using data from Dune Analytics yesterday.

Off the top of my head the CLI pulls about double that in permits. Some permits in the DB right now don't have a txHash; that's either because they haven't been claimed, or because it's acceptable for claimed permits to not have a txHash.

Not all have a location_id either; I plan to just use whatever location_id the user entry has.


An arbitrary amount of time can pass between a permit being generated and it being claimed, so trying to match block_time with comment.timestamp doesn't work very well.

Still an angle or two I haven't tried yet, but for the sake of reference: for legacy permits where I cannot find a txHash and isNonceClaimed() === true, should that be regarded as an invalidatedPermit? And should any where isNonceClaimed() === false without a txHash be regarded as yet to be claimed?
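For context, the isNonceClaimed() check can be done against Uniswap's Permit2 nonce bitmap. A minimal sketch, assuming these permits are Permit2 signature transfers (ethers v5; the provider is supplied by the caller):

```typescript
import { ethers } from "ethers";

// Canonical Permit2 deployment (same address on most chains).
const PERMIT2_ADDRESS = "0x000000000022D473030F116dDEE9F6B43aC78BA3";
const PERMIT2_ABI = ["function nonceBitmap(address owner, uint256 wordPos) view returns (uint256)"];

// Permit2 stores used nonces as bits in 256-bit words: the high bits of
// the nonce select the word, the low 8 bits select the bit within it.
async function isNonceClaimed(
  owner: string,
  nonce: bigint,
  provider: ethers.providers.Provider
): Promise<boolean> {
  const permit2 = new ethers.Contract(PERMIT2_ADDRESS, PERMIT2_ABI, provider);
  const word: ethers.BigNumber = await permit2.nonceBitmap(owner, nonce >> 8n);
  const bit = ethers.BigNumber.from(1).shl(Number(nonce & 0xffn));
  return !word.and(bit).isZero();
}
```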

rndquu commented 1 month ago

I was able to collect about 500 confirmed txHashes using data from Dune Analytics yesterday.

Permits have been generated from different signers at different moments in time, so using only on-chain data might not be too accurate. It makes sense to use the GitHub API for fetching permits.

or it's acceptable for claimed permits to not have a txHash

I don't think claimed permits should be missing a tx hash.

Not all have a location_id either; I plan to just use whatever location_id the user entry has.

You may leave the permits.location_id field empty since the locations table is deprecated and will be removed.

An arbitrary amount of time can pass between a permit being generated and it being claimed, so trying to match block_time with comment.timestamp doesn't work very well.

The GitHub API might have this info.

Still an angle or two I haven't tried yet, but for the sake of reference: for legacy permits where I cannot find a txHash and isNonceClaimed() === true, should that be regarded as an invalidatedPermit? And should any where isNonceClaimed() === false without a txHash be regarded as yet to be claimed?

I think most of them should have a txHash. Try fetching with the GitHub API.

Keyrxng commented 1 month ago

Permits have been generated from different signers

As far as I can tell from the CLI output it has been four addresses, and I used all four when parsing on-chain.

Try fetching with the GitHub API

I assume you mean the comment metadata here, yeah? The CLI collects permits via the claim_url in comments, since the metadata is relatively new.

ty for the clarification on the rest

rndquu commented 1 month ago

By the CLI you mean https://github.com/ubiquity/.github/tree/main/airdrop-cli, right? How does it work exactly? Does it parse only on-chain transactions from the permit signer, or does it use the GitHub API to fetch all permit info from the claim_url query param?

Keyrxng commented 1 month ago

It parses comments from the bot and pav, extracting the permit data using the claim_url; it only verifies whether a permit has been claimed via isNonceClaimed.

It uses four different regexes to capture the various formats that have existed, and likely needs to be updated again to match the current format.
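For illustration, one of those capture paths could look like the sketch below. The specific regex, and the assumption that the `claim` query param carries the permit as base64-encoded JSON, are mine rather than the CLI's actual implementation:

```typescript
// Hypothetical extractor for one claim_url format.
const CLAIM_URL_REGEX = /https:\/\/[\w.-]*ubq\.fi\/?\?claim=([A-Za-z0-9+/=]+)/;

interface DecodedPermit {
  nonce: string;
  // ...remaining permit fields as encoded by the bot
}

function extractPermit(commentBody: string): DecodedPermit | null {
  const match = commentBody.match(CLAIM_URL_REGEX);
  if (!match) return null;
  const json = Buffer.from(match[1], "base64").toString("utf8");
  return JSON.parse(json) as DecodedPermit;
}
```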

I collected data from Dune manually, using the four payment addresses paying out to the 60-odd hunters in the DB. Using the txHash I unbundled the permit data and was able to pair those up. Those come with a timestamp, but it is hard to pin a permit against its comment that way because of time-til-claim; matching on nonce etc. is easier.

The tally from Dune alone is far lower than expected.


You can only trace back 50k blocks at a time (might be 5k, off the top of my head) using a provider, and scripting that for each hunter wasn't really feasible, so Dune was the better option.

I can't think how you'd get the txHash using just the permit data / permit comment, other than scanning 5/50k blocks at a time from the comment timestamp until that amount is transferred into the user's wallet, then unbundling the permit and matching by nonce and amount.
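That unbundle-and-match step could be sketched like this, assuming the claims go through Permit2's permitTransferFrom (ethers v5; decode a candidate tx's calldata and compare against the permit recovered from the comment):

```typescript
import { ethers } from "ethers";

// Human-readable fragment of Permit2's permitTransferFrom.
const iface = new ethers.utils.Interface([
  "function permitTransferFrom(((address token, uint256 amount) permitted, uint256 nonce, uint256 deadline) permit, (address to, uint256 requestedAmount) transferDetails, address owner, bytes signature)",
]);

// True when the candidate tx's decoded calldata matches the nonce and
// amount taken from the comment's claim_url permit.
function matchesPermit(txInput: string, expectedNonce: bigint, expectedAmount: bigint): boolean {
  try {
    const { args } = iface.parseTransaction({ data: txInput });
    return (
      args.permit.nonce.toBigInt() === expectedNonce &&
      args.permit.permitted.amount.toBigInt() === expectedAmount
    );
  } catch {
    return false; // calldata was not a permitTransferFrom call
  }
}
```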

Any suggestions? It's doable anyway; I just needed those points clarified.

rndquu commented 1 month ago

I can't think how you'd get the txHash using just the permit data / permit comment, other than scanning 5/50k blocks at a time from the comment timestamp until that amount is transferred into the user's wallet, then unbundling the permit and matching by nonce and amount.

Any suggestions? It's doable anyway; I just needed those points clarified.

We had a similar task of matching GitHub issues with on-chain transactions at https://github.com/ubiquity/audit.ubq.fi and used the Etherscan API, although I don't remember the particular API method that was used (this one, or this one, or some other).

Keyrxng commented 1 month ago

I wasn't aware of the issue, or that Etherscan had those endpoints, actually. I checked and Gnosisscan has them also; they don't seem to be limited by the same block limit either, and they use the same method. Awesome.
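For the record, a sketch of the kind of call involved. The account/tokentx action (ERC-20 transfers by address) is one such endpoint on both explorers, although which exact method audit.ubq.fi used isn't confirmed above:

```typescript
// Fetch every ERC-20 transfer touching a hunter address from Gnosisscan
// (an Etherscan-compatible API); no block-range window is required.
async function fetchTokenTransfers(address: string, apiKey: string): Promise<any[]> {
  const url =
    "https://api.gnosisscan.io/api" +
    `?module=account&action=tokentx&address=${address}` +
    `&startblock=0&endblock=99999999&sort=asc&apikey=${apiKey}`;
  const res = await fetch(url);
  const { status, result } = await res.json();
  if (status !== "1") throw new Error(`explorer API error: ${result}`);
  return result; // items include hash, from, to, value, tokenSymbol, timeStamp
}
```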

Keyrxng commented 1 month ago

I've been working on this and will have an open PR ready tonight or tomorrow.

Keyrxng commented 3 weeks ago

I've had to pretty much refactor the CLI entirely for the following reasons:


I spent far more hours than estimated. I wanted to be confident that the data was as verifiable as I could make it and not push a half-arsed approach.