shazow / whatsabi

Extract the ABI (and resolve proxies, and get other metadata) from Ethereum bytecode, even without source code.
https://shazow.github.io/whatsabi/
MIT License
1.04k stars 71 forks source link

Feature request: enable proxy contract detection via Etherscan API #125

Closed yohamta closed 3 days ago

yohamta commented 4 days ago

The Etherscan getsource API includes a proxy flag and the address of the implementation in its response. By using this data, we can enhance the flexibility of proxy detection.

For instance, in the current release of Whatsabi, the implementation of the USDCv3 Token does not seem to be retrieved. However, the Etherscan API response flags it as a proxy and provides the address of the implementation. By adding the isProxy and implementation fields to the ContractResult structure, we can enhance proxy detection and retrieve the correct implementation addresses.

Proposal:

Modify the ContractResult type to include isProxy and implementation fields. Additionally, update the EtherscanABILoader to populate these fields from the Etherscan API response.

export type ContractResult = {
    abi: any[];
    name: string | null;
    evmVersion: string;
    compilerVersion: string;
    runs: number;

    ok: boolean; // False if no result is found

    /**
     * getSources returns the imports -> source code mapping for the contract, if available.
     *
     * Caveats:
     * - Not all loaders support this, so the property could be undefined.
     * - This call could trigger additional fetch requests, depending on the loader.
     **/
    getSources?: () => Promise<ContractSources>;

    userdoc?: any;
    devdoc?: any;

+    isProxy?: boolean;
+    implementation?: string;
}

We are willing to submit a pull request to implement this feature if that would be acceptable.

shazow commented 3 days ago

For instance, in the current release of Whatsabi, the implementation of the USDCv3 Token does not seem to be retrieved.

HMM, I think this is a bug, we can fix it. Opened an issue here: https://github.com/shazow/whatsabi/issues/126

However, the Etherscan API response flags it as a proxy and provides the address of the implementation. By adding the isProxy and implementation fields to the ContractResult structure, we can enhance proxy detection and retrieve the correct implementation addresses.

I support adding access to these, but I'm hesitant to add a bunch of fields that are specific to a single loader source (including userdoc, devdoc). I was hoping that ContractResult would contain normalized results so that the user doesn't need to worry where it came from.

So let's step back for a moment and consider some alternatives instead of inlining polymorphic attributes { userdoc, devdoc, isProxy, implementation }:

  1. We could include a rawResponse: any object which just includes the underlying response. That way if anything is missing and you know which source it came from, you can always dig through it?
  2. We could turn ContractSource into an interface, and use polymorphic results: Etherscan would return an EtherscanContractResult which implements ContractSource but additionally contain isProxy and implementation. We would do something similar withSourcifyContractResultto includeuserdocanddevdoc`.
  3. We could do embedding (like rawResponse) but only for a specific subset of non-normalized functionality, so something like ContractResult.extra = { "sourcify": { userdoc, devdoc }, "etherscan": { isProxy, implementation }}. This would be more explicit about what's coming from where.

I lean towards 3.

This way things can start in extra and get pulled out to the higher level result object as more APIs implement them or we find ways to normalize these results.

What do you think?

(Particularly keep in mind that we want to add more sources of verified contracts, like thirdweb with https://github.com/shazow/whatsabi/issues/98)

yohamta commented 3 days ago

@shazow Thank you for the thorough response! I understand the concern about maintaining a normalized structure for ContractResult. I agree that adding fields specific to a single source could complicate things. Your third suggestion, embedding the extra data in a dedicated field, seems like a good compromise. I’m happy to go with that approach.

That said, having a rawResponse would also be helpful. It would allow users to handle potential future issues (e.g., missing field values) with more flexibility on their side.

Would you like us to move forward with a pull request based on this idea?

shazow commented 3 days ago

@yohamta I can make that change. :)

I'm happy to add rawResponse, my only argument against it is that it would be easier to get feedback from consumers of which fields they need. I worry if we add rawResponse, I'll never hear from anyone. 😅

Not a great reason, though.

I'm hesitant to add both rawResponse and extra. What do you think? Should I just do rawResponse?

yohamta commented 3 days ago

@shazow Thanks for your quick response! I understand your concern about getting feedback from consumers. That’s a valid point, but I still think that adding rawResponse is the more flexible solution. It allows users to access all the data they might need, even if certain fields aren’t immediately part of the normalized structure.

Since rawResponse can handle the flexibility we’re looking for, I think sticking with that alone makes the most sense. Let’s go ahead with just rawResponse. What do you think?

shazow commented 3 days ago

@yohamta Sounds like a plan. I'll do that and will ping you with a PR when it's ready. :)

shazow commented 3 days ago

Opened PR here: https://github.com/shazow/whatsabi/pull/127

Also added a test which contains your example:

https://github.com/shazow/whatsabi/blob/d6b179d11ad15bec2438374e1237be7f2fc8f0b5/src/__tests__/loaders.test.ts#L118-L124