naddison36 / sol2uml

Solidity contract visualisation tool
MIT License
1.12k stars 257 forks source link

Fix: Improve flatten compilations to fail less on solc error[2333] and error [2449] #127

Open plotchy opened 1 year ago

plotchy commented 1 year ago

I've been relying on flatten for some daily work auditing contracts, and I wanted to share a few errors I run into and a few workarounds Ive created to help get past them. Often times, both occur for similar reasons, but there are separate ways to fix these. I want to preface that I know absolutely 0 javascript, so please forgive the horrendous code workarounds...

Error 2449

error[2449]: TypeError: Definition of base has to precede definition of derived contract
  --> sFTMxOracle.sol:17:25:
   |
17 | contract sFTMxOracle is ProviderAwareOracle {
   |                         ^^^^^^^^^^^^^^^^^^^

This occurs due to one filename from etherscan containing multiple instances of contracts/interfaces/etc. For example, a minimalistic interface defined at the top of a file and a massively inheriting contract below it.

When the uml classes are topologically sorted, the minimal interface is at the front, and the massive contract at the back. This causes an error when umlclasses are mapped to their relative filenames, and then de-duplicated with a Set(). This causes the first instance to act as a hoist for the rest of the declared contracts in that file.

The entire solidity code for the filename is concatenated for printout, and the massive file causes an error due to preceding definition.

The Fix

Still de-duplicate using a Set(), but do it in reverse order, so that the last instance of a filename in the sort is when the file gets printed. In other words, have the massive contract drag the minimal interfaces back.

// within parserEtherscan.js
class EtherscanParser {
    async getSolidityCode(contractAddress) {
        // ....
        // All duplicates in set need to be removed, but ordering needs to keep the last instance of the duplicate
        // so reverse the list, remove duplicates with the Set(), then reverse it back
        const dependentFilenames = [...new Set(sortedFilenames.reverse())].reverse();

Can reproduce on this contract: https://etherscan.io/address/0x91ee5184763d0b80f8dfdCbdE762b5D13ad295f4#code sol2uml flatten 0x91ee5184763d0b80f8dfdCbdE762b5D13ad295f4

Error 2333

error[2333]: DeclarationError: Identifier already declared.
    --> YearnWrapper.sol:1464:1:
     |
1464 | interface IERC20 {
     | ^ (Relevant source part starts here and spans across multiple lines).
Note: The previous declaration is here:
  --> YearnWrapper.sol:34:1:
   |
34 | interface IERC20 {
   | ^ (Relevant source part starts here and spans across multiple lines).

Etherscan allows multiple instances of the same contract/interface in the code list, so in this case it occurs due to IERC20 repeating twice in the umlclasses list.

It is quite annoying to remove... Here are my thoughts:

Here is my attempt at detecting supersets and modifying the topological sort:

async getSolidityCode(contractAddress) {
        const { files, contractName, compilerVersion } = await this.getSourceCode(contractAddress);
        // Parse the UmlClasses from the Solidity code in each file
        let umlClasses = [];
        for (const file of files) {
            const node = await this.parseSourceCode(file.code);
            const umlClass = (0, converterAST2Classes_1.convertAST2UmlClasses)(node, file.filename);
            umlClasses = umlClasses.concat(umlClass);
        }
        const topologicalSortedClasses = (0, filterClasses_1.topologicalSortClasses)(umlClasses);

        // Sort the classes so dependent code is first
        // Review umlClasses and determine if any umlClass.name is a duplicate
        // If so, use the umlClass.operators[..].name field (functions of class) to determine if one is a superset of the other,
        // and move the superset to the lower index of the two compared classes and remove the other
        function removeDuplicateClasses(array, contractName, contractAddress) {
            const non_overlapping_functions = [];
            const duplicate_functions=[];

            // first, find all the duplicates. Push them as a list of objects to duplicates
            for (const obj of array) {
                const duplicates = array.filter(o => o.name === obj.name && o.relativePath !== obj.relativePath && duplicate_functions.flat().includes(o) === false);
                if (duplicates.length > 0) {
                    duplicate_functions.push([obj,...duplicates]);
                }
            }

            // second, find all the non-overlapping functions of these duplicate_functions
            for (const dupe_pairing of duplicate_functions) {
                // each dupe_pairing should be treated as its own instance of a supersetting issue
                const fn_names = dupe_pairing.map((dupe_incident) => {
                    return dupe_incident.operators.map((operator) => {
                        const params = operator.parameters.map((param) => {
                            return param.type;
                        });
                        const fn_name = operator.name.concat(params);
                        return fn_name;
                    });
                });
                // take longest list within fn_names and assume it is the superset
                const longest_list = fn_names.reduce((a, b) => a.length > b.length ? a : b);
                const longest_class = dupe_pairing.reduce((a, b) => a.operators.length > b.operators.length ? a : b);
                // check if longest_list contains all the other lists
                const dupe_pairing_hasSuperset = fn_names.every((list) => {
                    for (const item of list) {
                        if (longest_list.includes(item) === false) {
                            non_overlapping_functions.push(item);
                            return false;
                        }
                    }
                    return true;
                });
                if (dupe_pairing_hasSuperset) {
                    // move the superset to the lower index of the two compared classes. Other will be removed through our Set() at the end
                    let indexes_of_duplicates = [];
                    for (const o of dupe_pairing) {
                        indexes_of_duplicates.push(array.indexOf(o));
                    }
                    const min_index = Math.min(...indexes_of_duplicates);
                    array[min_index] = longest_class;
                    for (const index of indexes_of_duplicates) {
                        if (index !== min_index && index > -1) {
                            array.splice(index, 1); // remove one item only
                        }
                    }
                }
            }

            // third, if there are any non_overlapping_functions, throw an error
            if (non_overlapping_functions.length > 0) {
                throw Error(`Failed to find superset regarding ${contractName}, ${contractAddress}, and these functions:\n${non_overlapping_functions}`);
            }

            // lastly, remove duplicates from array in order
            const new_array = [...new Set(array)]
            return new_array;
        }
        for (const umlClass of topologicalSortedClasses) {
            console.log(`${umlClass.relativePath}`);
        }
        const filteredClasses = removeDuplicateClasses(topologicalSortedClasses, contractName, contractAddress);
        for (const umlClass of filteredClasses) {
            console.log(`${umlClass.relativePath}`);
        }

        const sortedFilenames = filteredClasses.map((umlClass) => umlClass.relativePath);
        //....

Original ordering (notice the double occurence of IERC20):

@openzeppelin/contracts/utils/Context.sol
@openzeppelin/contracts/token/ERC20/IERC20.sol   # dupe
@openzeppelin/contracts/token/ERC20/extensions/IERC20Metadata.sol
@openzeppelin/contracts/token/ERC20/ERC20.sol
contracts/interfaces/adapters/yearn/IVaultWrapper.sol
contracts/interfaces/IERC4626.sol
@openzeppelin/contracts/access/Ownable.sol
@openzeppelin/contracts/security/ReentrancyGuard.sol
contracts/lib/FixedPointMathLib.sol
@openzeppelin/contracts/utils/Address.sol
@openzeppelin/contracts/token/ERC20/utils/SafeERC20.sol
contracts/interfaces/adapters/yearn/VaultAPI.sol
contracts/interfaces/IERC20.sol                      # dupe
contracts/interfaces/adapters/yearn/VaultAPI.sol
contracts/interfaces/adapters/yearn/VaultAPI.sol
@openzeppelin/contracts/utils/math/Math.sol
contracts/adapters/yearn/YearnWrapperV2.sol

New ordering:

@openzeppelin/contracts/utils/Context.sol
@openzeppelin/contracts/token/ERC20/IERC20.sol  # superset!
@openzeppelin/contracts/token/ERC20/extensions/IERC20Metadata.sol
@openzeppelin/contracts/token/ERC20/ERC20.sol
contracts/interfaces/adapters/yearn/IVaultWrapper.sol
contracts/interfaces/IERC4626.sol
@openzeppelin/contracts/access/Ownable.sol
@openzeppelin/contracts/security/ReentrancyGuard.sol
contracts/lib/FixedPointMathLib.sol
@openzeppelin/contracts/utils/Address.sol
@openzeppelin/contracts/token/ERC20/utils/SafeERC20.sol
contracts/interfaces/adapters/yearn/VaultAPI.sol
contracts/interfaces/adapters/yearn/VaultAPI.sol
contracts/interfaces/adapters/yearn/VaultAPI.sol
@openzeppelin/contracts/utils/math/Math.sol
contracts/adapters/yearn/YearnWrapperV2.sol

Can reproduce on this contract: https://etherscan.io/address/0x91ee5184763d0b80f8dfdCbdE762b5D13ad295f4#code sol2uml flatten 0x91ee5184763d0b80f8dfdCbdE762b5D13ad295f4

plotchy commented 1 year ago

I've spent some more time debugging both of these occurences. Without the full detachment of associations<->filenames (topological sorting occurs on associations, but printing occuring as filenames) there are many edge cases that can exist. This is an issue inherant to flattening. Ive managed to piecemeal some more improvements to reduce the errors from occuring on ~30% of my contracts to about ~5%, but I dont believe there are worthy routes to fix the remainders.

Having the directedAcyclicGraph load edges oppositely (target->source) allows the non-associated classes to be at the front of the sort after the reversing step, this helps majorly with declaration precedence. Also stereotypes 5 and 6 are contract level structs & enums, which don't need to be sorted in the graph as they always can exist within their contracts normally, no need to sort them higher or lower. Having these stereotypes detached often times lead to inheriting contracts getting hoisted to front of the sort due to a struct they declared having no associations during the filename de-duping step.

const loadDirectedAcyclicGraph = (umlClasses) => {
    const directedAcyclicGraph = new js_graph_algorithms_1.DiGraph(umlClass_1.UmlClass.idCounter); // the number vertices in the graph
    for (const sourceUmlClass of umlClasses) {
        for (const association of Object.values(sourceUmlClass.associations)) {
            // Find the first UML Class that matches the target class name
            const targetUmlClass = (0, associations_1.findAssociatedClass)(association, sourceUmlClass, umlClasses);
            if (!targetUmlClass || targetUmlClass.stereotype === 6 || targetUmlClass.stereotype === 5) { // stereotype 6 is contract-level-enum, 5 is contract-level-struct
                continue;
            }
            // console.log(`Adding edge from ${sourceUmlClass.name} with id ${sourceUmlClass.id} to ${targetUmlClass.name} with id ${targetUmlClass.id} and type ${targetUmlClass.stereotype}`);
            // directedAcyclicGraph.addEdge(sourceUmlClass.id, targetUmlClass.id);
            directedAcyclicGraph.addEdge(targetUmlClass.id, sourceUmlClass.id);
        }
    }
    // console.log(directedAcyclicGraph)
    return directedAcyclicGraph;
};
plotchy commented 1 year ago

And front loading all non-dependant nodes to the front of the list before a de-duping step helps to alleviate some edge cases where the graph is 'lazily sorted' (a minimal interface can be second to last if the last node is a contract that inherits from it).

const topologicalSortClasses = (umlClasses) => {
    const directedAcyclicGraph = loadDirectedAcyclicGraph(umlClasses);
    const topologicalSort = new js_graph_algorithms_1.TopologicalSort(directedAcyclicGraph);
    // Topological sort the class ids
    // console.log(topologicalSort.order())
    // const sortedUmlClassIds = topologicalSort.order().reverse();

    let sortedUmlClassIds = topologicalSort.order();
    // TODO need to make it so that nodes that are not included within the adjacency list are propelled to the top of the list
    // seems like this list is lazily sorted. ex: a minimal interface can be second to last if the last node is a contract that inherits from it
    // over range of 0..topologicalSort.V, push nodes into inclusion list
    // for any number not in the inclusion list, push it to the front of the list
    // console.log(topologicalSort)
    // console.log(directedAcyclicGraph.V)
    let not_included_nums = [];
    const flattened_adj_list = directedAcyclicGraph.adjList.flat();
    // console.log(flattened_adj_list)
    for (let i=0; i < directedAcyclicGraph.V; i++) {
        flattened_adj_list.includes(i) ? null : not_included_nums.push(i);
    }
    // console.log(not_included_nums.reverse()) // shouldn't need to reverse this, but seems like it preserves the order of the original list better

    // console.log(sortedUmlClassIds)
    sortedUmlClassIds = not_included_nums.concat(sortedUmlClassIds);
    // console.log(sortedUmlClassIds)
    sortedUmlClassIds = [...new Set(sortedUmlClassIds)]; // remove duplicates
    // console.log(sortedUmlClassIds)
    const sortedUmlClasses = sortedUmlClassIds.map((umlClassId) => 
        // Lookup the UmlClass for each class id
        umlClasses.find((umlClass) => umlClass.id === umlClassId
    ));
    // Filter out any unfound classes. This happens when diff sources the second contract.
    return sortedUmlClasses.filter((umlClass) => umlClass !== undefined);
};