Different results using similar methods on a model organism (E. coli)

I want to run MinPath on ~100k de novo genomes, but first I want to make sure I understand the results and how to interpret them.

I've tested it on E. coli in the following scenarios:

enzymes (mode=-ec) - I created a SmartTable in EcoCyc to get all of the enzymes (included in Archive.zip).
enzymes-from-kegg (mode=-ec) - I ran KofamScan against all of the proteins from the NCBI reference assembly. I kept only the KOs with enzyme hits, then converted each KO to the EC numbers listed in its KO description.
kegg-enzymes (mode=-ko) - I ran KofamScan against all of the proteins from the NCBI reference assembly. I kept only the KOs with enzyme hits.
kegg-full (mode=-ko) - I ran KofamScan against all of the proteins from the NCBI reference assembly. I kept all KOs with hits.
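
For reference, the KO-to-EC conversion in the enzymes-from-kegg scenario can be sketched roughly as below. This is a minimal sketch, not the exact script I ran: it assumes each KO description carries its EC numbers in a bracketed `[EC:...]` block, and the function names and the two example descriptions are illustrative only. Note that one KO can list several EC numbers, and partial EC numbers (with `-` placeholders) are passed through unchanged.

```python
import re

# Matches a bracketed EC annotation such as "[EC:2.7.1.1 2.7.1.2]".
EC_BLOCK = re.compile(r"\[EC:([^\]]+)\]")


def ec_numbers_from_definition(definition):
    """Return the EC numbers listed in a KO description string (hypothetical helper)."""
    ec_numbers = []
    for block in EC_BLOCK.findall(definition):
        # EC numbers inside one block are separated by whitespace.
        ec_numbers.extend(block.split())
    return ec_numbers


def ko_hits_to_ec(ko_definitions):
    """Map each KO id to its EC numbers, dropping KOs that list none."""
    mapping = {}
    for ko, definition in ko_definitions.items():
        ecs = ec_numbers_from_definition(definition)
        if ecs:
            mapping[ko] = ecs
    return mapping


# Illustrative input only; real descriptions come from the KO list used by KofamScan.
example = {
    "KO_A": "example kinase [EC:2.7.1.1 2.7.1.2]",
    "KO_B": "example transporter with no EC annotation",
}
print(ko_hits_to_ec(example))
```

Because a single KO can expand to more than one EC number, the -ec input built this way is not a one-to-one copy of the -ko input.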

I counted the number of pathways that passed the MinPath threshold in each report:

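import glob

import pandas as pd

# Column names for the fields parsed from each line of a MinPath report.
columns = ["id_minpath", "database", "reconstruction_available", "naive_reconstructed", "minpath_passed", "number_of_families_in_reference_pathway", "number_of_families_annotated", "name"]

for fp in glob.glob("/Users/jolespin/Cloud/Informatics/Development/Forks/MinPath/test/e-coli/*/report.txt"):
    # The parent directory name identifies the scenario (e.g. "kegg-full").
    scenario = fp.split("/")[-2]
    data = []
    with open(fp, "r") as f:
        for line in f:
            line = line.strip()
            if line:
                # Everything after "  name  " is the pathway name.
                left, right = line.split("  name  ", 1)
                # Split on whitespace; empty strings are dropped automatically.
                fields = left.split()
                reconstruction_available = fields[3]

                row = [
                    fields[1],
                    fields[2],
                    reconstruction_available if reconstruction_available != "n/a" else False,
                    # The naive and minpath flags are 0/1; parse them without eval.
                    bool(int(fields[5])),
                    bool(int(fields[7])),
                    int(fields[9]),
                    int(fields[11]),
                    right,
                ]
                data.append(row)
    df = pd.DataFrame(data, columns=columns)
    df = df.set_index("id_minpath")
    # Number of pathways flagged as passed by MinPath in this scenario.
    print(scenario, df["minpath_passed"].sum())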

I got the following results:

enzymes 388
enzymes-from-kegg 507
kegg-full 105
kegg-enzymes 94
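
The counts alone do not show whether two runs agree on which pathways pass. A minimal sketch of a set comparison is below; `compare_passed` is a hypothetical helper and the IDs are made up, not taken from my reports. It is only meaningful for runs whose reports use the same pathway identifiers (for example the two -ko runs); if the -ec and -ko reports list different values in the database column, their IDs would not be directly comparable.

```python
def compare_passed(run_a, run_b):
    """Compare two collections of pathway IDs that passed MinPath (hypothetical helper)."""
    a, b = set(run_a), set(run_b)
    return {
        "shared": a & b,   # passed in both runs
        "only_a": a - b,   # passed only in the first run
        "only_b": b - a,   # passed only in the second run
    }


# Made-up IDs for illustration; in practice these would be the index values
# of the rows where "minpath_passed" is True in each scenario's DataFrame.
full_run = ["p1", "p2", "p3", "p4"]
enzymes_run = ["p1", "p2", "p5"]

result = compare_passed(full_run, enzymes_run)
for key in ("shared", "only_a", "only_b"):
    print(key, sorted(result[key]))
```

With the DataFrames from the loop above, the inputs could be built as `df.index[df["minpath_passed"]]` for each scenario.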

What I don't understand is why the number of pathways that pass differs so much between these methods, given that they all start from the same organism.

What method would you recommend for de novo genomes?

Archive.zip
