Implement tool for saved Keras model file inspection, diff, and patching.

pmasousa commented 4 months ago

Hello! I saw this feature listed under "🚀 Contributing to Keras 🚀" and would like to know if I can start developing it. The tool would:

Take a fname.keras file and display the manifest of its contents (including the weights file structure).
Take a fname.weights.h5 file and display the manifest of its contents.
Diff two weights files, highlighting what's in one and not in the other.
Patch a file by replacing a given weight with a different value provided by the user.
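For the diff item, one minimal approach, assuming each weights file has first been flattened into a hypothetical `{path: (shape, dtype)}` mapping, is plain set arithmetic on the keys. `diff_manifests` and the example paths below are illustrative, not part of Keras:

```python
def diff_manifests(a, b):
    """Compare two {path: (shape, dtype)} mappings.

    Returns (only_in_a, only_in_b, mismatched), where `mismatched` maps each
    path present in both to its (entry_in_a, entry_in_b) pair when they differ.
    """
    only_in_a = sorted(set(a) - set(b))
    only_in_b = sorted(set(b) - set(a))
    mismatched = {
        path: (a[path], b[path])
        for path in sorted(set(a) & set(b))
        if a[path] != b[path]
    }
    return only_in_a, only_in_b, mismatched


# Hypothetical manifests for two versions of a small model.
old = {
    "layers/dense/vars/0": ((784, 64), "float32"),
    "layers/dense/vars/1": ((64,), "float32"),
    "layers/dense_1/vars/0": ((64, 10), "float32"),
}
new = {
    "layers/dense/vars/0": ((784, 128), "float32"),
    "layers/dense/vars/1": ((128,), "float32"),
    "layers/dense_2/vars/0": ((128, 10), "float32"),
}
only_old, only_new, changed = diff_manifests(old, new)
print(only_old)  # ['layers/dense_1/vars/0']
print(only_new)  # ['layers/dense_2/vars/0']
```

Here `changed` holds both `layers/dense/...` entries, since their shapes differ between the two files.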

fchollet commented 4 months ago

Sure, you are welcome to work on that. Do you have any experience with web development? I'm thinking this tool may benefit from an interactive js/html interface to be used in a notebook.

pmasousa commented 4 months ago

I'm in my third year of Computer Science and Engineering, so I've already used JS and HTML in some projects, and I have some web development experience from side projects, both past and ongoing. I will be working on this with @pedro-curto, who is in the same year at the same university. I also agree that this tool would benefit from an interface, so I'd be glad to build one if you agree.

fchollet commented 4 months ago

Sounds great.

Here's an example of a draft I wrote a long time ago, displaying a kind of summary of a file's content:

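import json
import zipfile

# Private Keras 3 internals: names and signatures may change between releases.
from keras.src.saving.saving_lib import (
    _CONFIG_FILENAME,
    _METADATA_FILENAME,
    _VARS_FNAME,
    H5IOStore,
)
from keras.src.saving.serialization_lib import deserialize_keras_object


def inspect_file(
    filepath, reference_model=None, custom_objects=None, print_fn=print
):
    filepath = str(filepath)
    if filepath.endswith(".keras"):
        with zipfile.ZipFile(filepath, "r") as zf:
            print_fn(f"Keras model file '{filepath}'")

            # The model config is stored as JSON inside the zip archive.
            with zf.open(_CONFIG_FILENAME, "r") as f:
                config = json.loads(f.read())
                print_fn(
                    f"Model: {config['class_name']} name='{config['config']['name']}'"
                )
            if reference_model is None:
                # Rebuild the model from its config (not used further in this draft).
                reference_model = deserialize_keras_object(
                    config, custom_objects=custom_objects
                )

            with zf.open(_METADATA_FILENAME, "r") as f:
                metadata = json.loads(f.read())
                print_fn(f"Saved with Keras {metadata['keras_version']}")
                print_fn(f"Date saved: {metadata['date_saved']}")

            # Reuse the already-open archive instead of opening it a second time.
            weights_store = H5IOStore(
                _VARS_FNAME + ".h5", archive=zf, mode="r"
            )
            try:
                print_fn("Weights file:")
                inspect_nested_dict(weights_store.h5_file, print_fn, prefix="    ")
            finally:
                weights_store.close()

    elif filepath.endswith(".weights.h5"):
        print_fn(f"Keras weights file '{filepath}'")
        # A standalone weights file is read directly from disk: no archive.
        weights_store = H5IOStore(filepath, mode="r")
        try:
            inspect_nested_dict(weights_store.h5_file, print_fn)
        finally:
            weights_store.close()

    else:
        raise ValueError(
            "Invalid filename: expected a `.keras` or `.weights.h5` extension. "
            f"Received: filepath={filepath}"
        )


def inspect_nested_dict(store, print_fn=print, prefix=""):
    """Print the group hierarchy of an HDF5 weights store, listing each
    variable's shape and dtype under its `vars` group."""
    for key in store.keys():
        value = store[key]

        if hasattr(value, "keys"):  # An h5py group (datasets have no `keys`).
            skip = False
            # Hide groups whose only child is an empty `vars` group
            # (e.g. layers without weights).
            if (
                list(value.keys()) == ["vars"]
                and len(value["vars"].keys()) == 0
            ):
                skip = True
            # Hide empty `vars` groups themselves.
            if key == "vars" and len(value.keys()) == 0:
                skip = True
            if not skip:
                print_fn(f"{prefix}{key}")
                inspect_nested_dict(value, print_fn, prefix=prefix + "    ")
                if key == "vars":
                    # The datasets inside `vars` are the actual weight tensors.
                    for k in value.keys():
                        w = value[k]
                        print_fn(f"{prefix}    {k}: {w.shape} {w.dtype}")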

(It relies on objects from keras/src/saving/saving_lib.py, like H5IOStore).
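The same traversal can return data instead of printing, which would make the result reusable for diffing or for handing to a JS front end. A sketch, assuming only the dict-like interface `inspect_nested_dict` already relies on (groups expose `keys()`, leaves expose `shape` and `dtype`); `flatten_store` is a hypothetical helper, and the namedtuple merely stands in for `h5py.Dataset`:

```python
from collections import namedtuple


def flatten_store(store, prefix=""):
    """Walk a nested, dict-like weights store into {path: (shape, dtype)}.

    Groups are anything with `.keys()` (like h5py.Group); leaves are assumed
    to expose `.shape` and `.dtype` (like h5py.Dataset).
    """
    flat = {}
    for key in store.keys():
        value = store[key]
        path = f"{prefix}/{key}" if prefix else key
        if hasattr(value, "keys"):
            # A group: recurse. Empty groups contribute no entries.
            flat.update(flatten_store(value, path))
        else:
            # A leaf tensor: record its shape and dtype.
            flat[path] = (tuple(value.shape), str(value.dtype))
    return flat


# Stand-in for h5py datasets so the sketch runs without an .h5 file.
Leaf = namedtuple("Leaf", ["shape", "dtype"])
store = {
    "layers": {
        "dense": {"vars": {"0": Leaf((4, 2), "float32"), "1": Leaf((2,), "float32")}},
        "dropout": {"vars": {}},
    },
}
print(flatten_store(store))
# {'layers/dense/vars/0': ((4, 2), 'float32'), 'layers/dense/vars/1': ((2,), 'float32')}
```

Against a real file, passing an open `h5py.File` should behave the same way, since h5py groups and datasets provide these attributes.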

I think we want the following features:

Show the contents of a file, down to visualizing weight variables as a series of color grids. The interface should make this easy: at first you only see the list of top-level layers, but you can click on any of them to expand their contents, etc. Finally you can click on a weight tensor to visualize it. HTML+JS is a great fit for this, compared to the command line.
Show the diff compared to a reference_model. Highlight any incompatibilities or differences between the saved file contents and the structure of the reference model.
Offer a way to rename a weight or layer, or delete one, or add one -- saving a new edited file as a result. This could be done with an interactive interface.

What do you think?
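For the expand-on-click tree in the first bullet, HTML's native `<details>`/`<summary>` elements already give collapsible sections without any JavaScript. A rough sketch, assuming the same dict-like store interface as `inspect_nested_dict`; `to_html_tree` is a hypothetical helper, not an existing Keras function:

```python
import html
from collections import namedtuple


def to_html_tree(store, name):
    """Render a nested dict-like store as nested, collapsible <details> blocks.

    Groups (anything with `.keys()`) become <details> elements, which browsers
    show closed by default, so each level stays hidden until its summary is
    clicked. Leaves are assumed to expose `.shape` and `.dtype`.
    """
    parts = [f"<details><summary>{html.escape(str(name))}</summary><ul>"]
    for key in store.keys():
        value = store[key]
        if hasattr(value, "keys"):
            # Nested group: recurse to get another collapsible block.
            parts.append(f"<li>{to_html_tree(value, key)}</li>")
        else:
            # Leaf tensor: one plain line with its shape and dtype.
            label = f"{key}: {tuple(value.shape)} {value.dtype}"
            parts.append(f"<li>{html.escape(label)}</li>")
    parts.append("</ul></details>")
    return "".join(parts)


# Stand-in for h5py datasets so the sketch runs without an .h5 file.
Leaf = namedtuple("Leaf", ["shape", "dtype"])
store = {"dense": {"vars": {"0": Leaf((4, 2), "float32")}}}
page = to_html_tree(store, "model.weights.h5")
```

In a notebook, the resulting string could be shown with `IPython.display.HTML(page)`. Clicking a tensor to draw its color grid would still need JavaScript on top of this.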

pmasousa commented 4 months ago

That sounds fantastic! The template already helps a lot. Can we reach out to you here if we have further questions? Also, could you add @pedro-curto as a participant to this issue?

fchollet commented 4 months ago

Sure, you can just ask questions in this thread.

pmasousa commented 4 months ago

Hi @fchollet, we would like some feedback on our current data visualization and structure. Here is a Colab notebook with what we have so far for the first feature, along with some test scripts to visualize the output. Is this what you had in mind? We're also struggling to find a robust way to handle aspect ratios for the graphs so that they work well for tensors of any shape. Do you have any suggestions or best practices for this?

Thank you very much for your assistance.
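On the aspect-ratio question, one possible heuristic (an illustrative assumption, not something settled in this thread) is to keep the last axis as columns, merge the leading axes into rows, wrap degenerate strips into a near-square grid, and derive a per-cell pixel size from the longer side. `grid_layout` and its pixel parameters are hypothetical:

```python
import math


def grid_layout(shape, max_side_px=400, min_cell_px=1, max_cell_px=20):
    """Map a tensor shape to (rows, cols, cell_px) for a color-grid plot.

    Tensors with 2+ axes keep the last axis as columns and merge all leading
    axes into rows, e.g. a (3, 3, 64, 128) kernel becomes 576 x 128.
    Degenerate strips (scalars, 1-D tensors, (1, n) or (n, 1) shapes) are
    wrapped into a near-square grid whose last row may be partly empty.
    """
    size = math.prod(shape)  # math.prod(()) == 1, so a scalar is one cell.
    if len(shape) >= 2:
        rows, cols = math.prod(shape[:-1]), shape[-1]
    else:
        rows, cols = 1, size
    if rows == 1 or cols == 1:
        cols = max(1, math.ceil(math.sqrt(size)))
        rows = max(1, math.ceil(size / cols))
    # Aim to fit the longer side within max_side_px, clamped to the allowed
    # cell sizes; at min_cell_px, very long axes still overflow.
    longest = max(rows, cols, 1)
    cell_px = max(min_cell_px, min(max_cell_px, max_side_px // longest))
    return rows, cols, cell_px


print(grid_layout((3, 3, 64, 128)))  # (576, 128, 1)
print(grid_layout((10,)))  # (3, 4, 20)
print(grid_layout((1, 10000)))  # (100, 100, 4)
```

The 576-row kernel still exceeds `max_side_px` at the 1 px floor, so tensors like that would need scrolling or downsampling on top of this layout.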

keras-team/keras issue #19705