mammoth-eu / mammoth-commons


demonstrator/app.py choosing FairBench as analysis method for Image Pairs dataloader #25

Open elmokhtar-mohamedmoussa-idnow opened 2 days ago

elmokhtar-mohamedmoussa-idnow commented 2 days ago

Hello, I am opening this issue following our discussion with @maniospas here: https://github.com/mammoth-eu/mammoth-commons/issues/23#issuecomment-2431819705

Right now the only possible choice of analysis method (step 3 of the demonstrator) for the Image Pairs data loader is facex; it would be very useful to also include FairBench. Let's discuss the possible solutions here.

Best, Elmokhtar.

maniospas commented 2 days ago

Hi from me too. Tagging @georgiosn (also tag other people from EXUS if it is appropriate to add them to this discussion), because how best to support this is a decision to be made by the main toolkit; the demonstrator is just a lightweight emulator and is very flexible to adjust. Note that we need to make this decision as soon as possible (ideally by tomorrow evening, so that I have time to add the solutions to the demonstrator).

In the interim, I will provide one module for Solution B so that we have something to show in our next workshop (Solution A is just a matter of toggling the respective behavior and can be enabled at any time, but I think it breaks the toolkit, so I will enable it only after a consensus is reached).

The main question is how to implement "metric" modules so that they run for multiple model loaders and datasets. All options on how to proceed basically hinge on two alternatives:

Solution A

In my opinion, the proper way to support multiple modules from an API design perspective would be to allow and check for inheritance between model and dataset types, so that the toolkit and demonstrator can check for compliance.

Take, for example, the current FairBench model card declaration:

@metric(namespace="maniospas", version="v008", python="3.11", packages=("fairbench",))
def model_card(
    dataset: CSV,
    model: ONNX,
    sensitive: List[str],
    intersectional: bool = False,
    compare_groups: Options("Pairwise", "To the total population") = None,
) -> Markdown:
    ...

One change would consist of modifying the ONNX type hint to a Model type hint: literally changing just one word and letting Python's duck typing handle the rest. Of course, the tools need to check for inheritance. Doing this for the demonstrator is trivial (and already integrated), but in a bilateral call I was told that it is harder for the toolkit.

If the issue is performing the compatibility checks between components, I can help. (This should not be a bottleneck at all, given how much control Python's inspect module and the isinstance and issubclass built-ins provide.)
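To make the idea concrete, below is a minimal sketch of what such a compatibility check could look like, assuming the toolkit can read a module's Python type hints; the class hierarchy and the compatible helper are illustrative, not the actual mammoth-commons types or API.

import inspect

class Model: ...            # hypothetical common base class for model loaders
class ONNX(Model): ...
class EmptyModel(Model): ...

class Dataset: ...          # hypothetical common base class for data loaders
class CSV(Dataset): ...

def compatible(component, dataset_type, model_type) -> bool:
    # Read the component's declared type hints and check whether the loader
    # outputs are subclasses of (or equal to) the declared parameter types.
    hints = inspect.get_annotations(component)
    return issubclass(dataset_type, hints.get("dataset", object)) and \
        issubclass(model_type, hints.get("model", object))

def model_card(dataset: Dataset, model: Model): ...  # hints on the base types

assert compatible(model_card, CSV, ONNX)       # accepted through inheritance
assert not compatible(model_card, CSV, str)    # rejected: str is not a Model

In principle the same kind of check could also be run against the input_types listed in the meta.yaml files instead of the Python annotations.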

Solution B

An alternative that does not require any development on the part of the toolkit would be to create a separate module for each usage scenario. This might sound benign, but let us remember that we have multiple dataset types and model types, which means, for example, that we would already need at least 10 modules just to implement what is currently a single model card module, even before accounting for the pending integration of other fairness frameworks.

This is not a hell I am comfortable venturing into. We could, in theory, have commons manage the logistical bloat by generating all the module definitions programmatically (since only the decorator changes), which is not an issue in itself; a rough sketch of the idea follows. However, I am not sure that creating perhaps hundreds of images is a realistic strategy, especially since each of them is 500MB or larger.
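For illustration, here is a rough sketch of that programmatic generation, reusing the metric decorator and the type names quoted in this thread; _model_card_impl, ImagePairs, the factory function, and the omitted imports of the commons names are placeholders, so treat this as pseudocode for the approach rather than working commons code.

from itertools import product
from typing import List

def generate_model_card_modules(dataset_types, model_types):
    # One shared implementation, wrapped once per (dataset, model) combination;
    # only the decorator and the type hints differ between the generated modules.
    modules = {}
    for ds_type, model_type in product(dataset_types, model_types):
        @metric(namespace="maniospas", version="v008", python="3.11", packages=("fairbench",))
        def model_card(dataset: ds_type, model: model_type, sensitive: List[str]) -> Markdown:
            return _model_card_impl(dataset, model, sensitive)  # shared logic
        modules[f"model_card_{ds_type.__name__}_{model_type.__name__}".lower()] = model_card
    return modules

# e.g. generate_model_card_modules((CSV, ImagePairs), (ONNX, EmptyModel)) would
# produce four module definitions from a single implementation.

Even then, each generated module would still need its own Docker image under the current packaging, which is exactly the bloat described above.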

One option here would be to pack several modules in the same image if the toolkit can support that. In this case, we could have a selector variable to choose what is being executed from inside the image.

maniospas commented 2 days ago

After thinking about it even more, we also have a third option:

Solution C

Programmatically emulate inheritance. This would mean that the toolkit always sees one model type and one dataset type, and we perform the programmatic switch internally within that type. This violates every code safety and maintainability principle I can think of, not to mention that any compatibility check on the toolkit's side would be just as hard to implement as in Solution A.

We can abstract away the violations so that they do not impact the quality of module development. They will, however, make commons even harder to debug in the future, and I am not personally comfortable incurring so much technical debt. In fact, the existing design concessions in integration.py are already excessive; once we reach a desirable state in the quantity of available material, the work should go towards simplifying our code instead of further complicating it.
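For completeness, this is roughly what that internal switch would amount to; the AnyModel class and its dispatch logic below are purely illustrative.

class AnyModel:
    # A single "universal" model type exposed to the toolkit; whatever was
    # actually loaded is hidden inside and dispatched on at runtime.
    def __init__(self, payload):
        self.payload = payload  # an ONNX session, an sklearn estimator, ...

    def predict(self, data):
        if hasattr(self.payload, "run"):      # e.g. an onnxruntime InferenceSession
            return self.payload.run(None, {"input": data})  # input name is model-specific
        if hasattr(self.payload, "predict"):  # e.g. an sklearn-like estimator
            return self.payload.predict(data)
        raise TypeError(f"AnyModel cannot dispatch for {type(self.payload).__name__}")

Every new backend adds another branch to such a class, and the toolkit can no longer distinguish the cases, which is the maintainability concern raised above.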

georgiosn commented 1 day ago

Hello! Let me explain how the toolkit understands the connections now, so we can find a solution.

The toolkit does not have access to the source code of the modules. The code is in the Docker images that will be downloaded and run by KFP (Kubeflow Pipelines). The toolkit determines module relationships through the meta.yaml files. Therefore, if a metric module expects an ONNX model and a CSV dataset, it will only appear as an option if an ONNX model loader and a CSV data loader were previously selected.

If, instead of using ONNX, we use MODEL or any generic word that covers multiple models (e.g., ONNX and something else), the toolkit will allow it, since it relies only on the meta.yaml files. The question, however, is what will happen in KFP: will the metric module be able to handle either ONNX or other model outputs?

The module should be able to understand the output and handle it accordingly. If it cannot, the pipeline will fail.
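As a small, purely illustrative sketch of what handling the output accordingly could look like inside a metric module (the type names follow the snippets in this thread; everything else is hypothetical):

def interactive_sklearn_report(dataset, model, sensitive):
    # Fail fast with a clear message if an unsupported model arrives, instead of
    # letting the KFP run fail further downstream with an obscure error.
    if not isinstance(model, (EmptyModel, ONNX)):
        raise TypeError(
            f"interactive_sklearn_report received an unsupported model type: "
            f"{type(model).__name__}"
        )
    ...  # proceed with the FairBench report as usual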

Does this help further in determining a solution? Based on what I read, Solution A may be feasible, but testing is needed, in line with the discussion we had with Manios.

I will come back with the test result as we discussed.

elmokhtar-mohamedmoussa-idnow commented 1 day ago

Hello Manios and Georgios, thank you for your help. I think it is very good news that Solution A is feasible. @maniospas, for the plenary, do you think it is reasonable to implement this solution (since time is short), at least for the demonstrator, so that we have FairBench running for UC2 as well?

georgiosn commented 1 day ago

Hello again Manios and Elmokhtar,

I write here the changes made for the test discussed with Manios, and the result.

The interactive_sklearn_report function was changed to the following; the dataset type hint changed from CSV to Dataset:

@metric(
    namespace="mammotheu",
    version="v0024",
    python="3.11",
    packages=(
        "fairbench",
        "scikit-learn",
    ),
)
def interactive_sklearn_report(
    dataset: Dataset,
    model: EmptyModel,
    sensitive: List[str],
    predictor: Options("Logistic regression", "Gaussian naive Bayes") = None,
    intersectional: bool = False,
    compare_groups: Options("Pairwise", "To the total population") = None,
) -> HTML:

The interactive_sklearn_report meta.yaml that was generated:

id: interactive_sklearn_report
name: interactive sklearn report
description: "Creates an interactive report using the FairBench library, after running\
  \ an internal training-test split\n    on a basic sklearn model. The report creates\
  \ traceable evaluations that you can shift through to find sources\n    of unfairness\
  \ on a common task.\n\n    Args:\n        predictor: Which sklearn predictor should\
  \ be used.\n        intersectional: Whether to consider all non-empty group intersections\
  \ during analysis. This does nothing if there is only one sensitive attribute.\n\
  \        compare_groups: Whether to compare groups pairwise, or each group to the\
  \ whole population.\n    \n    Options:\n        predictor: Logistic regression,\
  \ Gaussian naive Bayes\n        compare_groups: Pairwise, To the total population"
parameter_info: Some parameters are needed.
component_type: METRIC
input_types:
- Dataset
- EmptyModel
parameter_default:
  predictor: None
  intersectional: false
  compare_groups: None
output_types: []

Then data_auto_csv_meta.yaml was also changed to output Dataset instead of CSV, so that it matches the input of the metric module. Only the meta file was edited; there were no changes to the source code.

id: data_auto_csv
name: data auto csv
description: "Loads a CSV file that contains numeric, categorical, and predictive\
  \ data columns.\n    Automatic detection methods for the delimiter and column types\
  \ are applied.\n    The last categorical column is considered the dataset label.\
  \ To load the file using\n    different options (e.g., a subset of columns, different\
  \ label column) use the\n    custom csv loader instead.\n\n    Args:\n        path:\
  \ The local file path or a web URL of the file.\n    "
parameter_info: Some parameters are needed.
component_type: LOADER_DATA
parameter_default:
  path: ''
input_types: []
output_types:
- Dataset

The interactive_sklearn_report module was rebuilt with the changed code. In the toolkit, the components no_model, auto_csv, and interactive_sklearn_report were selected; the pipeline was created and executed on KFP!

So solution A seems viable!

(attached screenshot: mammoth_sk_learn)

maniospas commented 1 day ago

Hello! This is excellent news!

I will create a new demonstrator version with Solution A while leaving the PyPI deployment one version behind so that the toolkit does not break. For the time being, let us keep this issue open so that we can properly revisit it once the toolkit is ready for the changes.

maniospas commented 1 day ago

By the way, I have already made the changes. The demonstrator can now run all FairBench-based modules for the Image datasets, but we also need to implement this for the image pairs. I opened #28 to track progress on that.