mlflow / mlflow

Open source platform for the machine learning lifecycle
https://mlflow.org
Apache License 2.0

[FR] Add image artifact to "Comparing 2 Runs" UI #3843

Open rgaiacs opened 3 years ago

rgaiacs commented 3 years ago

Willingness to contribute

The MLflow Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature (either as an MLflow Plugin or an enhancement to the MLflow code base)?

Proposal Summary

When working with images, it is useful to be able to compare two image artifacts.

[Attached mockup (mlflow-fr): proposed view for comparing image artifacts across runs in the Compare Runs page]

Motivation

What component(s), interfaces, languages, and integrations does this feature affect?

Components

Interfaces

Languages

Integrations

dmatrix commented 3 years ago

@rgaiacs Thanks for filing this request. Currently, you can compare runs' parameters and metrics, along with schema signatures, and the Model Registry allows you to compare model versions' MLflow entities. Comparing artifacts is an interesting idea. One can envision comparing two SHAP images or, as you point out, comparing images across experiment runs when there are a couple of images that went through a set of filters and max pooling. The problem arises: which ones do you compare when you have a large batch of images, each passing through a convolution? And which ones do you display for comparison when there can be hundreds of images?

cc: @sueann @dbczumar @smurching @AveshCSingh

rgaiacs commented 3 years ago

@dmatrix Thanks for the feedback.

Regarding the number of images, I think this feature can initially be limited to a maximum number of artifacts per experiment (let's say 10 images for now) to avoid performance issues on both the server and client side. When writing their script, the user (say, Joe) can choose which 10 images to log based on their own knowledge of the problem.

I believe this feature is useful during the exploration and debugging phase of a project and not practical during the benchmark phase. For example, let's say Kat has 1000 images of cats and 1000 images of dogs to train a machine learning model that recognises cats. Kat trains the model and now needs to test it. Kat manually selects 10 edge-case images (an image with more than two cats, an image with one cat and one dog, ...) to be displayed and compared by MLflow. Kat discovers that the model doesn't recognise children's drawings of cats and adds that to the issue tracker. When Kat benchmarks against competitors' models, no images are displayed for comparison.

Lucas-bayati commented 3 years ago

Hi, the ability to compare artifacts from multiple runs is very interesting. In our team, we need to compare observed-vs-fitted or lift charts from multiple runs. Another example is comparing the feature importance graphs or SHAP results for two runs.

Is there any plan for developing this feature in MLflow? Thanks.

mohammedayub44 commented 3 years ago

Really interested in this feature too. Our project has a lot of object detection models that run against multiple test sets/edge cases (let's say each test set has 25 images). It would be good to compare these test runs (from one model) side by side along with the images, or even multiple models on the same test set along with images, to troubleshoot qualitatively faster.

Currently we have to do this manually, and it takes several days of back-and-forth iterations :( .

Thanks !

intelligentaudit commented 3 years ago

Interested in this feature as well to compare PPM/Recall curves of models trained with different data. Thx

R0ll1ngSt0ne commented 2 years ago

Our team is very interested in this feature being enabled in MLflow. Is there a plan to build some momentum on its development?

semperigrinus commented 2 years ago

This feature is available on neptune.ai and seems to work quite well there. I don't see a reason to stop with images either: for example, I use Vega-Lite to generate interactive plots, and being able to view them side by side would be a big help.

benelot commented 2 years ago

Any updates on this? There are so many ways to turn something you want to compare into visual data, so this contribution would be essential: with it you could compare any type of plot artifact, output image, or other visual representation of the model's performance. I am not sure why this is not on the list of very important features.

thomasfarrierGjensidige commented 1 year ago

Very interesting :-)

susanameiras commented 1 year ago

I'd love to have that. I work with time series forecasting, and it's always important to look at previous versions' plots and compare them with each other. Any developments on this? Thank you!

floriancircly commented 1 year ago

Any updates on this feature?

trianta2 commented 1 year ago

I'm also interested in this feature. One way to go about this is to have a table of images, where rows are aligned using the figure name and columns are runs. The user must select which figures to display, similar to the existing parameters and metrics drop-down menus. A rough sketch of that layout is below.
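To make the layout concrete, here is a minimal sketch of such a table in React/TypeScript. Everything in it is hypothetical: the component name, the props, and how image URLs are resolved are assumptions for illustration, not MLflow's actual components.

import React from 'react';

// Hypothetical props: one displayable image URL per (run, figure name).
// How these URLs are resolved from artifacts is left to the artifact-serving layer.
interface ImageComparisonTableProps {
  runUuids: string[];
  // imagesByRun[runUuid][figureName] -> image URL
  imagesByRun: Record<string, Record<string, string>>;
  // figures the user picked from a drop-down, analogous to the params/metrics selectors
  selectedFigures: string[];
}

// One row per selected figure, one column per run.
export const ImageComparisonTable = ({ runUuids, imagesByRun, selectedFigures }: ImageComparisonTableProps) => (
  <table>
    <thead>
      <tr>
        <th>Figure</th>
        {runUuids.map((runUuid) => (
          <th key={runUuid}>{runUuid}</th>
        ))}
      </tr>
    </thead>
    <tbody>
      {selectedFigures.map((figure) => (
        <tr key={figure}>
          <td>{figure}</td>
          {runUuids.map((runUuid) => (
            <td key={runUuid}>
              {imagesByRun[runUuid]?.[figure] ? (
                <img src={imagesByRun[runUuid][figure]} alt={`${figure} (${runUuid})`} width={200} />
              ) : (
                'missing'
              )}
            </td>
          ))}
        </tr>
      ))}
    </tbody>
  </table>
);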

jescalada commented 2 days ago

Hi folks,

I've been working on implementing the Artifact View as per the mockup in this issue's Proposal Summary. I've noticed that we have an ArtifactView component that's currently used for displaying artifacts in the RunViewArtifactTab.

However, getting it to work within the CompareRunView component is more complex than I thought. It’s not as straightforward as simply importing the component; it requires managing the active node for the artifact view, fetching the artifact (if it's an image), and ensuring the dependencies flow correctly.

So far, I have added a section for comparing artifacts:

[Screenshot of the artifact comparison section added so far]

I'm hoping someone can provide some guidance on how to proceed with this implementation. I believe this feature would be valuable both to the community and my team.

Thank you! 😃

Pinging @daniellok-db for insights.

daniellok-db commented 2 days ago

Hi @jescalada, let me take a look and get back to you tomorrow 😃. Thanks so much for working on this!

daniellok-db commented 1 day ago

Oof, it looks like CompareRunView is one of our old React class-based components, which makes things a little more inconvenient.

It would probably be cleanest to make a new component, and use ShowArtifactPage to render the previews. Then you can just insert CompareRunArtifactView into the page. Something like:

import { useState } from 'react';

export const CompareRunArtifactView = ({
  runUuids,
}: {
  runUuids: string[];
}) => {
  const [artifactPath, setArtifactPath] = useState<string | null>(null);

  // define some hook that fetches all the artifact paths for a list of runs
  const { artifactsKeyedByRun } = useRunsArtifacts(runUuids);

  // filter out only the artifact names that all the runs share
  // this can be used to generate the sidebar in the original mockup
  const commonArtifacts = getCommonArtifacts(artifactsKeyedByRun);

  return (
    <div>
      <CompareRunArtifactViewSidebar
        artifacts={commonArtifacts}
        onRowClick={setArtifactPath}
      />
      <div css={{ display: 'flex', flexDirection: 'row' }}>
        {runUuids.map((runUuid) => (
          <ShowArtifactPage
            key={runUuid}
            runUuid={runUuid}
            artifactRootURI={/* get this from runInfo somewhere */}
            path={artifactPath}
            /* ... other props if necessary */
          />
        ))}
      </div>
    </div>
  );
};

I think you can probably use listArtifactApi to help with fetching the artifact paths. Note that if you have nested artifacts (e.g. some artifacts are contained within a folder), you will need to use the path param to fetch the artifacts within that folder.
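In case it helps, here is a rough sketch of what the useRunsArtifacts hook and getCommonArtifacts helper assumed above could look like. The ArtifactFileInfo shape and the injected fetcher are assumptions rather than MLflow's actual client API; in practice the fetcher would wrap the artifact-listing call mentioned above and recurse with the path param for folders.

import { useEffect, useState } from 'react';

// Assumed shape of one entry returned by an artifact-listing call; field names are assumptions.
interface ArtifactFileInfo {
  path: string;
  is_dir: boolean;
}

// The fetcher is injected here only to keep the sketch self-contained;
// in the actual component it could be closed over or imported instead.
type ListArtifacts = (runUuid: string, path?: string) => Promise<ArtifactFileInfo[]>;

// Recursively collect file paths, descending into directories via the path param.
async function collectPaths(list: ListArtifacts, runUuid: string, path?: string): Promise<string[]> {
  const entries = await list(runUuid, path);
  const results: string[] = [];
  for (const entry of entries) {
    if (entry.is_dir) {
      results.push(...(await collectPaths(list, runUuid, entry.path)));
    } else {
      results.push(entry.path);
    }
  }
  return results;
}

// Hook assumed in the sketch above: maps each run UUID to its flat list of artifact paths.
export function useRunsArtifacts(runUuids: string[], list: ListArtifacts) {
  const [artifactsKeyedByRun, setArtifactsKeyedByRun] = useState<Record<string, string[]>>({});

  useEffect(() => {
    let cancelled = false;
    Promise.all(runUuids.map((runUuid) => collectPaths(list, runUuid))).then((paths) => {
      if (!cancelled) {
        setArtifactsKeyedByRun(Object.fromEntries(runUuids.map((id, i) => [id, paths[i]])));
      }
    });
    return () => {
      cancelled = true;
    };
  }, [runUuids, list]);

  return { artifactsKeyedByRun };
}

// Keep only the artifact paths present in every run, for the shared sidebar.
export function getCommonArtifacts(artifactsKeyedByRun: Record<string, string[]>): string[] {
  const lists = Object.values(artifactsKeyedByRun);
  if (lists.length === 0) return [];
  return lists.reduce((common, paths) => common.filter((p) => paths.includes(p)));
}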

Hope this helps, and I hope it's enough to unblock you. Let me know if you have questions!