
eXplainable tool for scientists - Deep Insight And Neural Networks Analysis (DIANNA) #48

Closed elboyran closed 2 years ago

elboyran commented 2 years ago

Submitting Author: Elena Ranguelova (@elboyran)
Package Name: dianna
One-Line Description of Package: an eXplainable AI (XAI) Python library targeted at scientists
Repository Link (if existing): https://github.com/dianna-ai/dianna


Description

Modern scientific challenges are often tackled with (Deep) Neural Networks (DNNs). Despite their high predictive accuracy, DNNs lack inherent explainability. Many DNN users, especially scientists, do not harness the power of DNNs because they lack trust in, and understanding of, how these networks work.

Meanwhile, eXplainable AI (XAI) methods offer some post-hoc interpretability and insight into DNN reasoning. They do this by quantifying the relevance of individual features (image pixels, words in text, etc.) to the prediction. The resulting "relevance heatmaps" show, directly in the input modality of the data (images, text, speech, etc.), how the network reached its decision.

There are many Open Source Software (OSS) implementations of these methods, but each typically supports a single DNN format, and the libraries are known mostly to AI experts. The DIANNA library supports systematically chosen XAI methods in the context of scientific usage, providing an OSS implementation based on the ONNX standard together with demonstrations on benchmark datasets. Visually representing the knowledge captured by an AI system can become a source of (scientific) insight.

DIANNA is a work in progress. For now it supports a subset of AI explainability methods chosen by objective criteria, such as RISE, LIME, and DeepLIFT SHAP (under development), for ONNX models, which makes it AI-framework agnostic.
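
To make "relevance heatmap" concrete, here is a minimal, self-contained sketch of the idea behind RISE: randomly mask the input, score each masked copy, and average the masks weighted by the model's score. This is a simplification of the published method (real RISE uses smoothly upsampled, shifted masks), not DIANNA's implementation; `predict_fn` and the shape conventions are assumptions for illustration.

```python
import numpy as np

def rise_saliency(predict_fn, image, n_masks=1000, cell_size=8, p_keep=0.5, seed=0):
    """Estimate a relevance heatmap for `image` via random masking.

    predict_fn: any black-box scorer mapping a batch of images
    (N, H, W, C) to one score per image (N,), e.g. the probability of
    a single class. H and W are assumed divisible by cell_size.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    gh, gw = h // cell_size, w // cell_size
    # Coarse random binary grids, upsampled to image resolution.
    grids = rng.random((n_masks, gh, gw)) < p_keep
    masks = grids.repeat(cell_size, axis=1).repeat(cell_size, axis=2).astype(float)
    # Score every masked copy of the input with the black-box model.
    masked = image[None] * masks[..., None]
    scores = predict_fn(masked)
    # A pixel is relevant if the masks that keep it tend to score highly.
    return (scores[:, None, None] * masks).sum(axis=0) / (masks.sum(axis=0) + 1e-8)
```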

Scope

A possible category could be data visualization, but the most fitting one is missing from the list: the package would fall under the broad category of data analytics, and more precisely eXplainable AI.

The question is: does it fit within the scope of pyOpenSci?


elboyran commented 2 years ago

Hello, I was wondering if this pre-submission form has been noticed? For me, it is important to find out if my software fits the scope of pyOpenSci.

NickleDave commented 2 years ago

Hi @elboyran and welcome to the pyOpenSci community.
Thank you for providing a detailed pre-inquiry submission and stating a very clear question.

@lwasser and I have discussed whether the package is in scope. (Sorry for not getting back to you sooner -- we just had a holiday here in the US. Which shows you that we need a better distribution of editors 🙂)

The short answer is: yes, we think DIANNA could be in scope, but we do have a couple of questions.

We are still learning about how we can best support the scientific community using Python. It's clear from your inquiry and others like it that there's a need for tooling that supports researchers using deep learning methods. However, these kinds of tools are not yet as common in the rOpenSci space, and we have modeled ourselves on that community. So we are figuring this out as we go.

Here are the areas of scope where we think the goals of DIANNA fit into pyOpenSci:

- data visualization
- reproducibility
- data extraction

I'll explain each.

data visualization

This one you identified yourself, and it is fairly self-evident, although reasonable people could disagree about whether XAI methods are really visualizing data per se. But no one can deny that scientists in particular rely on visualizations to validate and tune the algorithms they apply to data. So I would call this one "clearly in scope".
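
(As an aside, the visualization in question is usually just the relevance map overlaid on the input. A minimal matplotlib sketch, assuming an `image` array and a 2-D `saliency` map like the one computed above:)

```python
import matplotlib.pyplot as plt

def show_heatmap(image, saliency):
    """Overlay a relevance heatmap on the input it explains."""
    fig, ax = plt.subplots()
    ax.imshow(image)
    # Semi-transparent heatmap on top of the original input.
    overlay = ax.imshow(saliency, cmap="jet", alpha=0.5)
    fig.colorbar(overlay, ax=ax, label="relevance")
    ax.set_axis_off()
    plt.show()
```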

reproducibility

As you state, one of your goals is to replicate existing methods, with the overall goal of assisting domain scientists. The additional goal of achieving this through the ONNX standard would also increase reproducibility / replicability.

There is precedent for this being in scope: we previously provided review for pystiche, which similarly encapsulates deep learning-based style transfer methods with the goal of standardizing research in that area.

https://github.com/pyOpenSci/software-review/issues/25

data extraction

I think meeting the ONNX standard could fall under this heading; you would make it possible for researchers to use XAI methods in a framework-agnostic manner, if I am understanding correctly. So DIANNA would allow researchers to "extract data" (trained weights) from models.
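
For instance, the `onnx` package alone already exposes a model's trained weights in a framework-agnostic way (my own sketch, not DIANNA's API; the file name is a placeholder):

```python
import onnx
from onnx import numpy_helper

# Load a serialized ONNX model and read out its trained weights,
# regardless of which framework produced it.
model = onnx.load("model.onnx")
weights = {init.name: numpy_helper.to_array(init)
           for init in model.graph.initializer}
for name, array in weights.items():
    print(name, array.shape)
```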

So, with that said, here are our questions:

So overall we do feel that DIANNA could be in scope and we would be very interested in providing review. We also definitely want to connect with your organization since we clearly have aligned goals.

As of right now, though, I think we would need to see the requirement met that the package is near a "maturing" state before initiating a review, unless @lwasser feels very strongly otherwise.

Maybe it would be best to contact us again when you feel you are nearing that state? Happy to discuss further here though.

elboyran commented 2 years ago

Dear David (@NickleDave) and @lwasser,

Thank you very much for your time and detailed feedback.

First the questions covered by short answers:

Data visualization and, in a way, reproducibility are matching categories. I am not sure that data extraction is a good category, and maybe you will agree with me after my answers to your questions.

  1. Using a single standardized ML model format: ONNX. Most existing libraries (including the big ones) support a single framework format: PyTorch, TensorFlow, etc. Our scientists should not have to worry about finding a library that supports their model format (not all libraries implement "all" XAI methods), let alone juggle models in many different formats for their research (see the inference sketch after this list). In addition, ONNX has been created not only with interoperability in mind, but also performance. If people use ONNX models, they will benefit from that and will need an explainability library.
  2. It seems a bit arbitrary which methods are implemented/supported by each existing library. We base our choice on objective evaluation criteria. We also try to include "complementary" XAI methods (e.g., with and without access to the model architecture).
  3. DIANNA is not only software. It comes with benchmark datasets which we have designed in an attempt to fight the "human seems-to-make-sense" kind of bias. For me, conclusions in publications that summarize "quantitatively our method works fine as it highlights the object's edges" on big image datasets are not convincing at all if my scientific data (e.g., a satellite image of an Indian city possibly containing a slum area I want to find) looks nothing like ImageNet. That's why we think experimenting with simple (in complexity terms) datasets can help scientists understand and trust the XAI on an intuitive level, in order to trust the AI. These benchmarks could also be useful to the creators of XAI methods for studying the methods' properties. An easy interface, plenty of documentation, tutorials, etc. should make it easier for (non-technical) users too.
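
To illustrate point 1: once a model is exported to ONNX, the same few lines run inference no matter which framework it was trained in. A generic onnxruntime sketch (not DIANNA code; the file name and input shape are placeholders):

```python
import numpy as np
import onnxruntime as ort

# The same code serves a model trained in PyTorch, TensorFlow, or any
# other framework, as long as it was exported to ONNX.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)  # example input
scores = session.run(None, {input_name: batch})[0]
print(scores.shape)
```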

Some of the work explained above has happened in a sibling repo (we are considering merging them), but all the motivation and planning is described in my initial funding proposal, which I could send if you are interested.

I am not sure I understand "So DIANNA would allow researchers to "extract data" (trained weights) from models." Doesn't any (X)AI library allow that? Also, imho, this is not the main function of an XAI library.

I will be glad to hear your opinion on the scope given these extra explanations. Is it possible to expand your list (as your repo documentation suggests) to include, e.g., data analytics?

Also, please send me your thoughts on how we can connect at an organizational level.

NickleDave commented 2 years ago

Hi @elboyran and thank you for your clear point-by-point response.

Everything you are saying makes total sense to me. I think we agree that potentially some aspects of DIANNA are in scope for pyOpenSci, under the categories of visualization and reproducibility.

However, another factor we would need to consider is that XAI is an active research area. I'm sure you are aware of this, as your statement about "skeptical scientists" makes clear.

We would really need to make sure we have the right reviewers, ones who could address questions about XAI methods, especially with respect to fairness, bias, interpretability, and usability.

I do not think we can provide you with this kind of review at this time. As we state in our guide:

Note that we cannot currently accept statistics and/or machine learning packages. We don’t feel that we can give these the deep, thorough review that they require.

You are also right that this would be venturing into the territory of "data analytics". I did discuss this with @lwasser, who also felt that adding an analytics category would go beyond our intended scope. We are following the rOpenSci model, where our central focus is on "tooling for the data life cycle". That community has only very recently added statistics packages. As a relatively new organization, we are just not able to extend our scope in the same way.

You may be better served by going through review at a traditional journal.

Please let me know how that sounds. I am happy to discuss with you further.

NickleDave commented 2 years ago

Closing this for now, but again, we are happy to discuss further.