Hello, I was wondering whether this pre-submission form has been noticed? It is important for me to find out whether my software fits the scope of pyOpenSci.
Hi @elboyran and welcome to the pyOpenSci community.
Thank you for providing a detailed pre-inquiry submission and stating a very clear question.
@lwasser and I have discussed whether the package is in scope. (Sorry for not getting back to you sooner -- we just had a holiday here in the US. Which shows you that we need a better distribution of editors 🙂)
The short answer is: yes, we think DIANNA could be in scope, but we do have a couple of questions.
We are still learning how we can best support the scientific community using Python. It's clear from your inquiry and others like it that there's a need for tooling that supports researchers using deep learning methods. However, these kinds of tools are not yet as common in the rOpenSci space, and we have modeled ourselves on that community. So we are figuring this out as we go.
Here are the areas of scope where we think the goals of DIANNA fit into pyOpenSci:
I'll explain each.
You identified this one yourself, and it is fairly self-evident, although reasonable people could disagree about whether XAI methods are really visualizing data per se. But no one can deny that scientists in particular rely on visualizations to validate and tune the algorithms they apply to data. So I would call this one "clearly in scope".
As you state, one of your goals is to replicate existing methods, with the overall goal of assisting domain scientists. The additional goal of achieving this through the ONNX standard would also increase reproducibility / replicability.
There is precedent for this being in scope: we previously provided review for pystiche
which similarly encapsulates deep learning-based style transfer methods with the goal of standardizing research in that area.
https://github.com/pyOpenSci/software-review/issues/25
I think meeting the ONNX standard could fall under this heading; you would make it possible for researchers to use XAI methods in a framework-agnostic manner, if I am understanding correctly. So DIANNA would allow researchers to "extract data" (trained weights) from models.
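To be concrete, by "extract data" I mean something like the following sketch, which uses the `onnx` package to read the trained tensors out of an ONNX file independent of the framework that produced it (the file path is hypothetical):

```python
# Rough sketch: reading trained weights out of an ONNX model,
# independent of the framework that produced it.
# Assumes the `onnx` package and a hypothetical "model.onnx" file.
import onnx
from onnx import numpy_helper

model = onnx.load("model.onnx")

# Each initializer holds one trained tensor (weights, biases, ...).
for initializer in model.graph.initializer:
    weights = numpy_helper.to_array(initializer)  # plain numpy array
    print(initializer.name, weights.shape)
```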
So, with that said, here are our questions:
Please say more about what specifically DIANNA will provide that other XAI libraries do not. How will DIANNA and the community around it give scientists additional guidance? E.g., if I am a researcher and I can train a CNN model, then I am probably competent enough to apply Grad-CAM to a layer in my CNN. I don't need an extra library to do that. One way I could imagine DIANNA augmenting this would be for the docs to consist of vignettes, where each vignette describes a use case for a specific domain. But I do not see that in the library now (or any docs; just let me know if I am missing them). Please provide more details about this if you can.
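To illustrate what I mean, applying Grad-CAM via an existing library like Captum is already just a few lines (a rough, untested sketch, assuming a torchvision ResNet):

```python
# Rough sketch (untested): applying Grad-CAM to one layer of a CNN
# with an existing library (Captum), assuming a torchvision ResNet.
import torch
from torchvision.models import resnet18
from captum.attr import LayerGradCam

model = resnet18().eval()
gradcam = LayerGradCam(model, model.layer4)        # pick a conv layer

image = torch.rand(1, 3, 224, 224)                 # stand-in for real input
attribution = gradcam.attribute(image, target=0)   # relevance for class 0
print(attribution.shape)
```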
How close would you say DIANNA is to being ready for use by researchers?
We ask that any tool/library we review be near the "maturing" state as defined by rOpenSci; see specifically this language from the rOpenSci guide:
For any submission or pre-submission inquiry the README of your package should provide enough information about your package (goals, usage, similar packages) for the editors to assess its scope without having to install the package. At the submission stage, all major functions should be stable enough to be fully documented and tested; the README should make a strong case for the package (the editors will read it to e.g. evaluate this as in scope or not.).
I can see you all are working hard on development, but I don't see anything in the README that tells me how or when to use the library, e.g. a minimal snippet that demonstrates usage.
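For example, even a purely illustrative snippet like the following (these names are hypothetical, not DIANNA's actual API; it is just the level of detail I mean) would help:

```python
# Purely hypothetical: these function names are made up to illustrate
# the kind of minimal README example meant here -- not DIANNA's real API.
import numpy as np
import dianna  # assumes the package is installed

image = np.random.rand(224, 224, 3)   # stand-in for real input data

heatmap = dianna.explain(             # hypothetical entry point
    model="my_model.onnx",            # any ONNX model file
    input_data=image,
    method="RISE",                    # one of the supported XAI methods
)
```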
The reason we have this requirement is that our completed review represents a sign of approval of the API, and we want to avoid giving this approval to a library that then drastically changes its API after the review process.
With respect to the goal of providing interoperability via ONNX, my understanding is that this is far from trivial, as suggested by the discussion on this issue: https://github.com/dianna-ai/dianna/issues/39
How comfortable would you be with review from developers of other XAI libraries? Those developers seem like the most likely candidates to provide informative reviews (along with researchers from specific domains you address). We would like to make sure you are ready to engage with that community of developers before we invite them to review. Alternatively, you may be operating in more of an "open stealth" mode. Please let us know.
So overall we do feel that DIANNA could be in scope and we would be very interested in providing review. We also definitely want to connect with your organization since we clearly have aligned goals.
As of right now, though, I think we would need to see the requirement met that the package is near a "maturing" state, before initiating a review. Unless @lwasser feels very strongly otherwise.
Maybe it would be best to contact us again when you feel you are nearing that state? Happy to discuss further here though.
Dear David (@NickleDave) and @lwasser,
Thank you very much for your time and detailed feedback.
First, the questions that can be covered by short answers:
Data visualization and, in a way, reproducibility are matching categories. I am not sure Data extraction is a good category, and maybe you will agree with me after reading my answers to your questions.
Some of the work explained above has happened in a sibling repo (we are considering merging them), but all of the motivation and the plan are described in my initial funding proposal, which I could send if you are interested.
I am not sure I understand "So DIANNA would allow researchers to 'extract data' (trained weights) from models." Doesn't any (X)AI library allow that? Also, imho, this is not the main function of an XAI library.
I will be glad to hear your opinion on the scope given these extra explanations. Is it possible to expand your list (as your repo documentation suggests) and include, e.g., data analytics?
Also, please send me your thoughts on how we can connect at an organizational level.
Hi @elboyran and thank you for your clear point-by-point response.
Everything you are saying makes total sense to me. I think we agree that potentially some aspects of DIANNA are in scope for pyOpenSci, under the categories of visualization and reproducibility.
However, another factor we would need to consider is that XAI is an active research area. I'm sure you are aware of this, as your statement about "skeptical scientists" makes clear.
We would really need to make sure we have the right reviewers, that could address questions about XAI methods, especially with respect to fairness, bias, interpretability and usability.
I do not think we can provide you with this kind of review at this time. As we state in our guide:
Note that we cannot currently accept statistics and/or machine learning packages. We don’t feel that we can give these the deep, thorough review that they require.
You are also right that this would be venturing into the territory of "data analytics". I did discuss this with @lwasser, who also felt that adding an analytics category would go beyond our intended scope. We are following the rOpenSci model, where our central focus is on "tooling for the data life cycle". That community has only very recently added statistics packages. As a relatively new organization, we are just not able to extend our scope in the same way.
You may be better served by going through review at a traditional journal.
Please let me know how that sounds. I am happy to discuss with you further.
Closing this for now but again we are happy to discuss further
Submitting Author: Elena Ranguelova (@elboyran)
Package Name: dianna
One-Line Description of Package: an eXplainable AI (XAI) python library targeted at scientists
Repository Link (if existing): https://github.com/dianna-ai/dianna
Description
Modern scientific challenges are often tackled with (Deep) Neural Networks (DNNs). Despite their high predictive accuracy, DNNs lack inherent explainability. Many DNN users, especially scientists, do not harness the power of DNNs because of a lack of trust in and understanding of their workings.
Meanwhile, eXplainable AI (XAI) methods offer some post-hoc interpretability and insight into DNN reasoning. This is done by quantifying the relevance of individual features (image pixels, words in text, etc.) with respect to the prediction. These "relevance heatmaps" indicate how the network has reached its decision directly in the input modality (images, text, speech, etc.) of the data.
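To make this concrete, here is a much-simplified sketch of the idea behind masking-based relevance scoring (in the spirit of RISE; the real method uses smoothed, low-resolution random masks and proper normalization, and `predict` is a stand-in for any model returning a class probability):

```python
# Simplified sketch of masking-based relevance (in the spirit of RISE):
# a pixel is relevant if the prediction stays high when it is kept.
import numpy as np

def relevance_heatmap(predict, image, n_masks=1000, p_keep=0.5):
    h, w = image.shape[:2]
    heatmap = np.zeros((h, w))
    for _ in range(n_masks):
        mask = (np.random.rand(h, w) < p_keep).astype(float)
        score = predict(image * mask[..., None])  # prediction on masked input
        heatmap += score * mask                   # weight mask by the score
    return heatmap / n_masks
```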
There are many Open Source Software (OSS) implementations of these methods; alas, each supports a single DNN format, and the libraries are known mostly to AI experts. The DIANNA library supports the best XAI methods in the context of scientific usage, providing their OSS implementation based on the ONNX standard and demonstrations on benchmark datasets. Visually representing the knowledge captured by the AI system can become a source of (scientific) insights.
It is a work in progress; for now DIANNA supports a subset of AI explainability methods, chosen by objective criteria, such as RISE, LIME, and DeepLIFT SHAP (under development), for ONNX models, which makes it AI-framework agnostic.
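As a minimal illustration of this framework-agnosticism, any ONNX model can be run the same way regardless of the framework it was trained in (a sketch using `onnxruntime` with a hypothetical model file):

```python
# Minimal sketch: running any ONNX model the same way, regardless of
# the framework it was trained in. Assumes onnxruntime and a
# hypothetical "model.onnx" file with a single input.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx",
                               providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Example input; the shape must match the model's expected input.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: data})
print(outputs[0].shape)
```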
Scope
Please indicate which category or categories this package falls under:
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
A possible category could be Data visualization, but the most fitting one is missing from the list: it would fall under the broad category of Data Analytics, and more precisely eXplainable AI.
Who is the target audience and what are the scientific applications of this package?
Scientists in any domain (especially non-(X)AI experts), but also any other AI users who want to open the AI "black boxes". It is also very much for XAI developers who want to study the properties of their methods and compare them against state-of-the-art ones on the benchmarks we propose. The scientific application potential is enormous and not limited to any science domain. Examples are given, e.g., in these publications: Explainable Machine Learning for Scientific Insights and Discoveries; Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications (a nice summary illustration is in Fig. 20 on page 268).
Are there other Python packages that accomplish similar things? If so, how does yours differ? As mentioned in the Description, there are many packages (shap, lime, etc.) and even libraries (e.g. Captum, iNNvestigate, etc.) implementing either a single XAI method or a group of methods (chosen without clearly motivated criteria beyond the authors' own research), each for a single DNN format (e.g. only PyTorch or Keras/TensorFlow). The careful, objective selection of XAI methods and the choice to support the ONNX standard make DIANNA the only library applicable to any trained AI model, independent of the framework, and hence especially useful for domain scientists. We also aim to support many data domains: images, text, and in the future time-series, tabular, and graph data.
Any other questions or issues we should be aware of: It is a work in progress, but we are already thinking about how to disseminate it. We want to reach and be useful to as many domain scientists as possible. That is the mission of our organization, the Netherlands eScience Center.
The question is does it fit in the scope of pyOpenSci?
P.S. Have feedback/comments about our review process? Leave a comment here