twosixlabs / armory-library

Python library for Adversarial ML Evaluation
https://twosixlabs.github.io/armory-library/
MIT License

create Product Increment 3 demo #88

Closed mwartell closed 8 months ago

mwartell commented 9 months ago

PI3 demo day is Tuesday 23-Jan-2024. This is the umbrella issue for work toward that. At present, the requirements for the demo live in a pinned Slack post. I am pasting them below for reference, but consider that link authoritative, especially since pasting from Slack does not preserve formatting.

Notable requirements

  1. The entire demo will run as notebooks on RAVEN, the hosted JATIC jupyter notebook service.
  2. MAITE (the library formerly known as jatic_toolbox) will provide model and dataset access.
  3. There is no model specified yet, although I'd put even odds on YOLO vX.
  4. The PI3 demo folder on GitLab is populated with a clone of the PI2 demo. While this is good for setting the structure, assume details in there are old unless updated after 15-Dec-23.
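Requirement 2 implies each notebook should accept any model that satisfies MAITE's interfaces rather than one hard-coded model. The real protocol definitions must come from the MAITE documentation; the sketch below only illustrates the general structural-typing idea using the standard library, and every name in it (`ObjectDetector`, `DummyYolo`) is hypothetical, not MAITE's API.

```python
# Hypothetical sketch of a MAITE-style structural ("duck-typed") model check.
# Names are illustrative only -- consult the MAITE docs for the real protocols.
from typing import Any, Protocol, Sequence, runtime_checkable


@runtime_checkable
class ObjectDetector(Protocol):
    """Anything callable on a batch of images, returning per-image detections."""

    def __call__(self, batch: Sequence[Any]) -> Sequence[Any]: ...


class DummyYolo:
    """Stand-in model; a real demo would wrap an actual detection checkpoint."""

    def __call__(self, batch):
        # Return one (empty) detection list per image in the batch.
        return [[] for _ in batch]


model = DummyYolo()
assert isinstance(model, ObjectDetector)  # structural check passes
```

Because `ObjectDetector` is `runtime_checkable`, `isinstance` verifies only that the required method exists, not its signature or return types; a real MAITE-conformance check would be stricter.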

Tasks

mwartell commented 9 months ago

Pasted from https://cdaote.slack.com/archives/C05QBUF38PP/p1702045498055439 with external participant information excluded.

Hello everyone, here is the idea and plan for the increment 3 demo. We are going to roughly follow the same plan as for the increment 2 demo. The increment demo will feature an integrated AI T&E workflow, following Amelia, an IV&V AI T&E persona, as she performs AI T&E on models for detection and classification of objects in data collected from drones/UAVs. The persona can be found here on the right.

Dataset: The dataset chosen for this demo is VisDrone (https://github.com/VisDrone/VisDrone-Dataset). Just like last time, each team will also have an opportunity to provide an overview of their capability outside the context of the T&E workflow, if they desire. Two datasets will be made for the demo, similar to increment 2. One dataset will be generic; the other will be one collected by the T&E engineer. Both datasets will come from VisDrone. The generic dataset will be the set in VisDrone labeled "Task 1: Object Detection in Images". The demo will describe the second dataset as one collected at a test event by a UAV to evaluate the models under test. This test data (unsure whether it is "operational"; calling it "test" for now) should come from VisDrone "Task 2: Object Detection in Videos", selecting a series of images from the videos.

Models: Common models will be selected for use across the program. They will be posted here when selected. Give feedback if you have feedback :slightly_smiling_face:

Demo details: Since a lot of the teams' effort this increment is on documentation, availability (e.g., release), and immediacy (hands-on experience through demos), we should see how we can show some of these things off. It can be tricky: nobody wants a demo where we walk through piles of documentation, but showing off, in the demo, that the documentation exists can answer a question that the persona, Amelia, might have.
Similarly, we might have a preliminary step where we show off Gradio demos on Huggingface, if they are available: a person looking for T&E tools tries ours on Huggingface, then decides to use the JATIC tools. The entire demo will be run on RAVEN. As before, the flow of the demo will proceed as follows:

  1. Introduction to the program.
  2. Introduction to the demo. Explain the key updates since last increment: public release, documentation, some publicly available demos, availability of RAVEN. Includes "Next steps" for audience members who want to use or engage with JATIC, with a document (e.g., PDF) handout of next-step info.
  3. Introduction of the persona, mission, and demo context ("Introduction", in the increment 3 demo folder).
  4. Tool X. The representative for Tool X takes over as speaker and starts a screen share open to the Tool X notebook (in the increment 3 demo folder). The speaker describes the problem experienced by the persona and the solution provided by their tool, briefly digresses to describe Tool X in generality (outside the frame of the mission), including the tool's overall context and major features, then returns to the frame of the mission, demoing the application of Tool X for our persona and highlighting differences of this problem, persona/workflow, and increment versus last increment. The speaker may briefly reference documentation as part of demonstrating the persona's use of the tool, and may use visuals or GUIs they have developed to help with this portion. The speaker notes the successful resolution of the problem, describes what is new (features, documentation, access links, public release), and transitions to the next block.
  5. Repeat 4 for each tool.
  6. RAVEN demo.
  7. Any additional content we want to go into, where we delve into demos that don't fit the above flow.
  8. Next steps for the audience, e.g., follow-up engagement methods, contact info, info on regularly scheduled (weekly) demos, Huggingface demos, GitLab access, feedback, etc.

The intent is that the demo has a common thread which ties each tool within the program together, while also providing a chance to highlight each tool in an independent context and show material which may not make sense strictly within the workflow. It also continues to provide groundwork and guidance on how someone might/should do T&E using our (or any) tools.

Unresolved questions (please suggest additions and I will add): Maybe this time we demo RAVEN first, as the way by which the T&E things can be set up? I.e., move RAVEN from 4 to the first tool in 2.

Next steps:

  1. The demo will be assembled in our GitLab repo, in the increment 3 demo folder, where the framework of the demo can be found.
  2. Each team should think about and discuss (including with your Product Owner) what you will include in the demo.
  3. Just as last increment, each team should fill out their notebook(s) and put in the necessary supporting files. I'd like a bit of improved organization for the supporting files; e.g., all supporting figures should go in the supporting_figures folder, in a subfolder with the name of your team.
  4. I should meet with each team (either with the PO or the team itself) about what you plan to include in the demo.

Demo content expectations for teams:

  1. Timebox of ~10 minutes or less for each capability.
  2. 0-3 slides.
  3. Emphasis on new functionality and accomplishing a use case; deemphasis on implementation methods and extraneous jargon.
  4. To respect the timebox, not all new functionality needs to be shown (this is not a time for teams to "prove" that they were productive or "show off" the quantity of new work).

Internal participants: Each scrum team should choose a presenter for their section of the demo. This presenter should be the person on the team who is best able to represent their product to an end-user or external stakeholder; it could be the team lead, a developer, or the product owner. Also participating: the PM team, product owners, team leads, and the SysEng team.

External participants: The list of external participants is being compiled in the comments below. Broadly, most of the external stakeholders are DoD government personnel who are experienced with AI T&E. Many have explicit missions within the domain of overhead imagery (NGA, USSF, Army Titan). Nearly all work within the AI space. Assume a level of familiarity with concepts in AI and T&E, but not necessarily code literacy. Program management, product owners, and team leads may invite whoever they think is appropriate. This may include development team members, CDAO representatives, prospective end-users, other DoD stakeholders, and FFRDC members.

Location: The System Demo will be hosted on DoD MS Teams to ensure external stakeholders can attend. The demo will be recorded.
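The test-dataset construction described above (selecting a series of images from VisDrone "Task 2: Object Detection in Videos") can be sketched as a simple frame-sampling pass, assuming, as in the VisDrone-VID releases, that each video sequence is a directory of numbered JPEG frames. The paths, function name, and sampling rate below are illustrative assumptions, not part of the demo plan.

```python
# Illustrative sketch (not from the demo repo): build the "test" dataset by
# copying every Nth frame of each VisDrone video sequence into an output tree.
import shutil
from pathlib import Path


def sample_frames(sequences_dir: Path, out_dir: Path, every_n: int = 10) -> int:
    """Copy every `every_n`-th frame of each sequence into out_dir/<seq_name>/.

    Returns the number of frames copied.
    """
    copied = 0
    for seq in sorted(p for p in sequences_dir.iterdir() if p.is_dir()):
        frames = sorted(seq.glob("*.jpg"))  # VisDrone frames sort by number
        dest = out_dir / seq.name
        dest.mkdir(parents=True, exist_ok=True)
        for frame in frames[::every_n]:
            shutil.copy2(frame, dest / frame.name)
            copied += 1
    return copied
```

Sampling every Nth frame keeps the test set small while preserving temporal variety across each sequence; a real pipeline would also need to subset the matching annotation files.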

mwartell commented 8 months ago

Overcome by events.