open-life-science / ols-1

Creative Commons Attribution Share Alike 4.0 International

6 stars 1 forks source link

plenoptic #5

Open billbrod opened 4 years ago

billbrod commented 4 years ago

Project Lead: @billbrod

Mentor: @rodrigocam

Welcome to OLS-1! This issue will be used to track your project and progress during the program. Please use this checklist over the next few weeks as you start Open Life Science program :tada:.

Before Week 1 (Jan 20): Meet your mentor!

[x] Create an account on GitHub
[x] Complete your own copy of the open leadership self-assessment and share it to your mentor

If you're a group, each teammate should complete this assessment individually. This is here to help you set your own personal goals during the program. No need to share your results, but be ready to share your thoughts with your mentor.
[x] Make sure you know when and how you'll be meeting with your mentor.

Before Week 2 (Jan 27): Cohort Call (Welcome to Open Life Science!)

[x] Create an issue on the OLS-1 GitHub repository for your OLS work and share the link to your mentor.
[x] Draft a brief vision statement using your goals

This lesson from the Open Leadership Training Series (OLTS) might be helpful
[x] Leave a comment on this issue with your draft vision statement & be ready to share this on the call
[x] Check the Syllabus for notes and connection info for all the cohort calls.

Before Week 3 (Feb 3): Meet your mentor!

[x] Look up two other projects and comment on their issues with feedback on their vision statement
[x] Complete this compare and contrast assignment about current and desired community interactions and value exchanges
[x] Complete your Open Canvas (instructions, canvas)
[x] Share a link to your Open Canvas in your GitHub issue
[x] Start your Roadmap
[x] Comment on your issue with your draft Roadmap
[x] Suggest a cohort name at the bottom of the shared notes and vote on your favorite with a +1

Before Week 4 (Feb 10): Cohort Call (Tooling and roadmapping for Open projects)

[x] Look up two other projects and comment on their issues with feedback on their open canvas.

Week 5 and more

[x] Create a GitHub repository for your project
[x] Add the link to your repository in your issue
[x] Use your canvas to start writing a README.md file, or landing page, for your project
[x] Link to your README in a comment on this issue
[x] Add an open license to your repository as a file called LICENSE.md
[x] Add a Code of Conduct to your repository as a file called CODE_OF_CONDUCT.md
[ ] Invite new contributors to into your work!

This issue is here to help you keep track of work as you start Open Life Science program. Please refer to the OLS-1 Syllabus for more detailed weekly notes and assignments past week 4.

billbrod commented 4 years ago

Vision statement

plenoptic is a project by graduate students and postdocs in the Lab for Computational Vision to build a python library that enables researchers who build computational models that operate on and extract information from images (in neuroscience, computer vision, or other fields) to interrogate their models, better understand how they work, and improve their ability to run experiments for testing them. This is an open project because the included methods have been described in the literature but do not have easy-to-use, widely-available implementations and we believe that science functions best when it is transparent, accessible, and inclusive.

rodrigocam commented 4 years ago

Vision statement

plenoptic is a project by graduate students and postdocs in the Lab for Computational Vision to build a python library that enables researchers who build computational models that operate on and extract information from images (in neuroscience, computer vision, or other fields) to interrogate their models, better understand how they work, and improve their ability to run experiments for testing them. This is an open project because the included methods have been described in the literature but do not have easy-to-use, widely-available implementations and we believe that science functions best when it is transparent, accessible, and inclusive.

I really liked this vision statement!

billbrod commented 4 years ago

Vision statement draft 2

After discussing a bit with some of my labmates who are working on the project, we have another potential draft (that's a bit more readable at the cost of breaking up sentences):

plenoptic is a python library that provides a suite of synthesis methods that facilitate the perceptual and computational analysis of models that extract information from images (e.g., deep neural networks, models of human visual cortex). The Members of the Lab for Computational Vision initiated this open source project because though the methods and models implemented in plenoptic have been described in the literature, they do not have widely-available, easy-to-use, and generalizable implementations.

EDIT: added examples of models

cassgvp commented 4 years ago

Vision statement draft 2

After discussing a bit with some of my labmates who are working on the project, we have another potential draft (that's a bit more readable at the cost of breaking up sentences):

plenoptic is a python library that provides a suite of synthesis methods that facilitate the perceptual and computational analysis of models that extract information from images. The Members of the Lab for Computational Vision initiated this open source project because though the methods and models implemented in plenoptic have been described in the literature, they do not have widely-available, easy-to-use, and generalizable implementations.

Love the re-draft. Do you think it could be simplified a bit further? I work with (magnetic resonance) images and python, and I'm not sure I understand what those sentences mean 🤨😜!

I think it's "synthesis methods" which is throwing me in the first part. Not sure I understand what that means outside of a specific technical domain. You could make it a bit lighter in the second half by just saying "... initiated this open source project to address the need for an easy-to-use implementation of...". I don't think you need to explicitly say here that other methods exist (that feels a bit distracting), only that your one is great 😉.

Very happy to be used as a test case for broad understanding in communication of your vision, if it helps? Also, great work from you and the Computational Vision Lab for getting the project off the ground! 🚀

billbrod commented 4 years ago

Here's my open canvas.

billbrod commented 4 years ago

The overview of my roadmap:

Short term:
1. Finalize Portilla-Simoncelli texture statistics
2. Add MAD competition
3. Create Synthesis superclass
Medium term:
1. Finalize geodesics
2. Get eigendistortion and geodesics to use Synthesis superclass
3. Write more documentation and tutorials
4. Finalize model API, create superclass
5. Add more models
Long term:
1. Present poster at conference to advertise to users
2. Submit paper to Journal of Open Source Software to get something for people to cite

There's also a more detailed version on the Github projects tab (here's a screenshot):

Screenshot from 2020-02-07 12-37-02

billbrod commented 4 years ago

Vision statement draft 2

After discussing a bit with some of my labmates who are working on the project, we have another potential draft (that's a bit more readable at the cost of breaking up sentences): plenoptic is a python library that provides a suite of synthesis methods that facilitate the perceptual and computational analysis of models that extract information from images. The Members of the Lab for Computational Vision initiated this open source project because though the methods and models implemented in plenoptic have been described in the literature, they do not have widely-available, easy-to-use, and generalizable implementations.

Love the re-draft. Do you think it could be simplified a bit further? I work with (magnetic resonance) images and python, and I'm not sure I understand what those sentences mean 🤨stuck_out_tongue_winking_eye!

I think it's "synthesis methods" which is throwing me in the first part. Not sure I understand what that means outside of a specific technical domain. You could make it a bit lighter in the second half by just saying "... initiated this open source project to address the need for an easy-to-use implementation of...". I don't think you need to explicitly say here that other methods exist (that feels a bit distracting), only that your one is great wink.

Very happy to be used as a test case for broad understanding in communication of your vision, if it helps? Also, great work from you and the Computational Vision Lab for getting the project off the ground! rocket

Thanks for the feedback and I really appreciate the offer for a test case! Trying to make this understandable and succinct is really difficult, and we've talked about it amongst ourselves so much that we forget what's understandable to other people. "Synthesis methods" is a specific term, and we definitely want to keep it in the vision statement (people in the vision science research community will understand it), but I think it would be too long to unpack it there. The next bit in our readme will explain it, something along the following lines:

All models have three components, inputs x, outputs y, and parameters θ. When working with models, we typically either simulate, by holding x and θ constant, and generating predictions for y, or fit, by holding x and y constant and using optimization to find the best-fitting θ. However, for optimization purposes, there's nothing special about x, so we can instead hold y and θ constant and use optimization to synthesize new x. Synthesis methods are those that to do this: they take a model with set parameters and generate new images in specific ways. They allow you to better understand the model by determining what it considers important and, crucially, what it ignores, as well as generating novel stimuli for testing the model.

A little table we have to summarize this is:

{x, y, θ} : inputs, outputs, parameters

	fixed	variable
simulate	{x, θ}	{y}
learn	{x, y}	{θ}
synthesize	{y, θ}	{x}

Does that make more sense? I'm not sure of a concise enough way to phrase to make including it in the vision statement reasonable, but I agree it definitely has to be unpacked!

homo-sapiens34 commented 4 years ago

Very interesting project! As far as I see, the first version of your vision emphasizes building and training models themselves, whereas the second version is more about interpretation - which is a far more tough task. Probably, it is about data augmentation as well, am I right?

Talking about interpretation, what methods are you going to implement? Attention visualization, gradient-based feature importance or something else? Could your tool be used for the prevention of adversarial attacks on image-processing models?

billbrod commented 4 years ago

Yes, I think we want to emphasize interpretation more, it's the inclusion of the synthesis methods that I think is most unique. By "data augmentation", do you mean ways to manipulate the training set of neural networks or similar models to artificially enlarge the training set and hopefully enable the model to learn the invariances you want? E.g., take each image and randomly crop and shift the target object so that the model needs to learn shift invariance? For this project, we're actually not interested in that -- we're not interested in ways to better train models, but in ways to better understand / interpret trained models.

This will end up in our README at some point, but the methods we're implementing right now have all been developed in the lab, and are (with links to the papers describing them):

metamers: given a model and an image, synthesize a new image that the model thinks is identical.
Maximal differentiation (MAD) competition: given two models and an image, synthesize two pairs of images: two that the first model thinks are identical that the second thinks are as different as possible, and two that the first model thinks are as identical as possible.
Geodesics: given a model and two images, synthesize a set of images that the model thinks are intermediate between the two. That is, how does the model think you go from one image (for example, a picture of a human on the left side of a chair) to another (a picture of a human on the right side of the chair).
Eigendistortions: given a model and an image, synthesize the most and least noticeable distortion on the image (with a constant mean-squared error in pixels). That is, if you can change all pixel values by a total of 100, how does the model think you should do it to make it as obvious as possible, and how does the model think you should do it to make it unnoticeable.

(where for all of these, when I say "the model thinks", I'm referring to the L2-distance in a model's representation, so that "model thinks is identical" means that the model representations are identical, and "model thinks are as different as possible" means that the model representations are as far apart from each other as possible)

... it's very hard to come up with compact, easy-to-understand descriptions of these methods. That's why tutorials and good diagrams are going to be especially important for this project, I think.

As for adversarial attacks, I think these tools will enable researchers to get a sense for what a model thinks is important and what it ignores, which will hopefully enable them to build better models, by either changing the architecture or how they train them. I think adversarial examples are a great motivating example for this! The space of all possible images is impossibly vast, so models will behave in weird, unpredictable ways, and these methods attempt to help you figure out how your model behaves outside your training set.

billbrod commented 4 years ago

Okay, here's another attempt at a vision statement. I'm departing a little bit from the guide on the Mozilla page, but I think this project needs a little more unpacking (as everyone's feedback indicated!).

In recent years, adversarial examples have demonstrated how difficult it is to understand how models make sense of the images they view. The space of all possible images is impossibly vast and difficult to explore, so that even training a model on millions of images represent just a small fraction of all that could be shown. plenoptic is a python library that provides tools to help researchers better understand their models by using optimization to generate novel images. These images allow researchers to gain a sense for what features the model ignores and what it considers important, and they can be used in experiments for further model testing and validation.

Part of the issue I'm having with this is related to who this should target and where it will sit. Is this the elevator pitch I'll give to my intended users? Will this be the top part of my README that someone might stumble across randomly? There's a lot of jargon I want to throw into this vision statement, so it's hard to think about how best to make this as approachable as possible. As a scientist (can I say that if I'm still working on my PhD?), my impulse is to throw the jargon in and then explain it later, but I suppose that the goal here is to explain it in as understandable a way as possible, and then add more details later on -- it's hard!

cassgvp commented 4 years ago

Okay, here's another attempt at a vision statement. I'm departing a little bit from the guide on the Mozilla page, but I think this project needs a little more unpacking (as everyone's feedback indicated!).

In recent years, adversarial examples have demonstrated how difficult it is to understand how models make sense of the images they view. The space of all possible images is impossibly vast and difficult to explore, so that even training a model on millions of images represent just a small fraction of all that could be shown. plenoptic is a python library that provides tools to help researchers better understand their models by using optimization to generate novel images. These images allow researchers to gain a sense for what features the model ignores and what it considers important, and they can be used in experiments for further model testing and validation.

Part of the issue I'm having with this is related to who this should target and where it will sit. Is this the elevator pitch I'll give to my intended users? Will this be the top part of my README that someone might stumble across randomly? There's a lot of jargon I want to throw into this vision statement, so it's hard to think about how best to make this as approachable as possible. As a scientist (can I say that if I'm still working on my PhD?), my impulse is to throw the jargon in and then explain it later, but I suppose that the goal here is to explain it in as understandable a way as possible, and then add more details later on -- it's hard!

TOTALLY GET IT NOW! Love it. Thank you 😊

dlebauer commented 4 years ago

Your current vision statement is very clear and presents this as an interesting and valuable project - I am interested in learning more.

But it is a bit longer than the suggested one sentence. One thing that you might consider is dividing this into separate mission and vision statements followed by an introduction with more specifics; the mission is what you do and a vision is what the world will look like when you get there (see google for more on these).

If I try to pare this down to core ideas I came up with something that might be too sparse, but:

mission: help researchers understand and measure what features a model considers important.
vision: human interpretation of computer vision models

Then you could use the text that you provided above to explain, for people interested in learning more.

dannycolin commented 4 years ago

Awesome project!

Generally, your Open Canvas looks good. But, I would add two things that I think are sometimes overlooked in Open Source software.

First, I'd add "Documentation" as both in resources required because a good documentation could have a massive impact on the usage metrics. I'd also add it in contributor profiles since writing documentation is a specific skill that developers don't necessarily have.

Second, I'd add "License (maybe Lawyer)" in the resources required. Choosing the right license can be hard without legal knowlegde. You would also want to be sure that your contributors are aware that their contributions will be licensed under the license you chose. There's also a nice website to help you on that topic (see: https://choosealicense.com/).

billbrod commented 4 years ago

Thanks all for the feedback!

Here's the link to the github repo.

The repo containing all the code is currently private because it's under (heavy) development and my collaborators don't want to open it until they feel like it's closer to ready. I'm going to double-check with them if they're okay opening it now, with a strong warning at the top that it's under development, so that others can look at it.

EDIT: We're going to continue to have discussions about this as a group, but for the time being I've created a new repo that copies over the relevant parts for OLS, e.g., the README, and that will be public. I've updated the link to point to this version of the repo

yochannah commented 4 years ago

I wonder if a "Stability | Alpha" or "Stability | WIP" badge might come in handy here? https://github.com/mkenney/software-guides/blob/master/STABILITY-BADGES.md

yochannah commented 4 years ago

Hi Project Leads,

  This is your project report file:

https://hackmd.io/SVKgJX7mR52idsurH6goRg?both

Please start working on it for the final presentation. We will send more info in the weekly email.

  Best, 

OLS team