uchicago-computation-workshop / Fall2021

Repository for the Fall 2021 Computational Social Science Workshop

12/2: Shakhnarovich #11

Open shevajia opened 2 years ago

shevajia commented 2 years ago

Comment below with a well-developed group question about the reading for this week's workshop.

One person can submit on the group's behalf and put the Group Name in the submission for credit.

Please post your question by Wednesday 11:59 PM, and upvote at least three of your peers' comments on Thursday prior to the workshop. You need to use 'thumbs-up' for your reactions to count towards 'top comments,' but you can use other emojis on top of the thumbs up.

Thiyaghessan commented 2 years ago

Group: 2A

Hey Professor,

Thank you for sharing your work with us. In the provided example, the model was able to detect obvious changes, such as the orange sign being moved or removed; the sign itself is a sharply contrasting colour. However, under shifting lighting conditions and other environmental changes, objects can easily blend into the background or be perceived as having different colours than before. How will the change detection discern those nuances without being misled? It seems the model is vulnerable to returning false positives in the situations mentioned above. Thank you!

GabeNicholson commented 2 years ago

Group 2F

Hi Professor, we have two main questions. (1) As mentioned in the paper, the visual distinction between the first and second halves of a sequence is sometimes subtle, while shorter sequences tend to contain larger viewpoint changes between frames. What, then, are the key considerations in choosing sequence length? Is shorter or longer better? (2) How reliable is the semi-supervised learning process in practice? For instance, is there a large enough corpus of object names to label most unseen objects if the algorithm were generalized to all Google Street View images? Since the learning relies on language, the algorithm must be able to retrieve the names of certain objects; where would those names come from?

Hai1218 commented 2 years ago

Group 1N

Dear Professor Shakhnarovich,

The work you and your team are doing is impressive and potentially has a wide range of applications, from defense intelligence to eco-geological science to social science, such as urban research on the consolidation and intersection of groups and communities.

We do not have the mathematical proficiency to fully follow the algorithms you present in the paper. Nevertheless, we would still like to ask a technical question about your framework. We are thinking about applying it to detect very minute changes in a large visual stream. For example, astrophysicists might be interested in changes in starlight in a visual stream of the galaxy. Likewise, political economists might be interested in new constructions of light poles, roads, and other local infrastructure to analyze the partitioning of public goods. All of these small feature changes require very high-resolution image processing. (1) Does the framework you and your team developed accommodate high-resolution, minute-detail change detection in visual streams? If so, could you briefly explain how, in non-mathematical language?

Lastly, we also want to ask whether any privacy protection mechanisms are embedded in the framework. Given the wide applications this technology may have in the near future, we imagine there could be concerns about its potential violation of privacy rights. (2) Are there embedded mechanisms that would "disable" the algorithm so that it is "blind" to, for example, residents' daily routines in a neighborhood?

Thanks!


isaduan commented 2 years ago

Group 1K

Dear Professor Shakhnarovich,

Thank you for sharing your work with us. We have two main questions: (1) What are some use cases you are most excited about? (2) Are there any theoretical implications of this research for artificial general intelligence (AGI)? In particular, does it suggest that developing more advanced, general-purpose intelligence requires combining different representations of the world/sensors (i.e., text + visual)?

Raychanan commented 2 years ago

Group 1C: Val Alvern Cueco Ligo, Rui Chen, Max Kramer, Yutai Li

By integrating text with the visual outputs of models, this paper opens a new avenue of research. Integrating language during training makes this approach superior to baselines that do not do so. After finishing the paper, we could not help but think about how one might integrate text, visuals, and audio. If a building disappears, the scene may register completely differently from the perspective of sound waves and their echoes.

Another question concerns distractor elements. As shown in the figures, the "relevant" object is largely obvious. However, what if we focus on lighting (for example, stars in the sky)? Is the model at risk of ignoring such changes, or is it readily adaptable to various research and application contexts?

sdbaier commented 2 years ago

Group 1J Lynette Dang, Silvan Baier, Yingxuan Liu, Sabina Hartnett

This is impressive. We are interested in potential extensions of the detection method and in its ability to process larger datasets:

How would the detection method perform if (1) change unfolds as a process, (2) there are multiple changepoints, or (3) the input data significantly exceeds the training data in size?

The proposed method aims to detect a single changepoint in comparatively short streams (average length of 9 images).
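(For concreteness, here is a minimal sketch of the kind of single-changepoint scoring we have in mind; this is our own illustration over precomputed frame embeddings, not the paper's actual algorithm.)

```python
# Hypothetical single-changepoint scoring over a short image stream.
# Each frame is represented by a precomputed feature vector; the split
# with the largest before/after embedding gap is the predicted change.
import numpy as np

def detect_changepoint(embeddings: np.ndarray) -> int:
    """embeddings: (T, d) array of per-frame feature vectors."""
    T = embeddings.shape[0]
    best_k, best_score = 0, -np.inf
    for k in range(1, T):  # candidate split: second half starts at k
        before = embeddings[:k].mean(axis=0)
        after = embeddings[k:].mean(axis=0)
        score = np.linalg.norm(before - after)  # larger = sharper change
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Toy stream of 9 frames with a feature shift after frame 5
rng = np.random.default_rng(0)
stream = np.vstack([rng.normal(0, 1, (5, 128)), rng.normal(3, 1, (4, 128))])
print(detect_changepoint(stream))  # expected: 5
```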

(1) What if change does not occur in a binary fashion (change versus no change, e.g. a construction sign being there or not), but as a gradual process (e.g. the removal of a construction sign as captured by a surveillance camera)? Would such a nuanced detection be possible?

(2) Alternatively, would the detection method filter this out as noise or label it as multiple changepoints?

(3) Lastly, how long does the detection take? Would applications in high-frequency environments (e.g. continuous video) be possible, or is the algorithm restricted to visual streams of lower density?

Thanks!

helyap commented 2 years ago

Hi Professor,

Thank you for sharing your research with us. I'm curious whether the new methods presented in the paper can also be applied to video data, or to images with extremely short time gaps, say, if a video is decomposed into a series of images. Also, is there a limit to the granularity of changes the model can capture?
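For context on what we mean by decomposition, a minimal sketch using OpenCV (the filename and the one-frame-per-second sampling rate are placeholder choices of ours):

```python
# Decompose a video into a sparse image sequence by sampling roughly
# one frame per second; the resulting frames could then be fed to a
# change detection model as a visual stream.
import cv2

cap = cv2.VideoCapture("input.mp4")  # placeholder path
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unreported
frames, idx = [], 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % int(fps) == 0:  # keep ~1 frame per second
        frames.append(frame)
    idx += 1
cap.release()
print(f"extracted {len(frames)} frames")
```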

sudhamshow commented 2 years ago

Group 1F

Professor Shakhnarovich, thanks for your presentation. The examples in the paper concern the application of semi-supervised learning to images that have a perceivable boundary. Would it also be possible to detect changes in video contexts that lack such a boundary? For example, detecting wildfire from a satellite, or infiltration across borders: it is unclear what to search for in these kinds of streams, especially when the scenario has never been encountered before. Are the semi-supervised algorithms explicitly trained to look out for these kinds of anomalies?

FrederickZhengHe commented 2 years ago

Group 1A: Angelica Bosko, Chongyu Fang, Zheng He, Yier Ling

Thank you, Professor Shakhnarovich, for your paper; we look forward to hearing your presentation. We did not quite understand the formulas and the results, so it may be helpful to explain them more extensively tomorrow. Our question is: how could this study be applied to services that use automatic captioning, such as live videos, TV shows, Netflix, etc.?

borlasekn commented 2 years ago

Group 1E: Kaya Borlase, Xin Li, Zoey Jiao, Luiqi Guo

Thank you for sharing your expertise. We had a couple of questions:

  1. Since the study employs a semi-supervised method that relies on only a small set of labeled data, does that make the results less accurate than training on a larger labeled set? What are the limitations and benefits of employing a semi-supervised method?
  2. We were wondering about the implications of analyzing changes in visual streams. Could this be beneficial in other fields such as environmental research, for example, identifying species in areas where they are not normally detected, or detecting other environmental changes? Are there any other research areas you believe could benefit from this type of visual stream analysis?

egemenpamukcu commented 2 years ago

Group 2H: Gin Zheng, Egemen Pamukcu, Taize Yu, Ning Tang

Thank you professor for presenting your work. We all found the paper really interesting and had a couple questions.

Firstly, we would love to hear more about the potential real world applications of this model. Do you think it can be used to summarize footage from videos or movies? Or for security/monitoring purposes?

Secondly, we would be interested in learning more about how this model compares to, and what it borrows from, traditional frameworks for analyzing change in image streams.

Thank you!

TwoCentimetre commented 2 years ago

Group 2M: Chenming Zhang, Chris Maurice, Xin Su, Yujing Sun

Thank you so much for sharing your work! We have several practical questions about the paper.

We wonder how this method treats multiple changes in one scenario. In the traffic sign case, if we add two signs and take away the original sign, would the method treat all the changes as a whole and attach a single label, or would it identify all three changes and attach three different labels separately? And if the change happens gradually instead of all at once, how would the method treat it?

In the paper, you note a lack of standard datasets for this task, and we are curious why that is the case. Is there a dream dataset you would like the opportunity to work with?

Looking forward to your talk!

Toushirow1 commented 2 years ago

Group 1D: Zixu Chen, MengChen Chung, Yujing Huang, Feihong Lei

Dear Professor Shakhnarovich,

Thank you for sharing your research. This is our first time learning about this technique, and the paper has given us a better understanding of this area of computer vision. We understand the paper was published last year and that the technology is cutting-edge. What are the current applications of visual stream change detection and natural language labeling in computer vision? Which technology companies or teams are developing similar algorithms and products? Since labeled training data are difficult to obtain, does training on unlabeled data under the semi-supervised regimen require a lot of computing power? Do visual stream detection and labeling work for automatic navigation on tiny device chips and in-vehicle MEMS? How does the natural language generation component work, and how does REINFORCE help shape the training framework in this case?

Thank you!

yhchou0904 commented 2 years ago

Group 1I: Yile Chen, Yu-Hsuan Chou, Jasmine Huang, Jingnan Liu

Hi Professor Shakhnarovich, thank you so much for sharing your work with us. We are curious about some technical details of the paper. Usually, a picture is captioned in terms of what seems to be its main subject; for example, we might describe a picture without mentioning a change in the background. We are wondering how detailed the labels are, so that this kind of omission can be ruled out. We would also like to learn more about potential applications in social science and future research directions.

hhx2207061197 commented 2 years ago

Group 1L: Elliot Delahaye, HuangHongxian, Xi Cheng, Yutong Li

Hi Professor Shakhnarovich, thank you so much for sharing your work with us. Our questions are as follows: (1) From a comparative perspective, what do you see as the advantages of your method, which combines vision and language description, over pure NLP? Would you say that vision plus language is the approach with the greatest potential for increasing accuracy? (2) What if we change the subject of the images? How would the model's performance change on other types of change datasets? The paper tests the Street Change Dataset and shows that natural language description is helpful; if the dataset changed to natural scenes or included humans, would the natural language descriptions need to change as well? How can we determine the most informative type of natural language description for a given change dataset? (3) You mention that "Experimental evaluation on visual stream datasets, which we release as part of our contribution, shows that representation learning driven by natural language descriptions significantly improves change detection accuracy, compared to methods that do not rely on language." We want to know whether this new technology has been tested in real-world settings, since people may behave differently in lab experiments and in the real world.

cgyhumble commented 2 years ago

Group 1B: Qishen Fu; Pranathi Iyer; Guangyuan Chen; Yuxuan Chen

Hi Professor! We are glad to read your article and attend your workshop. To be honest, not every point in the paper was easy for us to fully understand, but it was well worth the time spent grasping them; the work is impressive and interesting! Here are the two questions we had after reading your paper: 1) You show that the model accurately identifies changes in images; how would you expect a model like this to perform on images capturing changes in facial features, where the differences might not be so stark, and what could be the potential implications in such cases? 2) From a practical perspective, we wonder whether this analytic framework is applicable to other usage scenarios or internet products such as AR or VR. Given recent news, we are also curious whether the metaverse, which Facebook intends to develop, would need this technology to process visual streams obtained from humans.

william-wei-zhu commented 2 years ago

Group 2I: Lingfeng Shan, Daniela Vadillo, Zimei Xia, William Zhu.

We came up with a couple of questions:

  1. The purpose of the study is to find a method for analyzing change in visual streams that ignores nuisance changes. Although we see the importance of this, we wonder how the algorithm reacts to a relevant change that happens over a long period of time and is "camouflaged" as nuisance change.
  2. The image stream detection and description algorithms in this paper mainly deal with non-living objects. If the image description task were applied to human behaviors, we wonder whether more complexity would be introduced, because a set of human behaviors (as recorded by an image stream) may be interpreted differently depending on cultural context. For example, it is very difficult to differentiate a "twitch" of an eye from a "wink": the image streams of the two actions may look the same, but the meanings can be very different. We wonder how you and your team might approach this kind of problem.

Thanks for sharing. Looking forward to your presentation!

hsinkengling commented 2 years ago

Group 1H: Yuetong Bai, Boya Fu, Zhiyun Hu, Hsin-Keng Ling

Hi Professor Shakhnarovich. Thank you for sharing your work with us.

We're not entirely sure that we understood the paper fully, so feel free to correct us if these questions do not make sense.

We have three questions:

  1. What constitutes nuisance vs non-nuisance change? Could this difference be subjective in some cases?

  2. It seems that the target variable is binary (change vs no change). Is it possible to develop more detailed ways of describing the differences?

  3. In this work, you considered both the description and the change detection problems, which makes us think the model may not be far from identifying and segmenting natural events the way humans do. Do you have any future plans for this topic?

jiehanL commented 2 years ago

Group 1M: Jiehan Liu, Partha Kadambi, Peihan Gao, Shiyang Lai, Zhibin Chen

Hi Professor Shakhnarovich, thank you for such exciting work combining natural language processing with visual detection. Since this study mainly focuses on time series of static images, we are wondering: if moving objects are treated as consecutive frames, would it be possible to apply this technique to motion detection?

Coco-Jiachen-Yu commented 2 years ago

Group 2B: Coco Yu, Hongkai Mao, Justin Soll, Wanxi Zhou

Hi Professor Shakhnarovich,

Thank you so much for sharing your work with us. Our group finds your study very interesting and innovative, and we all look forward to your talk tomorrow.

Our questions are below: 1) Many videos and pictures in real life are not of high quality in terms of resolution (e.g., footage from surveillance cameras). Does the resolution of videos affect the accuracy of the results? 2) We are very excited to see research that applies natural language processing to visual stimuli. What are the real-world applications of captioning visual changes?

AlexBWilliamson commented 2 years ago

Group 2D: Alex Williamson, Chuqing Zhao, Mike Packard, Yijing Zhang

Question 1: The conclusion of your paper says "learning to generate captions to describe change also enhances our ability to detect change." Our group was curious whether this result points you toward other machine learning tasks that could benefit from the simultaneous inclusion of natural language. More generally, we wonder what the wider implications of this research might be for the discipline.

Question 2: The paper we read talks about using human language when training computers to recognize a change in a series of pictures. This innovation shows promise in improving the effectiveness of these sorts of computer programs. How do you envision this technique being applied specifically to research in the social sciences?

kthomas14 commented 2 years ago

Group 2G: Kaylah Thomas, Yao Yao, Awaid Yasin, Shengwenxin Ni

Hello Professor Shakhnarovich,

Thank you for sharing your work! We found it to be a fairly technical paper, in which the authors develop a new method that uses unlabelled pictures for video/image captioning and performs better than existing strategies. We were all very interested in the potential social science applications of your work when developing our questions.

Our group would like to ask: given the technique's superior performance, are there any significant social science applications where it can be or is being implemented? One group member recalled reading about how images from a conservation project for a nearly extinct bird had to be labelled by humans to detect changes. We wonder whether these methods could be applied in such scenarios, which would largely remove the human labeling exercise and allow analysis of a much larger set of the birds' activities. Additionally, do you anticipate any ethical considerations arising if these techniques are further adapted to detect changes in images of humans?

hshi420 commented 2 years ago

Group 2C: Taichi Tsujikawa, Lu Zhang, Fengyi Zheng, Haohan Shi

Hi Professor Shakhnarovich, this is very interesting research, and we are looking forward to your presentation. Neuro-symbolic models are now popular because they can solve some problems that cannot be solved with pure learning: they integrate statistical learning with reasoning, and they are also useful when training data are scarce. We were wondering whether a neuro-symbolic model would be suitable for this task.

Dxu1 commented 2 years ago

Group 2L: Alex Przybycin, Jingwen Ni, David Xu, Allison Towey, Sirui Zhou

Thank you for sharing your interesting paper, Prof. Shakhnarovich. We are wondering whether the algorithm can be given a target for the type of relevant change being looked for. For example, the paper mentions satellite images treating weather change as an irrelevant change; could one specifically flag a scene for coastal change and have something like "erosion from rising water level" produced as the natural language representation?

In addition, we are interested in your discussion of applications in both the real world and social science research (e.g., in research on social discrimination).

jinfei1125 commented 2 years ago

Group 2K: Baotong Zhang, Koichi Onogi, Senling Shu, Jinfei Zhu

Thank you for sharing such great work! Our group's questions are: 1) What would be potential applications of this research in social science? (For example, would it be possible to detect changes in urban landscapes across different geopolitical locations?) 2) Do you think these techniques could be applied to help people with disabilities, and could the model work in real time? 3) To expand the range of applications, would the semi-supervised training regimen still be data-hungry?

Thanks!

JunoWuu commented 2 years ago

Group 2E: Franco Mendes, Juno Wu, Nikki Ting, Roberto Rondo Garces

Hello Prof. Shakhnarovich!

We think this paper is very inspiring and offers an interesting perspective on solving problems. It is fascinating to consider how natural language captioning improves change detection performance. We wonder whether other related tasks could be improved as much by language captioning, since language presumably helps with more than describing changes in our visual field.

Also, we wonder how this technique would handle something that is hard to describe or that requires higher-level language processing (such as something needing a full sentence or more to describe).

Thank you!

chentian418 commented 2 years ago

Group 1G: Dear Professor Shakhnarovich, thanks for sharing your interesting research! Our group is curious about the use of unlabeled data in Phase 2 when training the discriminator D. Are they used as held-out data to evaluate the validity of the change description w, or do they contribute to the semi-supervised learning in some other way?
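For concreteness, our (possibly mistaken) mental model of D is a binary classifier over before/after visual features and a caption embedding; here is a rough sketch under those assumptions (all shapes and the architecture are our guesses, not the paper's specification):

```python
# Speculative sketch of a discriminator D for semi-supervised caption
# training: given before/after visual features and a caption embedding,
# output a logit for "real (human-labeled) caption" vs "generated".
import torch
import torch.nn as nn

class CaptionDiscriminator(nn.Module):
    def __init__(self, vis_dim: int = 512, txt_dim: int = 256, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * vis_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # single real-vs-generated logit
        )

    def forward(self, feat_before, feat_after, caption_emb):
        x = torch.cat([feat_before, feat_after, caption_emb], dim=-1)
        return self.net(x)

# Usage on a dummy batch of 4 examples
D = CaptionDiscriminator()
logits = D(torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 256))
print(logits.shape)  # torch.Size([4, 1])
```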

Moreover, we were wondering whether there are particular categories of datasets with which the change detection and description method works best, given that you mention satellite images broadly. Do any specific features of the visual streams notably improve the semi-supervised learning?

Looking forward to your presentation tomorrow!