New submissions for Mon, 23 Jan 23

Keyword: metric learning

There is no result

Keyword: image retrieval

There is no result

Keyword: self-supervised

Spatial Steerability of GANs via Self-Supervision from Discriminator

Authors: Jianyuan Wang, Ceyuan Yang, Yinghao Xu, Yujun Shen, Hongdong Li, Bolei Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2301.08455
Pdf link: https://arxiv.org/pdf/2301.08455
Abstract Generative models have made huge progress in photorealistic image synthesis in recent years. To enable humans to steer the image generation process and customize the output, many works explore the interpretable dimensions of the latent space in GANs. Existing methods edit the attributes of the output image such as orientation or color scheme by varying the latent code along certain directions. However, these methods usually require additional human annotations for each pretrained model, and they mostly focus on editing global attributes. In this work, we propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space or requiring extra annotations. Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias. While the GAN model is trained from scratch, these heatmaps are aligned with the emerging attention of the GAN's discriminator in a self-supervised learning manner. During inference, human users can intuitively interact with the spatial heatmaps to edit the output image, such as varying the scene layout or moving objects in the scene. Extensive experiments show that the proposed method not only enables spatial editing over human faces, animal faces, outdoor scenes, and complicated indoor scenes, but also brings improvement in synthesis quality.
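As a rough illustration of the "randomly sampled Gaussian heatmaps" mentioned in the abstract, the sketch below builds one such heatmap in plain Python. It is not the authors' implementation: the function names, the grid size, and the sigma range are hypothetical choices, and how the heatmap is encoded into the generator is not shown.

```python
import math
import random


def gaussian_heatmap(height, width, center, sigma):
    """Return a height x width grid holding a 2D Gaussian that peaks at 1.0 at `center`."""
    cy, cx = center
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def sample_heatmap(height, width, rng, sigma_range=(1.0, 3.0)):
    """Draw a random center and width (assumed sampling scheme), then build the heatmap."""
    center = (rng.randrange(height), rng.randrange(width))
    sigma = rng.uniform(*sigma_range)
    return center, gaussian_heatmap(height, width, center, sigma)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
center, heatmap = sample_heatmap(8, 8, rng)
```

Moving `center` and rebuilding the heatmap is the kind of user interaction the abstract describes for inference-time editing.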
A Semi-supervised Sensing Rate Learning based CMAB Scheme to Combat COVID-19 by Trustful Data Collection in the Crowd
Authors: Jianheng Tang, Kejia Fan, Wenxuan Xie, Luomin Zeng, Feijiang Han, Guosheng Huang, Tian Wang, Anfeng Liu, Shaobo Zhang
Subjects: Human-Computer Interaction (cs.HC); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2301.08563
Pdf link: https://arxiv.org/pdf/2301.08563
Abstract Mobile CrowdSensing (MCS), which employs a large number of workers to sense and collect data in a participatory manner, has been recognized as a promising paradigm for building many large-scale applications in a cost-effective way, such as combating COVID-19. The recruitment of trustworthy and high-quality workers is an important research issue for MCS. Previous studies assume that the qualities of workers are known in advance, or that the platform knows the qualities of workers once it receives their collected data. In reality, to reduce their costs and thus maximize revenue, many strategic workers do not perform their sensing tasks honestly and report fake data to the platform. It is therefore very hard for the platform to evaluate the authenticity of the received data. In this paper, an incentive mechanism named Semi-supervision based Combinatorial Multi-Armed Bandit reverse Auction (SCMABA) is proposed to solve the recruitment problem of multiple unknown and strategic workers in MCS. First, we model the worker recruitment as a multi-armed bandit reverse auction problem, and design a UCB-based algorithm to separate the exploration and exploitation, considering the Sensing Rates (SRs) of recruited workers as the gain of the bandit. Next, a Semi-supervised Sensing Rate Learning (SSRL) approach is proposed to quickly and accurately obtain the workers' SRs, which consists of two phases, supervision and self-supervision. Last, SCMABA is designed by organically combining the SRs acquisition mechanism with multi-armed bandit reverse auction, where supervised SR learning is used in the exploration, and the self-supervised one is used in the exploitation. We prove that our SCMABA achieves truthfulness and individual rationality. Additionally, we demonstrate the outstanding performance of the SCMABA mechanism through in-depth simulations of real-world data traces.
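The abstract mentions a UCB-based algorithm that treats workers' sensing rates as the bandit gain. The sketch below shows the textbook UCB1 index applied to a hypothetical worker-selection step. It is not SCMABA itself: the reverse auction, payments, and the semi-supervised SR learning are omitted, and the names `ucb1_index` and `select_workers` are illustrative assumptions.

```python
import math


def ucb1_index(mean_reward, pulls, total_pulls):
    """Standard UCB1 score: empirical mean plus an exploration bonus."""
    if pulls == 0:
        return math.inf  # never-tried workers are explored first
    return mean_reward + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def select_workers(mean_rates, pulls, k):
    """Pick the k workers with the highest UCB1 score (assumed selection rule)."""
    total = sum(pulls)
    scores = [ucb1_index(m, n, total) for m, n in zip(mean_rates, pulls)]
    ranked = sorted(range(len(mean_rates)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Three workers: two observed ten times each, one never recruited.
chosen = select_workers([0.9, 0.5, 0.0], [10, 10, 0], k=2)
```

Here the untried worker is chosen first because its index is infinite, followed by the worker with the highest estimated sensing rate.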
Self-supervised learning for a nonlinear inverse problem with forward operator involving an unknown function arising in Photoacoustic Tomography
Authors: Gyeongha Hwang, Gihyeon Jeon, Sunghwan Moon
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2301.08693
Pdf link: https://arxiv.org/pdf/2301.08693
Abstract In this article, we are concerned with a nonlinear inverse problem with a forward operator involving an unknown function. The problem arises in diverse applications and is made challenging by the presence of the unknown function, which makes it ill-posed. Additionally, the nonlinear nature of the problem makes it difficult to use traditional methods, and thus previous studies have addressed a simplified version of the problem by either linearizing it or assuming knowledge of the unknown function. Here, we propose a self-supervised learning approach to directly tackle a nonlinear inverse problem involving an unknown function. In particular, we focus on an inverse problem derived in Photoacoustic Tomography (PAT), which is a hybrid medical imaging modality with high resolution and contrast. PAT can be modelled based on the wave equation. The measured data is the solution of the equation restricted to the surface, and the initial pressure of the equation contains the biological information on the object of interest. The speed of the sound wave in the equation is unknown. Our goal is to determine the initial pressure and the speed of the sound wave simultaneously. Under a simple assumption that the sound speed is a function of the initial pressure, the problem becomes a nonlinear inverse problem involving an unknown function. The experimental results demonstrate that the proposed algorithm performs successfully.
Keyword: vision transformer

Image Memorability Prediction with Vision Transformers
Authors: Thomas Hagen, Thomas Espeseth
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2301.08647
Pdf link: https://arxiv.org/pdf/2301.08647
Abstract Behavioral studies have shown that the memorability of images is similar across groups of people, suggesting that memorability is a function of the intrinsic properties of images, and is unrelated to people's individual experiences and traits. Deep learning networks can be trained on such properties and be used to predict memorability in new data sets. Convolutional neural networks (CNN) have pioneered image memorability prediction, but more recently developed vision transformer (ViT) models may have the potential to yield even better predictions. In this paper, we present ViTMem, a new memorability model based on ViT, and compare its memorability predictions with those of state-of-the-art CNN-derived models. Results showed that ViTMem performed equal to or better than state-of-the-art models on all data sets. Additional semantic level analyses revealed that ViTMem is particularly sensitive to the semantic content that drives memorability in images. We conclude that ViTMem provides a new step forward, and propose that ViT-derived models can replace CNNs for computational prediction of image memorability. Researchers, educators, advertisers, visual designers and other interested parties can leverage the model to improve the memorability of their image material.
Holistically Explainable Vision Transformers
Authors: Moritz Böhle, Mario Fritz, Bernt Schiele
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2301.08669
Pdf link: https://arxiv.org/pdf/2301.08669
Abstract Transformers increasingly dominate the machine learning landscape across many tasks and domains, which increases the importance of understanding their outputs. While their attention modules provide partial insight into their inner workings, the attention scores have been shown to be insufficient for explaining the models as a whole. To address this, we propose B-cos transformers, which inherently provide holistic explanations for their decisions. Specifically, we formulate each model component - such as the multi-layer perceptrons, attention layers, and the tokenisation module - to be dynamic linear, which allows us to faithfully summarise the entire transformer via a single linear transform. We apply our proposed design to Vision Transformers (ViTs) and show that the resulting models, dubbed Bcos-ViTs, are highly interpretable and perform competitively with baseline ViTs on ImageNet. Code will be made available soon.
Keyword: multimodal

Causal conditional hidden Markov model for multimodal traffic prediction
Authors: Yu Zhao, Pan Deng, Junting Liu, Xiaofeng Jia, Mulan Wang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2301.08249
Pdf link: https://arxiv.org/pdf/2301.08249
Abstract Multimodal traffic flow can reflect the health of the transportation system, and its prediction is crucial to urban traffic management. Recent works overemphasize spatio-temporal correlations of traffic flow, ignoring the physical concepts that lead to the generation of observations and their causal relationship. Spatio-temporal correlations are considered unstable under the influence of different conditions, and spurious correlations may exist in observations. In this paper, we analyze the physical concepts affecting the generation of multimodal traffic flow from the perspective of the observation generation principle and propose a Causal Conditional Hidden Markov Model (CCHMM) to predict multimodal traffic flow. In the latent variable inference stage, a posterior network disentangles the causal representations of the concepts of interest from conditional information and observations, and a causal propagation module mines their causal relationship. In the data generation stage, a prior network samples the causal latent variables from the prior distribution and feeds them into the generator to generate multimodal traffic flow. We use a mutually supervised training method for the prior and posterior to enhance the identifiability of the model. Experiments on real-world datasets show that CCHMM can effectively disentangle causal representations of concepts of interest and identify causality, and accurately predict multimodal traffic flow.
A Big-Data Driven Framework to Estimating Vehicle Volume based on Mobile Device Location Data
Authors: Mofeng Yang, Weiyu Luo, Mohammad Ashoori, Jina Mahmoudi, Chenfeng Xiong, Jiawei Lu, Guangchen Zhao, Saeed Saleh Namedi, Songhua Hu, Aliakbar Kabiri
Subjects: Computers and Society (cs.CY); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2301.08660
Pdf link: https://arxiv.org/pdf/2301.08660
Abstract Vehicle volume serves as a critical metric and the fundamental basis for traffic signal control, transportation project prioritization, road maintenance plans and more. Traditional methods of quantifying vehicle volume rely on manual counting, video cameras, and loop detectors at a limited number of locations. Expanding these efforts requires significant labor and cost. Researchers and private sector companies have also explored alternative solutions such as probe vehicle data, although these still suffer from a low penetration rate. In recent years, along with the technological advancement in mobile sensors and mobile networks, Mobile Device Location Data (MDLD) have been growing dramatically in terms of the spatiotemporal coverage of the population and its mobility. This paper presents a big-data driven framework that can ingest terabytes of MDLD and estimate vehicle volume over a larger geographical area with a larger sample size. The proposed framework first employs a series of cloud-based computational algorithms to extract multimodal trajectories and trip rosters. A scalable map matching and routing algorithm is then applied to snap and route vehicle trajectories to the roadway network. The observed vehicle counts on each roadway segment are weighted and calibrated against ground truth control totals, i.e., Annual Vehicle-Miles of Travel (AVMT), and Annual Average Daily Traffic (AADT). The proposed framework is implemented on the all-street network in the state of Maryland using MDLD for the entire year of 2019. Results indicate that our proposed framework produces reliable vehicle volume estimates and also demonstrate its transferability and generalization ability.
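To make the idea of calibrating observed counts against a control total concrete, the sketch below rescales per-segment counts with a single factor so that the implied vehicle-miles match a given total. This is only one simple way such a calibration could work and is an assumption, not the paper's procedure: the abstract does not specify the weighting scheme, and the function name, inputs, and numbers are hypothetical.

```python
def calibrate_to_vmt(counts, lengths_miles, control_vmt):
    """Scale segment counts so that sum(count * length) equals the control VMT.

    Assumes a single global expansion factor, which is a simplification.
    """
    observed_vmt = sum(c * l for c, l in zip(counts, lengths_miles))
    if observed_vmt <= 0:
        raise ValueError("observed vehicle-miles must be positive")
    factor = control_vmt / observed_vmt
    return factor, [c * factor for c in counts]


# Two segments with made-up sample counts and lengths, and a made-up control total.
factor, scaled = calibrate_to_vmt([10, 20], [1.0, 2.0], control_vmt=500.0)
```

A single factor preserves the relative volumes between segments; a real calibration would likely vary the factor by area, road class, or time period.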
Keyword: CLIP

There is no result

Keyword: DALLE

There is no result

kobiso / daily-arxiv-noti

New submissions for Mon, 23 Jan 23 #649
