Authors: Shengzhong Liu, Tomoyoshi Kimura, Dongxin Liu, Ruijie Wang, Jinyang Li, Suhas Diggavi, Mani Srivastava, Tarek Abdelzaher
Arxiv Link [pdf]
This paper proposes a novel contrastive learning framework, called FOCAL, for extracting comprehensive features from multimodal time-series sensing signals through self-supervised training. Existing multimodal contrastive frameworks mostly rely on the information shared between sensory modalities, but do not explicitly consider the modality-exclusive information that could be critical to understanding the underlying sensing physics. Moreover, contrastive frameworks for time series have not handled temporal information locality appropriately. FOCAL addresses these challenges through the following contributions: First, given multimodal time series, it encodes each modality into a factorized latent space consisting of shared features and private features that are orthogonal to each other. The shared space emphasizes feature patterns consistent across sensory modalities through a modality-matching objective. In contrast, the private space extracts modality-exclusive information through a transformation-invariant objective. Second, we propose a temporal structural constraint on the modality features, such that the average distance between temporally neighboring samples is no larger than that between temporally distant samples. Extensive evaluations on four multimodal sensing datasets, with two backbone encoders and two classifiers, demonstrate the superiority of FOCAL: it consistently outperforms state-of-the-art baselines on downstream tasks by a clear margin, under different ratios of available labels.
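As a rough sketch of these objectives (not the authors' code; the tensor names, margin, and temperature below are illustrative assumptions), the factorized losses could be written as:

```python
# Minimal PyTorch sketch of the FOCAL objectives described above.
# Shapes, margin, and temperature are illustrative assumptions, not
# the paper's actual hyperparameters or implementation.
import torch
import torch.nn.functional as F

def modality_matching_loss(shared_a, shared_b, temperature=0.07):
    """Shared space: InfoNCE-style matching between the shared features
    of two modalities for the same (batch-aligned) samples."""
    za = F.normalize(shared_a, dim=-1)
    zb = F.normalize(shared_b, dim=-1)
    logits = za @ zb.t() / temperature
    targets = torch.arange(za.size(0), device=za.device)
    return F.cross_entropy(logits, targets)

def orthogonality_loss(shared, private):
    """Keep each modality's private features orthogonal to its shared
    features by penalizing their squared cosine similarity."""
    s = F.normalize(shared, dim=-1)
    p = F.normalize(private, dim=-1)
    return (s * p).sum(dim=-1).pow(2).mean()

def temporal_constraint(anchor, neighbor, distant, margin=0.1):
    """Temporally neighboring samples should be no farther apart, on
    average, than temporally distant ones (a margin ranking loss)."""
    d_near = 1.0 - F.cosine_similarity(anchor, neighbor, dim=-1)
    d_far = 1.0 - F.cosine_similarity(anchor, distant, dim=-1)
    return F.relu(d_near - d_far + margin).mean()
```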
Dependencies:
conda create -n [NAME] python=3.10
conda activate [NAME]
Installation:
git clone [repo] [dir]
cd [dir]
pip install -r requirements.txt
MOD is a self-collected dataset using acoustic (8000 Hz) and seismic (100 Hz) signals to classify moving vehicle types. The pretraining dataset includes samples from 10 classes, and the downstream tasks include vehicle classification, distance classification, and speed classification.
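As a rough, hypothetical illustration (not code from this repo), the two modalities can be sliced into time-aligned windows at their stated sampling rates; the window length and function names below are assumptions:

```python
import numpy as np

# Sampling rates from the dataset description above.
ACOUSTIC_HZ, SEISMIC_HZ = 8000, 100
WINDOW_SEC = 1.0  # hypothetical window length

def make_windows(signal: np.ndarray, rate: int, window_sec: float) -> np.ndarray:
    """Slice a 1-D recording into non-overlapping, time-aligned windows."""
    step = int(rate * window_sec)
    n = len(signal) // step
    return signal[: n * step].reshape(n, step)

# A 60-second synthetic recording yields 60 aligned windows per modality:
acoustic = make_windows(np.random.randn(60 * ACOUSTIC_HZ), ACOUSTIC_HZ, WINDOW_SEC)
seismic = make_windows(np.random.randn(60 * SEISMIC_HZ), SEISMIC_HZ, WINDOW_SEC)
assert acoustic.shape[0] == seismic.shape[0]  # same number of windows
```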
To preprocess MOD, first update the preprocessing configurations in src/data_preprocess/MOD/proprocessing_configs.py and the dataset configurations in src/data/*.yaml as needed, then run the following scripts in order:
cd src/data_preprocess/MOD/
python extract_pretrain_samples.py
python extract_samples.py
python extract_samples_speed_distance.py
python partition_data_pretrain.py
python partition_data.py
python partition_data_speed_distance.py
Please download the dataset from the links provided in the Dataset section. The arguments to the training program can be listed by running:
python train.py --help
The following command runs supervised training for DeepSense (-model=DeepSense) or Transformer (-model=SW_Transformer):
python train.py -model=[MODEL] -dataset=[DATASET]
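For example, to run supervised training of DeepSense on the MOD dataset:
python train.py -model=DeepSense -dataset=MOD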
To pretrain a model with the FOCAL framework:
python train.py -model=[MODEL] -dataset=[DATASET] -learn_framework=FOCAL
To finetune the pretrained model on a downstream task:
python train.py -model=[MODEL] -dataset=[DATASET] -learn_framework=FOCAL -task=[TASK] -stage=finetune -model_weight=[PATH TO MODEL WEIGHT]
The trained model weights are saved under the weights/[DATASET]_[MODEL]/ directory.
To test a pretrained model:
python test.py -model=[MODEL] -dataset=[DATASET] -model_weight=[PATH TO MODEL WEIGHT]
To test a finetuned model on a downstream task:
python test.py -model=[MODEL] -dataset=[DATASET] -stage=finetune -task=[TASK] -model_weight=[PATH TO MODEL WEIGHT]
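Putting these steps together, a full FOCAL pipeline on MOD with DeepSense might look like the commands below; [TASK] and [CHECKPOINT] are placeholders (pick a task listed by train.py --help and the checkpoint file saved under weights/MOD_DeepSense/):
python train.py -model=DeepSense -dataset=MOD -learn_framework=FOCAL
python train.py -model=DeepSense -dataset=MOD -learn_framework=FOCAL -task=[TASK] -stage=finetune -model_weight=weights/MOD_DeepSense/[CHECKPOINT]
python test.py -model=DeepSense -dataset=MOD -stage=finetune -task=[TASK] -model_weight=weights/MOD_DeepSense/[CHECKPOINT]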
See the files matching src/data/*.yaml for the specific model configurations for each dataset.
This project is released under the MIT license. See LICENSE for details.
@inproceedings{liu2023focal,
title = {FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space},
author = {Liu, Shengzhong and Kimura, Tomoyoshi and Liu, Dongxin and Wang, Ruijie and Li, Jinyang and Diggavi, Suhas and Srivastava, Mani and Abdelzaher, Tarek},
booktitle = {Advances in Neural Information Processing Systems},
year = {2023}
}
For any issues regarding the paper or the code, please contact us.