IcurasLW opened 4 months ago
In addition, the provided repository trains the encoders of both modalities in the full-modality setting. When it comes to modality fusion, it then loads those pre-trained encoder weights (trained on FULL data) in the missing-modality setting on the same dataset. This is cheating: the encoder has already seen all available data during the pre-training phase.
Hi IcurasLW,
Thanks for your question. The sound encoder is trained using only PARTIAL data, not the full dataset.
The AV-MNIST dataset contains 1,500 samples across 10 classes (1,050 for training and 450 for testing). We use the parameter "per_class_num" to control the number of samples used for training. For example, "per_class_num=21" means that 21 samples per class are used, totaling 210 samples (20% of the training samples). In our experiment, we assume the image data is complete (per_class_num=105) while the sound data is incomplete (e.g., per_class_num=21).
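For clarity, here is a minimal sketch of how a per_class_num subsampling step could be implemented. The parameter name per_class_num comes from the repo, but the helper function and the assumption that the dataset yields (image, sound, label) tuples are illustrative, not the repo's actual API.

```python
import random
from collections import defaultdict
from torch.utils.data import Subset

def subsample_per_class(dataset, per_class_num, seed=0):
    """Keep at most `per_class_num` samples from each class.

    Assumes `dataset[i]` returns a tuple whose last element is the label;
    the tuple layout is illustrative, not necessarily the repo's format.
    """
    by_class = defaultdict(list)
    for idx in range(len(dataset)):
        *_, label = dataset[idx]
        by_class[int(label)].append(idx)

    rng = random.Random(seed)
    keep = []
    for indices in by_class.values():
        rng.shuffle(indices)
        keep.extend(indices[:per_class_num])
    return Subset(dataset, keep)

# AV-MNIST training split: per_class_num=105 keeps all 1,050 training
# samples (complete image modality); per_class_num=21 keeps 210 samples
# (~20%), simulating the incomplete sound modality.
# image_train = subsample_per_class(avmnist_train, per_class_num=105)
# sound_train = subsample_per_class(avmnist_train, per_class_num=21)
```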
Please leave a comment if you have any further questions.
No one in the issues has been able to find the code that handles the missing modality. In the provided scripts, the full modality is available in the test data. None of the results are reproducible.