Closed kalenforn closed 1 year ago
Thank you for your attention. This setting is commonly used in unsupervised cross-modal hashing, e.g., [13]. It is also more applicable to real-world use, since unlabeled data tends to be more readily available and the retrieval set can serve as the training set. Note, however, that the query set must remain unseen until testing. Additionally, this setting does not in itself cause overfitting, which more often arises from small datasets or from flawed training strategies and models.
[13] Li C, Deng C, Wang L, et al. Coupled CycleGAN: Unsupervised hashing network for cross-modal retrieval[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 176-183.
I think the training data should be randomly selected from the retrieval data, but the two sets are identical in your training code.
This setting is commonly used in unsupervised cross-modal hashing, e.g., [13]. In unsupervised cross-modal retrieval, the retrieval set is often used as the training set.
imgs, tags, labels = imgs[inx], tags[inx], labels[inx]
test_size = 2000
if 'test' in partition.lower():
    imgs, tags, labels = imgs[-test_size::], tags[-test_size::], labels[-test_size::]
else:
    imgs, tags, labels = imgs[0: -test_size], tags[0: -test_size], labels[0: -test_size]
return imgs.transpose([0, 3, 2, 1]), tags, labels, root
Here is your data split method, but where is the training set? Your paper states, "we randomly select 5,000 pairs from the retrieval database as their training set," but there is no training-set selection strategy in your code. Did you forget to provide it?
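For illustration, the selection quoted from the paper could be sketched as below. This is a minimal sketch and not code from the repository: the helper name `split_with_train_subset`, the seed, and the sample count of 20,000 are all assumptions, and only `test_size = 2000` and the 5,000-pair figure come from the snippet and the quoted sentence.

```python
import random

def split_with_train_subset(n_samples, test_size=2000, train_size=5000, seed=0):
    """Hypothetical helper: build query, retrieval, and training index lists.

    The last `test_size` samples form the query set (as in the snippet above),
    the rest form the retrieval set, and `train_size` indices are drawn at
    random, without replacement, from the retrieval set.
    """
    indices = list(range(n_samples))
    query_idx = indices[-test_size:]
    retrieval_idx = indices[:-test_size]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx = sorted(rng.sample(retrieval_idx, train_size))
    return query_idx, retrieval_idx, train_idx

# Assumed dataset size, for illustration only.
query_idx, retrieval_idx, train_idx = split_with_train_subset(20000)
print(len(query_idx), len(retrieval_idx), len(train_idx))
```

The returned index lists can be used directly to index NumPy arrays, e.g. `imgs[train_idx]`.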
Hi,
That selection applies to the supervised baselines, so it is not part of our method. Thanks.
Best regards, Peng Hu
So the dataset configuration in Section 4.1 of the UCCH paper is mainly for the supervised methods, and the UCCH training set is simply the retrieval set. Do I understand correctly?
Yes. We have stated the configurations for both the unsupervised and supervised methods in that section. Best, Peng
I noticed that there may be some issues with how the dataset is being split. In src/cmdataset.py, line 138 and subsequent 'else' statements, the training dataset may not be properly separated from the retrieval dataset. As a result, I found that the lengths of the train_dataset and retrieval_dataset were the same when I printed them in UCCH.py. This could potentially lead to the model overfitting due to the presence of prior information during training. I kindly request your attention to this matter and would greatly appreciate it if you could look into fixing this.
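The check described above (comparing the training and retrieval sets, and confirming that the query set stays separate) could be sketched roughly as follows. This is a minimal sketch on toy indices, not the repository's code: the function name `check_split` and the sizes used are assumptions for illustration.

```python
def check_split(train_idx, retrieval_idx, query_idx):
    """Hypothetical helper: summarize how three index sets relate."""
    train, retrieval, query = set(train_idx), set(retrieval_idx), set(query_idx)
    return {
        # True when training uses the whole retrieval set (the unsupervised setting).
        "train_equals_retrieval": train == retrieval,
        # Both counts should be 0 if the query set is truly held out.
        "train_query_overlap": len(train & query),
        "retrieval_query_overlap": len(retrieval & query),
    }

# Toy example: 10 samples, the last 3 held out as queries.
n_samples, test_size = 10, 3
retrieval_idx = list(range(n_samples - test_size))
query_idx = list(range(n_samples - test_size, n_samples))

# Training on the full retrieval set, as reported in this issue.
report = check_split(retrieval_idx, retrieval_idx, query_idx)
print(report)
```

In this toy case the training and retrieval sets coincide while neither overlaps the query set, which matches the equal lengths reported above.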