I want to train with my own dataset

anewusername77 commented 3 years ago

It's an image dataset without labels. Should I structure it like an ImageNet-style dataset, i.e. with images of different labels in different folders?

anewusername77 commented 3 years ago

But I don't have labels.

wvangansbeke commented 3 years ago

Hi @scarletteshu,

Thank you for your interest.

Yes. You need to write your own dataset (e.g. data/cifar.py). Please refer to the following issues: #8, #19, #34. They might be useful. Also, since you don't have labels available, you will have to remove the evaluation code.
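A minimal sketch of what a label-free dataset class could look like. The class name `UnlabeledImageFolder`, the `loader` argument, and the dict layout returned by `__getitem__` (`image`, `target`, `meta`) are assumptions made for illustration, so compare them against data/cifar.py before relying on them. The `target` field is a constant placeholder so that code expecting the key keeps running; it must not be used to compute accuracy.

```python
import os

IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp')


def pil_loader(path):
    # Imported lazily so the class can be exercised without Pillow installed.
    from PIL import Image
    with open(path, 'rb') as f:
        return Image.open(f).convert('RGB')


class UnlabeledImageFolder:
    """Map-style dataset over a folder of images that has no labels.

    Hypothetical helper: images may sit in one flat folder or in
    arbitrary subfolders; folder names are NOT interpreted as classes.
    """

    def __init__(self, root, transform=None, loader=pil_loader):
        self.root = root
        self.transform = transform
        self.loader = loader
        # Sort the paths so that sample indices are stable between runs.
        self.paths = sorted(
            os.path.join(dirpath, name)
            for dirpath, _, names in os.walk(root)
            for name in names
            if name.lower().endswith(IMG_EXTENSIONS)
        )
        # Placeholders only: there is no ground truth for this data.
        self.classes = ['unknown']
        self.targets = [0] * len(self.paths)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        path = self.paths[index]
        img = self.loader(path)
        if self.transform is not None:
            img = self.transform(img)
        # Assumed output layout; check it against data/cifar.py.
        return {'image': img, 'target': self.targets[index],
                'meta': {'index': index, 'path': path}}
```

Because the class only needs `__len__` and `__getitem__`, it can be passed to a PyTorch `DataLoader` like any other map-style dataset.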

anewusername77 commented 3 years ago

Thanks a lot! I'm new to this, so I'll ask again if I run into more problems. Thanks again~

anewusername77 commented 3 years ago

Dear author, my new questions are as follows:

Question one: in a dataset file such as cifar.py, if I replace self.targets = [] and self.classes = [] with constant values (targets=[[0],[0],...], self.classes=['01', '02', ...]), will it influence the training? The code needs these values to run, but I don't have ground-truth labels, so I can't just remove them. Can I simply keep the evaluation part, since evaluation should not change the model state or the final results?

Question two: when I remove the evaluation code in moco.py:

# Mine the topk nearest neighbors (Validation)
# These will be used for validation.
'''
topk = 5
print(colored('Mine the nearest neighbors (Val)(Top-%d)' %(topk), 'blue'))
fill_memory_bank(val_dataloader, model, memory_bank_val)
print('Mine the neighbors')
indices, acc = memory_bank_val.mine_nearest_neighbors(topk)
print('Accuracy of top-%d nearest neighbors on val set is %.2f' %(topk, 100*acc))
np.save(p['topk_neighbors_val_path'], indices)
'''

then no topk_neighbors_val file is created, but scan.py contains:

# Evaluate 
print('Make prediction on validation set ...')
predictions = get_predictions(p, val_dataloader, model)

print('Evaluate based on SCAN loss ...')
scan_stats = scan_evaluate(predictions)
print(scan_stats)
lowest_loss_head = scan_stats['lowest_loss_head']
lowest_loss = scan_stats['lowest_loss']

if lowest_loss < best_loss:
    print('New lowest loss on validation set: %.4f -> %.4f' %(best_loss, lowest_loss))
    print('Lowest loss head is %d' %(lowest_loss_head))
    best_loss = lowest_loss
    best_loss_head = lowest_loss_head
    torch.save({'model': model.module.state_dict(), 'head': best_loss_head}, p['scan_model'])

else:
    print('No new lowest loss on validation set: %.4f -> %.4f' %(best_loss, lowest_loss))
    print('Lowest loss head is %d' %(best_loss_head))

print('Evaluate with hungarian matching algorithm ...')
clustering_stats = hungarian_evaluate(lowest_loss_head, predictions, compute_confusion_matrix=False)
print(clustering_stats)

There is a torch.save() call in the evaluation part. If I remove this part from scan.py, will it affect saving the model? And if I don't remove it, scan.py raises the error "cannot find topk_neighbors_val file".

Looking forward to your response~ (sorry for so many questions)
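On the missing topk_neighbors_val file: in the moco.py excerpt above, only the accuracy figure depends on labels, while the neighbor indices are computed from features alone. The sketch below shows label-free neighbor mining with plain NumPy as a stand-in for the repository's memory bank. The function is a hypothetical helper, and the output layout (the sample itself in column 0, followed by its neighbors) is an assumption that should be checked against what the downstream dataset code expects.

```python
import numpy as np


def mine_nearest_neighbors(features, topk):
    """Return, for every sample, its own index followed by the indices of
    its `topk` nearest neighbors under cosine similarity. No labels used."""
    features = np.asarray(features, dtype=np.float64)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normalized = features / np.clip(norms, 1e-12, None)
    similarity = normalized @ normalized.T
    # Force each sample to rank first in its own row.
    np.fill_diagonal(similarity, np.inf)
    order = np.argsort(-similarity, axis=1)
    return order[:, :topk + 1]


# Toy features: samples 0/1 point one way, samples 2/3 the other.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
indices = mine_nearest_neighbors(features, topk=1)
# The result could then be stored the same way as in the excerpt, e.g.
# np.save(p['topk_neighbors_val_path'], indices)
```

With the indices saved, the validation neighbors file exists and only the accuracy printout has to go.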

wvangansbeke commented 3 years ago

Hi @scarletteshu,

Yes, you will have to modify the code. If you don't have labels, you can't compute the accuracy. You can remove that part. The validation loss is used to select the best model. You can define your own validation set or take the final model.
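As a rough illustration of selecting the best model without labels, the sketch below tracks the lowest validation loss across epochs and reports when a checkpoint should be saved. The `BestByLoss` class is a hypothetical helper; only the `lowest_loss` and `lowest_loss_head` keys are taken from the scan.py excerpt earlier in the thread. The Hungarian-matching evaluation is left out because it needs ground-truth labels.

```python
class BestByLoss:
    """Remember the lowest validation loss seen so far (label-free)."""

    def __init__(self):
        self.best_loss = float('inf')
        self.best_head = None

    def update(self, scan_stats):
        """Return True when `scan_stats` improves on the best loss so far,
        i.e. when the caller should save a checkpoint."""
        if scan_stats['lowest_loss'] < self.best_loss:
            self.best_loss = scan_stats['lowest_loss']
            self.best_head = scan_stats['lowest_loss_head']
            return True
        return False


# Made-up per-epoch statistics, purely to exercise the helper.
tracker = BestByLoss()
history = [
    {'lowest_loss': -1.20, 'lowest_loss_head': 0},
    {'lowest_loss': -1.50, 'lowest_loss_head': 0},
    {'lowest_loss': -1.40, 'lowest_loss_head': 0},
]
saved_epochs = [epoch for epoch, stats in enumerate(history)
                if tracker.update(stats)]
```

In the training loop, the `torch.save(...)` call from the excerpt would sit inside the branch where `update` returns True. Taking the final model instead simply means saving once after the last epoch.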

anewusername77 commented 3 years ago

Thanks for your reply. When I trained on CIFAR-10, the losses looked like consistency loss 8.5809e-01, entropy 2.3005e+00. But when I trained on my own dataset, the consistency loss was always close to the entropy, and the values in predictions['probabilities'] were close to each other (such as 0.1001, 0.1012, ...). What do you think the problem is? Compared to scan_imagenet_50.yml, I only changed the transforms to our own and the learning rate in the config file.

wvangansbeke commented 3 years ago

Hi @scarletteshu,

It's hard to say exactly what the problem is, especially since I don't know the dataset. However, lowering the weight in the loss will likely help.
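One possible reading of the reported numbers, sketched below in plain Python: if every sample is assigned a near-uniform distribution over 10 clusters (as probabilities around 0.10 suggest), then a consistency term of the form -log(p_anchor · p_neighbor) and the entropy of the mean prediction both come out at ln(10) ≈ 2.3026, which would explain why the two logged values sit close together. The function names, the exact form of the loss, and the name `entropy_weight` are assumptions for illustration, not the repository's code.

```python
import math


def consistency_loss(anchor_probs, neighbor_probs):
    """Mean of -log(dot(p_anchor, p_neighbor)) over all pairs."""
    losses = []
    for p, q in zip(anchor_probs, neighbor_probs):
        agreement = sum(a * b for a, b in zip(p, q))
        losses.append(-math.log(agreement))
    return sum(losses) / len(losses)


def entropy_of_mean(probs):
    """Entropy of the prediction averaged over the batch."""
    n, k = len(probs), len(probs[0])
    mean = [sum(row[j] for row in probs) / n for j in range(k)]
    return -sum(m * math.log(m) for m in mean if m > 0.0)


def total_loss(anchor_probs, neighbor_probs, entropy_weight):
    """Assumed combination: consistency minus weighted entropy."""
    return (consistency_loss(anchor_probs, neighbor_probs)
            - entropy_weight * entropy_of_mean(anchor_probs))


K = 10
# Every sample predicts all clusters with equal probability.
uniform = [[1.0 / K] * K for _ in range(K)]
# Every sample is confidently assigned, with clusters equally used.
one_hot = [[1.0 if j == i else 0.0 for j in range(K)] for i in range(K)]

c_uniform = consistency_loss(uniform, uniform)   # approximately ln(10)
e_uniform = entropy_of_mean(uniform)             # approximately ln(10)
c_sharp = consistency_loss(one_hot, one_hot)     # zero: perfect agreement
e_sharp = entropy_of_mean(one_hot)               # approximately ln(10)
```

Under these assumptions the entropy term is already at its maximum in the uniform case, so a large weight on it gives the optimizer little incentive to make individual predictions confident. That is consistent with the suggestion to lower the weight.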

wvangansbeke commented 3 years ago

If there are still issues let me know. Closing this for now.

wvangansbeke / Unsupervised-Classification

I want to train with my own dataset #64