Closed sabrinazuraimi closed 5 years ago
Or should I try pre-training this on the Omniglot dataset and then use it for a one-shot task on my images? (I just need the network to tell whether two images are the same or not.) The images I want to run the one-shot task on consist only of pictures of words.
So I guess what you intend to do is evaluate the performance on another dataset with characters. In that case you are trying to do something similar to the authors, who trained on Omniglot and tested on the MNIST dataset.
The code is built around the Omniglot alphabets, so in order to use it with your own dataset you need to change it a little bit. You'll probably need a trained model and then adapt the one_shot_test code for your dataset (see the last lines of train_siamese_network.py).
Taking into account that you have pictures of words instead of characters, I can't say how it will perform. If it does not perform as well as you need, you may have to train on your own data, organized in a similar way to the alphabets.
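For reference, an N-way one-shot test on a custom dataset can be sketched roughly as below. This is only a minimal sketch and not the repository's one_shot_test code: the function name `one_shot_accuracy`, the `images_by_class` dictionary and the `similarity` callable are all hypothetical. In practice `similarity` would wrap a call to the trained siamese model on a pair of images.

```python
import random


def one_shot_accuracy(images_by_class, similarity, n_way=5, n_trials=100, seed=0):
    """Estimate N-way one-shot accuracy (hypothetical helper, not the repo's API).

    images_by_class: dict mapping a class label to a list of images.
    similarity: callable (image_a, image_b) -> float, higher means more alike.
    """
    rng = random.Random(seed)
    # Only classes with at least two images can provide a test image and a match.
    classes = [c for c, imgs in images_by_class.items() if len(imgs) >= 2]
    if len(classes) < n_way:
        raise ValueError("need at least n_way classes with two or more images")

    correct = 0
    for _ in range(n_trials):
        trial_classes = rng.sample(classes, n_way)
        target = trial_classes[0]
        # Two different images of the target class: one to test, one in the support set.
        test_image, match_image = rng.sample(images_by_class[target], 2)
        support = [(target, match_image)]
        for other in trial_classes[1:]:
            support.append((other, rng.choice(images_by_class[other])))
        rng.shuffle(support)  # avoid favouring a fixed position on ties

        scores = [similarity(test_image, image) for _, image in support]
        predicted = support[scores.index(max(scores))][0]
        if predicted == target:
            correct += 1
    return correct / n_trials
```

With word images, each distinct word would play the role of a class here.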
Can the same model also be used for face classification? If yes, then why does it show a "list index out of range" error when I try it on my own dataset?
It can be used for face verification, yes, but you would need to train your own model on a different dataset. You would need to create your own dataset loader for that.
Without seeing your code I cannot tell you directly what is going wrong, but I suspect you are passing the labels to the code in a different format than it expects. I would need you to explain a little about what you are trying to do with the code and what you adapted for your use case.
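As an illustration of what such a loader has to produce for verification, here is a minimal sketch that samples balanced same/different pairs. It is an assumption about how one might do it, not code from this repository: `sample_pairs` and `items_by_identity` are hypothetical names, and the items can be file paths or already loaded images.

```python
import random


def sample_pairs(items_by_identity, n_pairs, seed=0):
    """Sample verification pairs: label 1 = same identity, 0 = different.

    items_by_identity: dict mapping an identity to a list of its items.
    Hypothetical helper, not part of the repository.
    """
    rng = random.Random(seed)
    # Same-identity pairs need at least two items, so keep only those identities.
    ids = [k for k, v in items_by_identity.items() if len(v) >= 2]
    if len(ids) < 2:
        raise ValueError("need at least two identities with two or more items")

    pairs, labels = [], []
    for i in range(n_pairs):
        if i % 2 == 0:
            # Positive pair: two distinct items of one identity.
            identity = rng.choice(ids)
            a, b = rng.sample(items_by_identity[identity], 2)
            labels.append(1)
        else:
            # Negative pair: one item from each of two distinct identities.
            id_a, id_b = rng.sample(ids, 2)
            a = rng.choice(items_by_identity[id_a])
            b = rng.choice(items_by_identity[id_b])
            labels.append(0)
        pairs.append((a, b))
    return pairs, labels
```

Keeping the labels as a plain list of 0/1 values aligned with the pairs makes it easier to check that they match the format the training code expects.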
@Goldesel23 Hello, thank you for your contribution. The accuracy on the Omniglot dataset is more than 80%. However, on my own video dataset, the maximum is no more than 60%. My dataset has a very simple background and no noise, so I don't know why the test results are so low. Can you give me some advice? Thanks so much.
Hi! Can you give more information on what kind of dataset you are using?
Best Regards, Tiago Freitas
This is very clean video data. Therefore, 3D convolution is used in the feature extraction network.
I have never used this kind of architecture in the video domain. It will always depend on the amount of data and the difficulty of the problem. Note that for it to work you may need some architecture optimization, since the network described in the paper is optimized for Omniglot.
Best of luck on your experiments.
@Goldesel23 Thanks!
Hi, I'm trying to run this with my own dataset (which only has characters and no alphabets).
I'm having trouble modifying this..
I've tried changing this to
My character path is something like this (with the number '498' being the label):
character path is data/train/raw/498/996.png
I'm getting a type error saying that
TypeError: list indices must be integers or slices, not str
This is because the dictionary is supposed to have keys and values, but what if I don't have any keys (alphabets) to work with, and only have values (characters)?
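The TypeError itself means that a list is being indexed with a string, which happens when code written for a dictionary receives a list instead. One possible workaround, assuming the loader expects a nested alphabet -> character -> images dictionary, is to wrap all characters in a single placeholder alphabet. The sketch below builds such a dictionary from a folder laid out like data/train/raw/498/996.png; `load_character_dict` and the placeholder name "all" are hypothetical and not part of the repository.

```python
from pathlib import Path


def load_character_dict(root, alphabet_name="all"):
    """Build {alphabet: {character_label: [image file names]}} from root/<label>/*.png.

    All labels are placed under one placeholder alphabet, so code that
    indexes by alphabet name still receives a dictionary.
    Hypothetical helper, not part of the repository.
    """
    root = Path(root)
    characters = {}
    for label_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        images = sorted(f.name for f in label_dir.glob("*.png"))
        if images:  # skip label folders without any PNG files
            characters[label_dir.name] = images
    return {alphabet_name: characters}
```

For the path above, `load_character_dict("data/train/raw")["all"]["498"]` would be the list of file names for label 498. Any code that samples several alphabets per batch would still need adjusting, since there is only one here.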