tensorfreitas / Siamese-Networks-for-One-Shot-Learning

Implementation of Siamese Neural Networks for One-shot Image Recognition

Using with own dataset #3

Closed sabrinazuraimi closed 5 years ago

sabrinazuraimi commented 6 years ago

Hi, I'm trying to run this with my own dataset (which only has characters and no alphabets).

I'm having trouble modifying this:

    # First let's take care of the train alphabets
    for alphabet in os.listdir(train_path):
        alphabet_path = os.path.join(train_path, alphabet)

        current_alphabet_dictionary = {}

        for character in os.listdir(alphabet_path):
            character_path = os.path.join(alphabet_path, character)

            current_alphabet_dictionary[character] = os.listdir(
                character_path)

        self.train_dictionary[alphabet] = current_alphabet_dictionary

I've tried changing this to

    for alphabet in os.listdir(train_path):
        alphabet_path = os.path.join(train_path, alphabet)

        current_alphabet_dictionary = {}

        for character in os.listdir(alphabet_path):
            character_path = os.path.join(alphabet_path, character)

            current_alphabet_dictionary[character] = character_path
        self.train_dictionary[alphabet] = current_alphabet_dictionary

My character path is something like this (with the number '498' being the label):

character path is data/train/raw/498/996.png

I'm getting a type error saying that

TypeError: list indices must be integers or slices, not str

This is because the dictionary is supposed to have keys and values, but what if I don't have any keys (alphabets) to work with, and only have values (characters)?

sabrinazuraimi commented 6 years ago

Or should I try pre-training this on the Omniglot dataset and then use it to do the one-shot task on my images? (I just need the network to be able to tell whether the images are the same or not.) The images that I want to do the one-shot task on consist only of pictures of words.

tensorfreitas commented 6 years ago

So I guess what you intend to do is evaluate the performance on another dataset with characters. In that case you are trying to do something similar to the authors, who trained on Omniglot and tested on the MNIST dataset.

The code is built around the Omniglot alphabets, so in order to use it with your own dataset you need to change it a little bit. You'll probably need a trained model and then adapt the one_shot_test code for your dataset (see the last lines of train_siamese_network.py).
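
Regarding the dictionary structure: for a flat layout like data/train/raw/<label>/<image>.png, one possible workaround (just a sketch, I have not tested it on your data; the path and the 'raw' name are placeholders) is to wrap everything in a single placeholder "alphabet" so the loader keeps its alphabet -> character -> file-list structure:

    import os

    # Sketch only: treat the flat layout data/train/raw/<label>/<image>.png
    # as a single pseudo-"alphabet" so the loader still sees
    # alphabet -> character -> [image filenames].
    train_path = 'data/train/raw'   # placeholder path
    pseudo_alphabet = 'raw'         # any placeholder name works

    current_alphabet_dictionary = {}
    for character in os.listdir(train_path):          # e.g. '498'
        character_path = os.path.join(train_path, character)
        if os.path.isdir(character_path):
            # keep the original convention: the value is a list of image filenames
            current_alphabet_dictionary[character] = os.listdir(character_path)

    train_dictionary = {pseudo_alphabet: current_alphabet_dictionary}

That way the rest of the sampling code that indexes train_dictionary[alphabet][character] should keep working without further changes.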

Taking into account that you have pictures of words instead of characters, I can't say how the performance will be... so you may need to train in a similar way to the alphabets if it does not perform as well as you need.
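
If at test time you just need a same/different decision for a pair of images, something along these lines could work once you have a trained model (the model file name, the 105x105 grayscale input size and the 0.5 threshold are assumptions you would adapt to your own setup):

    import numpy as np
    from PIL import Image
    from keras.models import load_model

    # Rough sketch: score a pair of images with a trained Siamese model
    # whose final sigmoid outputs the probability that the pair matches.
    def load_image(path, size=(105, 105)):
        img = Image.open(path).convert('L').resize(size)
        arr = np.asarray(img, dtype=np.float32) / 255.0
        return arr.reshape(1, size[0], size[1], 1)   # batch of one

    model = load_model('siamese.h5')                 # placeholder file name
    a = load_image('word_a.png')
    b = load_image('word_b.png')

    score = model.predict([a, b])[0][0]
    print('same' if score > 0.5 else 'different', score)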

Amanpradhan commented 6 years ago

Can the same model also be used for face classification? If yes, then why am I getting a 'list index out of range' error when I try it on my own dataset?

tensorfreitas commented 6 years ago

Yes, it can be used for face verification, but you would need to train your own model on a different dataset, and you would need to create your own dataset loader for that.

Without seeing your code I cannot tell you directly what is going wrong, but I suspect you are passing the labels to the code in a different format. I would need you to explain a little bit about what you are trying to do with the code and what you adapted for your use case.
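
As a starting point for the loader, one common approach (a hypothetical sketch, not code from this repo) is to lay the faces out as faces/<person_id>/<image>.jpg and sample positive/negative pairs from that:

    import os
    import random

    # Hypothetical sketch of a pair sampler for a face dataset laid out as
    # faces/<person_id>/<image>.jpg; label 1 = same person, 0 = different.
    def make_pairs(root, n_pairs):
        people = {}
        for person in os.listdir(root):
            images = [os.path.join(root, person, f)
                      for f in os.listdir(os.path.join(root, person))]
            if len(images) >= 2:                 # need at least 2 images per person
                people[person] = images
        ids = list(people.keys())
        pairs = []
        for _ in range(n_pairs):
            if random.random() < 0.5:            # positive pair
                person = random.choice(ids)
                a, b = random.sample(people[person], 2)
                pairs.append((a, b, 1))
            else:                                # negative pair
                p1, p2 = random.sample(ids, 2)
                pairs.append((random.choice(people[p1]),
                              random.choice(people[p2]), 0))
        return pairs

From those pairs you can then load and preprocess the images in batches and feed them to the two inputs of the network.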

buaa-luzhi commented 5 years ago

@Goldesel23 Hello, thank you for your contribution. The accuracy on the Omniglot dataset is more than 80%, but on my own video dataset it never exceeds 60%. My dataset has a very simple background and no noise, so I don't know why the test results are so low. Can you give me some advice? Thanks so much.

tensorfreitas commented 5 years ago

Hi! Can you give more information on what kind of dataset you are using?

Best Regards, Tiago Freitas

buaa-luzhi commented 5 years ago

It is very clean video data, so I use 3D convolutions in the feature-extraction network.

tensorfreitas commented 5 years ago

I have never used this kind of architecture in the video domain. It will always depend on the amount of data and the difficulty of the problem. Note that in order for it to work you may need some architecture optimization, since the network described in the paper is tuned for Omniglot.
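
Just to illustrate what I mean by architecture optimization, a shared 3D-conv branch might look roughly like the sketch below (frame count, resolution, filter sizes and the L1-distance head are all placeholders you would have to tune; this is not something I have validated on video data):

    from keras import layers, models, backend as K

    # Rough sketch of a Siamese network with a shared 3D-conv embedding
    # branch for short video clips; all sizes below are placeholders.
    def embedding_branch(input_shape=(16, 64, 64, 1)):   # (frames, H, W, channels)
        inp = layers.Input(shape=input_shape)
        x = layers.Conv3D(32, (3, 3, 3), activation='relu')(inp)
        x = layers.MaxPooling3D((1, 2, 2))(x)
        x = layers.Conv3D(64, (3, 3, 3), activation='relu')(x)
        x = layers.GlobalAveragePooling3D()(x)
        x = layers.Dense(256, activation='sigmoid')(x)
        return models.Model(inp, x)

    branch = embedding_branch()
    left = layers.Input((16, 64, 64, 1))
    right = layers.Input((16, 64, 64, 1))
    # L1 distance between the two clip embeddings, then a sigmoid "same clip class?" head
    l1 = layers.Lambda(lambda t: K.abs(t[0] - t[1]))([branch(left), branch(right)])
    out = layers.Dense(1, activation='sigmoid')(l1)
    siamese = models.Model([left, right], out)
    siamese.compile(optimizer='adam', loss='binary_crossentropy')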

Best of luck on your experiments.

buaa-luzhi commented 5 years ago

@Goldesel23 Thanks!