yxgeee / OpenIBL

[ECCV-2020 (spotlight)] Self-supervising Fine-grained Region Similarities for Large-scale Image Localization. 🌏 PyTorch open-source toolbox for image-based localization (place recognition).
https://yxgeee.github.io/projects/sfrs
MIT License

Training on custom dataset #1

Closed. Zumbalamambo closed this issue 3 years ago.

Zumbalamambo commented 4 years ago

How do I load a single image and extract the descriptor?

yxgeee commented 4 years ago

Hi, I have updated the code and added a quite simple way to extract descriptors. Please refer to https://github.com/yxgeee/OpenIBL#extract-descriptor-for-a-single-image
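
Following that README section, a minimal usage looks roughly like this (a sketch; the normalization values below are generic placeholders, copy the exact ones from the README):

import torch
from PIL import Image
from torchvision import transforms

# load the pretrained SFRS model (VGG16 + NetVLAD + PCA) via torch.hub
model = torch.hub.load('yxgeee/OpenIBL', 'vgg16_netvlad', pretrained=True).eval()

transformer = transforms.Compose([
    transforms.Resize((480, 640)),  # (height, width) as a single tuple
    transforms.ToTensor(),
    # placeholder mean/std; use the exact values from the README
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = transformer(Image.open('image.jpg').convert('RGB'))

# the output is a 4096-dim global descriptor
with torch.no_grad():
    descriptor = model(img.unsqueeze(0))[0]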

Zumbalamambo commented 4 years ago

@yxgeee Nice... May I also know how to train on a custom dataset? I have RGB images with pose information.

Zumbalamambo commented 4 years ago

@yxgeee It throws the following error,

ValueError: Unknown resampling filter (640). Use Image.NEAREST (0), Image.LANCZOS (1), Image.BILINEAR (2), Image.BICUBIC (3), Image.BOX (4) or Image.HAMMING (5)

yxgeee commented 4 years ago

Modify transforms.Resize(480, 640) to transforms.Resize((480, 640))
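
The reason is that Resize takes the target size as its first argument and the interpolation mode as its second, so with two positional integers PIL ends up receiving 640 as a resampling filter:

from torchvision import transforms

# wrong: 640 becomes the interpolation argument, and PIL rejects it as a
# resampling filter when the transform is applied -> "Unknown resampling filter (640)"
transforms.Resize(480, 640)

# right: pass (height, width) as a single tuple
transforms.Resize((480, 640))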

yxgeee commented 4 years ago

@yxgeee Nice... May I also know how to train on a custom dataset? I have RGB images with pose information.

To train on a custom dataset, you need to write a dataset file following https://github.com/yxgeee/OpenIBL/blob/master/ibl/datasets/pitts.py. The key is to generate two json files, meta.json and splits.json. Please refer to https://drive.google.com/drive/folders/1ZFMUW0BAcdi_vp88K4ZqrDcQGDH3da5v?usp=sharing for an example of generated json files.
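
For orientation, the generated files follow roughly this layout (a simplified sketch; the values are illustrative):

meta.json (simplified):
{
  "name": "demo",
  "identities": [["IMG_0001.JPG"], ["IMG_0002.JPG"], ...],   # one sublist of image file names per place identity
  "utm": [[585089.3, 4477427.9], [585101.7, 4477435.2], ...] # one [easting, northing] coordinate per identity
}

splits.json (simplified):
{
  "q_train": [0, 1], "db_train": [2, 3],
  "q_val": [...], "db_val": [...],
  "q_test": [...], "db_test": [...]   # lists of identity indices referring into meta.json
}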

Zumbalamambo commented 4 years ago

@yxgeee May I know what is meant by utm in meta.json, and what those numbers in splits.json are?

yxgeee commented 4 years ago

Oops, sorry for pressing the wrong button..

It's indeed hard to understand without comments. I cannot even remember it now, as I wrote it half a year ago. I will add some comments on the dataset format, or prepare a template for writing a custom dataset, later.

Zumbalamambo commented 4 years ago

Thank you!... I'm waiting for it!..

Zumbalamambo commented 4 years ago

@yxgeee It would be better to have a template for a custom dataset, since the Pittsburgh dataset is not available. I tried my best to gain access to it :(

At the moment I have a sequence of frames and a CSV file containing the image name and x, y, z positions.

yxgeee commented 4 years ago

Hello, please refer to https://github.com/yxgeee/OpenIBL/blob/master/docs/INSTALL.md#use-custom-dataset-optional for creating a custom dataset.
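
For a CSV of the form imagename,x,y,z, the conversion could look roughly like this (a sketch; the column names, the header row, and the choice of x/y as the 2D coordinates are assumptions):

import csv

identities, utms = [], []
with open('poses.csv') as f:                # assumed header: imagename,x,y,z
    for row in csv.DictReader(f):
        # file name only; the image itself must live under examples/data/my_dataset/raw/
        identities.append([row['imagename']])
        # only the 2D ground-plane coordinates are used for the distance thresholds
        utms.append([float(row['x']), float(row['y'])])

pids = list(range(len(identities)))
meta = {'name': 'my_dataset', 'identities': identities, 'utm': utms}
splits = {'q_train': pids, 'db_train': pids,   # a real dataset should split query/database properly
          'q_val': pids, 'db_val': pids,
          'q_test': pids, 'db_test': pids}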

Zumbalamambo commented 4 years ago

@yxgeee Thank you!.. I have followed your guideline to load the dataset. I am unsure about the values that must go into q_train_pids, db_train_pids, q_val_pids, db_val_pids, q_test_pids, and db_test_pids.

I have created a dummy dataset with very few images to start with and used the same data for both validation and testing as well. This is my code at the moment.

import os.path as osp
import torch.distributed as dist

from ..utils.data import Dataset
from ..utils.serialization import write_json
from ..utils.dist_utils import synchronize

class MyDataset(Dataset):

    def __init__(self, root, scale=None, verbose=True):
        super(MyDataset, self).__init__(root)

        self.arrange()
        self.load(verbose)

    def arrange(self):
        if self._check_integrity():
            return

        try:
            rank = dist.get_rank()
        except Exception:
            rank = 0

        # the root path for raw dataset
        raw_dir = osp.join(self.root, 'raw')
        if (not osp.isdir(raw_dir)):
            raise RuntimeError("Dataset not found.")

        identities = [["examples/data/my_dataset/query/IMG_7201.JPG","examples/data/my_dataset/raw/IMG_7201.JPG"],
                      ["examples/data/my_dataset/query/IMG_7207.JPG","examples/data/my_dataset/raw/IMG_7207.JPG"],
                      ["examples/data/my_dataset/query/IMG_7208.JPG","examples/data/my_dataset/raw/IMG_7208.JPG"],
                      ["examples/data/my_dataset/query/IMG_7209.JPG","examples/data/my_dataset/raw/IMG_7209.JPG"],
                      ["examples/data/my_dataset/query/IMG_7210.JPG","examples/data/my_dataset/raw/IMG_7210.JPG"]]

        utms = [[1.294619, 0.885227],
                [-0.409010, -0.449514],
                [-0.109162, 0.164040],
                [0.094267, 0.795477],
                [0.351835, 1.336169]
                ]
        # Save meta information into a json file
        meta = {
                'name': 'demo', # change it to your dataset name
                'identities': identities,
                'utm': utms
                }

        if rank == 0:
            write_json(meta, osp.join(self.root, 'meta.json'))

        q_train_pids = [i for i in range(len(identities))]
        db_train_pids = [i for i in range(len(identities))]
        q_val_pids = [i for i in range(len(identities))]
        db_val_pids = [i for i in range(len(identities))]
        q_test_pids = [i for i in range(len(identities))]

        # Save the training / test / val split into a json file
        splits = {
            'q_train': sorted(q_train_pids),
            'db_train': sorted(db_train_pids),
            'q_val': sorted(q_val_pids),
            'db_val': sorted(db_val_pids),
            'q_test': sorted(q_test_pids),
            'db_test': sorted(q_test_pids)}

        if rank == 0:
            write_json(splits, osp.join(self.root, 'splits.json'))

        synchronize()

It throws the following error :(

/home/anaconda3//bin/python /home/workspace/OpenIBL/train.py
Traceback (most recent call last):
  File "/home/workspace/OpenIBL/train.py", line 2, in <module>
    dataset = create('my_dataset', 'examples/data/my_dataset')
  File "/home/workspace/OpenIBL/ibl/datasets/__init__.py", line 33, in create
    return __factory[name](root, *args, **kwargs)
  File "/home/workspace/OpenIBL/ibl/datasets/my_dataset.py", line 15, in __init__
    self.load(verbose)
  File "/home/workspace/OpenIBL/ibl/utils/data/dataset.py", line 75, in load
    self.q_train = _pluck(identities, utm, q_train_pids, relabel=False)
  File "/home/workspace/OpenIBL/ibl/utils/data/dataset.py", line 14, in _pluck
    pid_images = identities[pid]
IndexError: list index out of range
yxgeee commented 4 years ago

I have tried your dummy dataset, and it works (see the attached screenshot).

Did you modify anything that could lead to the error? Try refreshing the code by pulling the repo again.

Also, I noticed that you use "examples/data/my_dataset/raw/IMG_7201.JPG" in identities; you should use "IMG_7201.JPG" instead. Note that every image needs to be saved under "examples/data/my_dataset/raw".

yxgeee commented 4 years ago

You have 5 sublists in identities, but 6 indices. So it would raise the error list index out of range. The indices are expected to be [0,1,2,3,4] in your case. You need to double-check why this problem occurs.

Zumbalamambo commented 4 years ago

@yxgeee I have spotted the error. Since meta.json and splits.json already existed, they were not being overwritten. I deleted those two files and reran the loader, and it works now... Let me try the training pipeline :)

yxgeee commented 4 years ago

@yxgeee should I create vgg16_pitts_64_desc_cen for my dataset as well? I just changed the "dataset" to "dummy" and it throws an error stating that the dataset is not found

Generally speaking, you need to create a desc_cen file for your own dataset. However, I am not sure whether your own cluster centers would perform better than the original vgg16_pitts_64_desc_cen, even though the original file was generated on the Pitts dataset. You could try both and compare the performance.
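
For context, the desc_cen file holds the 64 cluster centroids of conv5 descriptors used to initialize the NetVLAD layer. A rough sketch of how such centroids could be computed with scikit-learn (not the repo's actual clustering script; the input file and its shape are assumptions):

import numpy as np
from sklearn.cluster import KMeans

# conv5_descriptors: (num_sampled_locations, feature_dim) local features extracted
# from a random sample of database images
conv5_descriptors = np.load('conv5_descriptors.npy')

kmeans = KMeans(n_clusters=64, random_state=0).fit(conv5_descriptors)
desc_cen = kmeans.cluster_centers_   # (64, feature_dim) centroids for NetVLAD initialization
np.save('my_dataset_64_desc_cen.npy', desc_cen)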

Did you register the dataset as mentioned in step 2 (https://github.com/yxgeee/OpenIBL/blob/master/docs/INSTALL.md#use-custom-dataset-optional)? If you used python setup.py install to install the library, you need to set it up again whenever you change the files in ibl/.

Zumbalamambo commented 4 years ago

May I know what "vgg16_pitts_64_desc_cen" is used for? I suppose it contains the clustered centroids of features.

I got the following error when I ran sh train_baseline_dist.sh triplet:


===> Start calculating pairwise distances
===> Start sorting gallery
Traceback (most recent call last):
  File "examples/netvlad_img.py", line 294, in <module>
    main()
  File "examples/netvlad_img.py", line 114, in main
    main_worker(args)
  File "examples/netvlad_img.py", line 188, in main_worker
    vlad=args.vlad, loss_type=args.loss_type)
  File "/home/workspace/OpenIBL/ibl/trainers.py", line 33, in train
    data_loader.new_epoch()
  File "/home/workspace/OpenIBL/ibl/utils/data/__init__.py", line 20, in new_epoch
    self.iter = iter(self.loader)
  File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 746, in __init__
    self._try_put_index()
  File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 861, in _try_put_index
    index = self._next_index()
  File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 339, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 200, in __iter__
    for idx in self.sampler:
  File "/home/workspace/OpenIBL/ibl/utils/data/sampler.py", line 85, in __iter__
    assert(len(neg_indices)==self.neg_num)
AssertionError
Traceback (most recent call last):
  File "/home/anaconda3/envs/odom/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/anaconda3/envs/odom/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/anaconda3/envs/odom/bin/python', '-u', 'examples/netvlad_img.py', '--launcher', 'pytorch', '--tcp-port', '10000', '-d', 'dummy', '--scale', '30k', '-a', 'vgg16', '--layers', 'conv5', '--vlad', '--syncbn', '--sync-gather', '--width', '640', '--height', '480', '--tuple-size', '1', '-j', '1', '--neg-num', '1', '--test-batch-size', '1', '--margin', '0.1', '--lr', '0.001', '--weight-decay', '0.001', '--loss-type', 'triplet', '--eval-step', '1', '--epochs', '5', '--step-size', '1', '--cache-size', '1000', '--logs-dir', 'logs/netVLAD/dummy30k-vgg16/conv5-triplet-lr0.001-tuple1']' returned non-zero exit status 1.

I use a single GPU, by the way.

Zumbalamambo commented 4 years ago

@yxgeee I tried printing len(neg_indices) and self.neg_num in sampler.py. I'm getting 0 and 2.

yxgeee commented 4 years ago

len(neg_indices)=0 is abnormal; it means no negative sample was found. I guess the problem is still in the dataset. Did you use [abscissa, ordinate] in utms as the coordinates? In https://github.com/yxgeee/OpenIBL/blob/master/ibl/utils/data/dataset.py#L43, we use 10 m as the positive distance threshold and 25 m as the negative distance threshold. If all the samples in your dataset lie within 25 m of each other, no negative pair can be found. If this is the problem, you need to modify the intra_thres and inter_thres values.
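
Conceptually, the positive/negative split works along these lines (a simplified sketch of the logic, not a verbatim copy of dataset.py):

import numpy as np

def split_pos_neg(query_utm, db_utms, intra_thres=10, inter_thres=25):
    # Euclidean distance on the UTM ground plane, in meters
    dists = np.linalg.norm(np.asarray(db_utms) - np.asarray(query_utm), axis=1)
    positives = np.where(dists <= intra_thres)[0]   # within 10 m: potential positives
    negatives = np.where(dists > inter_thres)[0]    # beyond 25 m: negatives
    return positives, negatives

# if every database image lies within 25 m of the query, negatives is empty,
# which is exactly what makes len(neg_indices) == 0 in the sampler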

Zumbalamambo commented 4 years ago

@yxgeee I have loaded almost 50k images with pose information. When I try to train, my system freezes. Following is the setting that I use:


Use GPU: 0 for training, rank no.0 of world_size 1

Args:Namespace(arch='vgg16', cache_size=1000, data_dir='/home/workspace/OpenIBL/examples/data', dataset='dummy', deterministic=False, epochs=5, eval_step=1, features=4096, gpu=0, height=480, init_dir='/home/workspace/OpenIBL/examples/../logs', iters=0, launcher='pytorch', layers='conv5', logs_dir='logs/netVLAD/dummy30k-vgg16/conv5-triplet-lr0.001-tuple1', loss_type='triplet', lr=0.001, margin=0.1, momentum=0.9, neg_num=10, neg_pool=1000, ngpus_per_node=1, nowhiten=False, num_clusters=64, print_freq=10, rank=0, rerank=False, resume='', scale='30k', seed=43, step_size=5, sync_gather=True, syncbn=True, tcp_port='10000', test_batch_size=1, tuple_size=1, vlad=True, weight_decay=0.001, width=640, workers=1, world_size=1)
yxgeee commented 4 years ago

Could you provide your training command?

Zumbalamambo commented 4 years ago

I ran the ./train_baseline_dist.sh triplet command.

Since I can't upload .sh or .py files, I'm pasting the content of my .sh file:

#!/bin/sh
PYTHON=${PYTHON:-"python"}
GPUS=1

DATASET=dummy
SCALE=30k
ARCH=vgg16
LAYERS=conv5
LOSS=$1
LR=0.001

if [ $# -ne 1 ]
  then
    echo "Arguments error: <LOSS_TYPE (triplet|sare_ind|sare_joint)>"
    exit 1
fi

while true # pick a random port until an unused one is found
do
    PORT=$(( ((RANDOM<<15)|RANDOM) % 49152 + 10000 ))
    status="$(nc -z 127.0.0.1 $PORT < /dev/null &>/dev/null; echo $?)"
    if [ "${status}" != "0" ]; then
        break;
    fi
done
echo $PORT

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT --use_env \
examples/netvlad_img.py --launcher pytorch --tcp-port ${PORT} \
  -d ${DATASET} --scale ${SCALE} \
  -a ${ARCH} --layers ${LAYERS} --vlad --syncbn --sync-gather \
  --width 640 --height 480 --tuple-size 1 -j 1 --test-batch-size 1 \
  --margin 0.1 --lr ${LR} --weight-decay 0.001 --loss-type ${LOSS} \
  --eval-step 1 --epochs 5 --step-size 5 --cache-size 200 \
  --logs-dir logs/netVLAD/${DATASET}${SCALE}-${ARCH}/${LAYERS}-${LOSS}-lr${LR}-tuple${GPUS}

Settings in netvlad_img.py:

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="NetVLAD/SARE training")
    parser.add_argument('--launcher', type=str,
                        choices=['none', 'pytorch', 'slurm'],
                        default='none', help='job launcher')
    parser.add_argument('--tcp-port', type=str, default='5017')
    # data
    parser.add_argument('-d', '--dataset', type=str, default='pitts',
                        choices=datasets.names())
    parser.add_argument('--scale', type=str, default='30k')
    parser.add_argument('--tuple-size', type=int, default=1,
                        help="tuple numbers in a batch")
    parser.add_argument('--test-batch-size', type=int, default=1,
                        help="tuple numbers in a batch")
    parser.add_argument('--cache-size', type=int, default=200)
    parser.add_argument('-j', '--workers', type=int, default=1)
    parser.add_argument('--height', type=int, default=480, help="input height")
    parser.add_argument('--width', type=int, default=640, help="input width")
    parser.add_argument('--neg-num', type=int, default=2,
                        help="negative instances for one anchor in a tuple")
    parser.add_argument('--num-clusters', type=int, default=64)
    parser.add_argument('--neg-pool', type=int, default=200)
    # model
    parser.add_argument('-a', '--arch', type=str, default='vgg16',
                        choices=models.names())
    parser.add_argument('--layers', type=str, default='conv5')
    parser.add_argument('--nowhiten', action='store_true')
    parser.add_argument('--syncbn', action='store_true')
    parser.add_argument('--sync-gather', action='store_true')
    parser.add_argument('--features', type=int, default=4096)
    # optimizer
    parser.add_argument('--lr', type=float, default=0.001,
                        help="learning rate of new parameters, for pretrained ")
    parser.add_argument('--momentum', type=float, default=0.9)
    parser.add_argument('--weight-decay', type=float, default=0.001)
    parser.add_argument('--loss-type', type=str, default='triplet', help="[triplet|sare_ind|sare_joint]")
    parser.add_argument('--step-size', type=int, default=5)
    # training configs
    parser.add_argument('--resume', type=str, default='', metavar='PATH')
    parser.add_argument('--vlad', action='store_true')
    parser.add_argument('--eval-step', type=int, default=1)
    parser.add_argument('--rerank', action='store_true',
                        help="evaluation only")
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--iters', type=int, default=0)
    parser.add_argument('--seed', type=int, default=43)
    parser.add_argument('--deterministic', action='store_true')
    parser.add_argument('--print-freq', type=int, default=10)
    parser.add_argument('--margin', type=float, default=0.1, help='margin for the triplet loss with batch hard')
    # path
    working_dir = osp.dirname(osp.abspath(__file__))
    parser.add_argument('--data-dir', type=str, metavar='PATH',
                        default=osp.join(working_dir, 'data'))
    parser.add_argument('--logs-dir', type=str, metavar='PATH',
                        default=osp.join(working_dir, 'logs'))
    parser.add_argument('--init-dir', type=str, metavar='PATH',
                        default=osp.join(working_dir, '..', 'logs'))
    main()