Zumbalamambo closed this issue 3 years ago.
Hi, I have updated the code and added a simple way to extract descriptors. Please refer to https://github.com/yxgeee/OpenIBL#extract-descriptor-for-a-single-image
@yxgeee Nice! May I know how to train on a custom dataset as well? I have RGB images with pose information.
@yxgeee It throws the following error:
ValueError: Unknown resampling filter (640). Use Image.NEAREST (0), Image.LANCZOS (1), Image.BILINEAR (2), Image.BICUBIC (3), Image.BOX (4) or Image.HAMMING (5)
Change transforms.Resize(480, 640) to transforms.Resize((480, 640)).
To train on a custom dataset, you need to write a dataset file following https://github.com/yxgeee/OpenIBL/blob/master/ibl/datasets/pitts.py. The key is to generate two json files, meta.json and splits.json. Please refer to https://drive.google.com/drive/folders/1ZFMUW0BAcdi_vp88K4ZqrDcQGDH3da5v?usp=sharing for an example of generated json files.
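A minimal sketch of producing the two files, written with the standard json module instead of OpenIBL's write_json and using the key names from the dataset class posted later in this thread. Based on the later replies, it assumes that each identity is a list of image filenames relative to <root>/raw, that utm holds one [x, y] coordinate per identity (in meters, judging from the 10 m / 25 m thresholds mentioned later), and that every split is a list of identity indices. The helper name and the placeholder filenames are hypothetical.

```python
import json
import os.path as osp
import tempfile


def write_dataset_json(root, name, identities, utms, splits):
    """Hypothetical helper: write meta.json and splits.json under `root`.

    identities: list of lists of image filenames, relative to <root>/raw
    utms:       one [x, y] coordinate per identity
    splits:     dict mapping split names (q_train, db_train, ...) to identity indices
    """
    if len(identities) != len(utms):
        raise ValueError("need exactly one utm entry per identity")
    meta = {'name': name, 'identities': identities, 'utm': utms}
    # each split is a list of positions in `identities`, stored sorted
    sorted_splits = {key: sorted(pids) for key, pids in splits.items()}
    with open(osp.join(root, 'meta.json'), 'w') as f:
        json.dump(meta, f, indent=2)
    with open(osp.join(root, 'splits.json'), 'w') as f:
        json.dump(sorted_splits, f, indent=2)
    return meta, sorted_splits


# Demo on a throwaway directory with placeholder filenames and coordinates
identities = [['frame_0000.png'], ['frame_0001.png'], ['frame_0002.png']]
utms = [[0.0, 0.0], [30.0, 0.0], [60.0, 0.0]]
all_pids = list(range(len(identities)))
splits = {key: all_pids for key in
          ('q_train', 'db_train', 'q_val', 'db_val', 'q_test', 'db_test')}
with tempfile.TemporaryDirectory() as root:
    write_dataset_json(root, 'demo', identities, utms, splits)
```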
@yxgeee May I know what is meant by utm in meta.json, and what the numbers in splits.json are?
Oops, sorry, I pressed the wrong button.
It's indeed hard to understand without comments. I can't even remember it now, as I wrote it half a year ago. I will add some comments about the dataset or prepare a template for writing a custom dataset later.
Thank you! I'm looking forward to it.
@yxgeee It would be better to have a template for a custom dataset, since the Pittsburgh dataset is not available to me. I tried my best to gain access to it :(
At the moment I have a sequence of frames and a CSV file containing imagename,x,y,z positions.
Hello, please refer to https://github.com/yxgeee/OpenIBL/blob/master/docs/INSTALL.md#use-custom-dataset-optional for creating a custom dataset.
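For data like the frame sequence described above, the identities and utm lists can be built from the CSV roughly as follows. This sketch assumes the CSV has a header row with exactly the columns imagename,x,y,z, treats every frame as its own identity (one image per place), and keeps two columns as the planar coordinate because the utm entries in this thread are 2-D; which two axes form the ground plane depends on your pose convention, so the `plane` default is an assumption. The function name and sample rows are hypothetical.

```python
import csv
import io
import os.path as osp


def csv_to_identities(csv_file, plane=('x', 'y')):
    """Hypothetical converter: imagename,x,y,z rows -> (identities, utms).

    Every frame becomes its own identity, [[imagename], ...], and `plane`
    selects the two columns used as the [x, y] utm coordinate.
    """
    identities, utms = [], []
    for row in csv.DictReader(csv_file):
        # keep only the filename; images are expected under <root>/raw
        identities.append([osp.basename(row['imagename'].strip())])
        utms.append([float(row[plane[0]]), float(row[plane[1]])])
    return identities, utms


# Placeholder rows in the imagename,x,y,z layout
sample = io.StringIO(
    "imagename,x,y,z\n"
    "frame_0000.png,0.0,0.0,0.0\n"
    "frame_0001.png,12.5,-3.0,0.1\n"
)
identities, utms = csv_to_identities(sample)
print(identities)  # [['frame_0000.png'], ['frame_0001.png']]
print(utms)        # [[0.0, 0.0], [12.5, -3.0]]
```

If z rather than y is your second ground axis, pass plane=('x', 'z').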
@yxgeee Thank you! I have followed your guideline to load the dataset. I have a doubt about the values that must go inside q_train_pids, db_train_pids, q_val_pids, db_val_pids, q_test_pids and db_test_pids.
I have created a dummy dataset with very few images to start with, and used the same data for both validation and testing. This is my code at the moment:
import os.path as osp

import torch.distributed as dist

from ..utils.data import Dataset
from ..utils.serialization import write_json
from ..utils.dist_utils import synchronize


class MyDataset(Dataset):

    def __init__(self, root, scale=None, verbose=True):
        super(MyDataset, self).__init__(root)

        self.arrange()
        self.load(verbose)

    def arrange(self):
        # returns early, without rewriting meta.json / splits.json, if the dataset files already exist
        if self._check_integrity():
            return

        try:
            rank = dist.get_rank()
        except Exception:
            # not running under torch.distributed
            rank = 0

        # the root path for raw dataset
        raw_dir = osp.join(self.root, 'raw')
        if not osp.isdir(raw_dir):
            raise RuntimeError("Dataset not found.")

        identities = [["examples/data/my_dataset/query/IMG_7201.JPG", "examples/data/my_dataset/raw/IMG_7201.JPG"],
                      ["examples/data/my_dataset/query/IMG_7207.JPG", "examples/data/my_dataset/raw/IMG_7207.JPG"],
                      ["examples/data/my_dataset/query/IMG_7208.JPG", "examples/data/my_dataset/raw/IMG_7208.JPG"],
                      ["examples/data/my_dataset/query/IMG_7209.JPG", "examples/data/my_dataset/raw/IMG_7209.JPG"],
                      ["examples/data/my_dataset/query/IMG_7210.JPG", "examples/data/my_dataset/raw/IMG_7210.JPG"]]

        utms = [[1.294619, 0.885227],
                [-0.409010, -0.449514],
                [-0.109162, 0.164040],
                [0.094267, 0.795477],
                [0.351835, 1.336169]]

        # Save meta information into a json file
        meta = {
            'name': 'demo',  # change it to your dataset name
            'identities': identities,
            'utm': utms
        }

        if rank == 0:
            write_json(meta, osp.join(self.root, 'meta.json'))

        q_train_pids = [i for i in range(len(identities))]
        db_train_pids = [i for i in range(len(identities))]
        q_val_pids = [i for i in range(len(identities))]
        db_val_pids = [i for i in range(len(identities))]
        q_test_pids = [i for i in range(len(identities))]
        db_test_pids = [i for i in range(len(identities))]

        # Save the training / test / val split into a json file
        splits = {
            'q_train': sorted(q_train_pids),
            'db_train': sorted(db_train_pids),
            'q_val': sorted(q_val_pids),
            'db_val': sorted(db_val_pids),
            'q_test': sorted(q_test_pids),
            'db_test': sorted(db_test_pids)}

        if rank == 0:
            write_json(splits, osp.join(self.root, 'splits.json'))

        synchronize()
It throws the following error :(
/home/anaconda3//bin/python /home/workspace/OpenIBL/train.py
Traceback (most recent call last):
File "/home/workspace/OpenIBL/train.py", line 2, in <module>
dataset = create('my_dataset', 'examples/data/my_dataset')
File "/home/workspace/OpenIBL/ibl/datasets/__init__.py", line 33, in create
return __factory[name](root, *args, **kwargs)
File "/home/workspace/OpenIBL/ibl/datasets/my_dataset.py", line 15, in __init__
self.load(verbose)
File "/home/workspace/OpenIBL/ibl/utils/data/dataset.py", line 75, in load
self.q_train = _pluck(identities, utm, q_train_pids, relabel=False)
File "/home/workspace/OpenIBL/ibl/utils/data/dataset.py", line 14, in _pluck
pid_images = identities[pid]
IndexError: list index out of range
I have tried your dummy dataset, and it works.
Did you modify anything that could lead to the error? Try refreshing the code by pulling the repo again.
Plus, I found that you use "examples/data/my_dataset/raw/IMG_7201.JPG" in identities; you should use "IMG_7201.JPG" instead. Note that every image needs to be saved under "examples/data/my_dataset/raw".
You have 5 sublists in identities but 6 indices, so it raises the error list index out of range.
The indices are expected to be [0,1,2,3,4] in your case. You need to double-check why this problem occurs.
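A quick consistency check along these lines can catch this kind of mismatch before training: every index in splits.json must be a valid position in the identities list of meta.json, and there should be exactly one utm entry per identity. This is a sketch with hypothetical function names that reads the two files with the standard json module rather than OpenIBL's own loaders.

```python
import json
import os.path as osp


def find_split_problems(meta, splits):
    """Return human-readable problems in already-loaded meta/splits dicts."""
    n = len(meta['identities'])
    problems = []
    if len(meta['utm']) != n:
        problems.append(f"{n} identities but {len(meta['utm'])} utm entries")
    for name, pids in splits.items():
        # valid indices are 0 .. n-1
        bad = [p for p in pids if not 0 <= p < n]
        if bad:
            problems.append(f"{name}: indices {bad} out of range 0..{n - 1}")
    return problems


def check_dataset_json(root):
    """Load <root>/meta.json and <root>/splits.json and check them."""
    with open(osp.join(root, 'meta.json')) as f:
        meta = json.load(f)
    with open(osp.join(root, 'splits.json')) as f:
        splits = json.load(f)
    return find_split_problems(meta, splits)


# Five identities but an index 5 in q_train, as in the case above
meta = {'identities': [['a.jpg'], ['b.jpg'], ['c.jpg'], ['d.jpg'], ['e.jpg']],
        'utm': [[0.0, 0.0]] * 5}
print(find_split_problems(meta, {'q_train': [0, 1, 2, 3, 4, 5]}))
# ['q_train: indices [5] out of range 0..4']
```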
@yxgeee I have spotted the error. Since meta.json and splits.json already existed, they were not being overwritten. I deleted those two files and reran the loader, and it works now. Let me try the training pipeline :)
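Judging from this reply and the early return after _check_integrity() in arrange(), the dataset class does not regenerate the JSON files when they are already present, so edits to identities or the splits have no effect until the old files are removed. A small sketch of forcing regeneration; the helper name is hypothetical, and it deletes only the two generated JSON files, leaving raw/ untouched.

```python
import tempfile
from pathlib import Path


def remove_generated_json(root):
    """Delete meta.json and splits.json under `root` so the next load rebuilds them.

    Only these two generated files are touched; images under raw/ are left alone.
    Returns the names of the files that were actually removed.
    """
    removed = []
    for name in ('meta.json', 'splits.json'):
        path = Path(root) / name
        if path.exists():
            path.unlink()
            removed.append(name)
    return removed


# Demo on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ('meta.json', 'splits.json'):
        (Path(tmp) / name).write_text('{}')
    print(remove_generated_json(tmp))  # ['meta.json', 'splits.json']
```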
@yxgeee Should I create vgg16_pitts_64_desc_cen for my dataset as well? I just changed "dataset" to "dummy", and it throws an error stating "Dataset not found".
Generally speaking, you need to create vgg16_pitts_64_desc_cen for your dataset. However, I am not sure whether your own vgg16_pitts_64_desc_cen would perform better than the original one, even though the original was generated from the Pitts dataset. You could try both and compare the performance.
Did you register the dataset as mentioned in step 2 (https://github.com/yxgeee/OpenIBL/blob/master/docs/INSTALL.md#use-custom-dataset-optional)? If you installed the library with python setup.py install, you need to install it again whenever you change the files in ibl/.
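The traceback earlier in this thread shows that create() in ibl/datasets/__init__.py looks the name up in a __factory dictionary, and the training script takes its --dataset choices from datasets.names(). A self-contained sketch of that registry pattern is below; the two classes are placeholder stand-ins (the class name Pittsburgh is assumed), and the real __init__.py is not copied here. Separately, "Dataset not found." in the MyDataset class is raised by arrange() when <root>/raw is missing, so it is also worth confirming that the root the script passes actually contains raw/.

```python
# Stand-ins for the real dataset classes; only the registry logic matters here.
class Pittsburgh:
    def __init__(self, root, *args, **kwargs):
        self.root = root


class MyDataset:
    def __init__(self, root, *args, **kwargs):
        self.root = root


# Maps the name given to -d/--dataset to a dataset class.
__factory = {
    'pitts': Pittsburgh,
    'dummy': MyDataset,  # must match the DATASET value used in the training script
}


def names():
    """Names accepted by --dataset (used as argparse choices)."""
    return sorted(__factory.keys())


def create(name, root, *args, **kwargs):
    """Instantiate a registered dataset, e.g. create('dummy', 'examples/data/dummy')."""
    if name not in __factory:
        raise KeyError(f"Unknown dataset: {name}")
    return __factory[name](root, *args, **kwargs)
```

A name missing from the dictionary is rejected by argparse before any dataset code runs, which is why registration is the first thing to check.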
May I know what vgg16_pitts_64_desc_cen is used for? I suppose it contains the clustered centroids of the features.
I get the following error when I run sh train_baseline_dist.sh triplet:
===> Start calculating pairwise distances
===> Start sorting gallery
Traceback (most recent call last):
File "examples/netvlad_img.py", line 294, in <module>
main()
File "examples/netvlad_img.py", line 114, in main
main_worker(args)
File "examples/netvlad_img.py", line 188, in main_worker
vlad=args.vlad, loss_type=args.loss_type)
File "/home/workspace/OpenIBL/ibl/trainers.py", line 33, in train
data_loader.new_epoch()
File "/home/workspace/OpenIBL/ibl/utils/data/__init__.py", line 20, in new_epoch
self.iter = iter(self.loader)
File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 746, in __init__
self._try_put_index()
File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 861, in _try_put_index
index = self._next_index()
File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 339, in _next_index
return next(self._sampler_iter) # may raise StopIteration
File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 200, in __iter__
for idx in self.sampler:
File "/home/workspace/OpenIBL/ibl/utils/data/sampler.py", line 85, in __iter__
assert(len(neg_indices)==self.neg_num)
AssertionError
Traceback (most recent call last):
File "/home/anaconda3/envs/odom/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/anaconda3/envs/odom/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
main()
File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/anaconda3/envs/odom/bin/python', '-u', 'examples/netvlad_img.py', '--launcher', 'pytorch', '--tcp-port', '10000', '-d', 'dummy', '--scale', '30k', '-a', 'vgg16', '--layers', 'conv5', '--vlad', '--syncbn', '--sync-gather', '--width', '640', '--height', '480', '--tuple-size', '1', '-j', '1', '--neg-num', '1', '--test-batch-size', '1', '--margin', '0.1', '--lr', '0.001', '--weight-decay', '0.001', '--loss-type', 'triplet', '--eval-step', '1', '--epochs', '5', '--step-size', '1', '--cache-size', '1000', '--logs-dir', 'logs/netVLAD/dummy30k-vgg16/conv5-triplet-lr0.001-tuple1']' returned non-zero exit status 1.
I am using a single GPU, by the way.
@yxgeee I tried print(len(neg_indices),self.neg_num) in sampler.py, and I'm getting 0,2.
len(neg_indices)=0 is abnormal; it seems no negative sample was found. I guess the problem is still in the dataset.
Did you use [abscissa, ordinate] in utms as the coordinates? In https://github.com/yxgeee/OpenIBL/blob/master/ibl/utils/data/dataset.py#L43, we use 10m as the positive distance threshold and 25m as the negative distance threshold. If all the samples in your dataset are within 25m of each other, no negative pair can be found. If this is the problem, you need to modify the intra_thres and inter_thres values.
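To check whether this is the problem, one can count, for each query, how many database entries fall inside the positive radius and how many lie beyond the negative radius. The sketch below uses plain Euclidean distances on the [x, y] utm values. pos_thres=10 and neg_thres=25 mirror the 10 m / 25 m values above, while the function names and the exact boundary handling (<= versus >) are illustrative assumptions, not OpenIBL's sampler code; in OpenIBL the corresponding settings are intra_thres and inter_thres.

```python
import math


def count_pos_neg(q_utms, db_utms, pos_thres=10.0, neg_thres=25.0):
    """For each query, return (#db entries within pos_thres, #db entries beyond neg_thres)."""
    counts = []
    for q in q_utms:
        dists = [math.dist(q, d) for d in db_utms]
        n_pos = sum(d <= pos_thres for d in dists)
        n_neg = sum(d > neg_thres for d in dists)
        counts.append((n_pos, n_neg))
    return counts


def queries_without_enough_negatives(q_utms, db_utms, neg_num,
                                     pos_thres=10.0, neg_thres=25.0):
    """Indices of queries that have fewer than neg_num candidate negatives."""
    counts = count_pos_neg(q_utms, db_utms, pos_thres, neg_thres)
    return [i for i, (_, n_neg) in enumerate(counts) if n_neg < neg_num]


# The five utm values from the dummy dataset posted earlier in this thread
utms = [[1.294619, 0.885227], [-0.409010, -0.449514], [-0.109162, 0.164040],
        [0.094267, 0.795477], [0.351835, 1.336169]]
print(queries_without_enough_negatives(utms, utms, neg_num=2))  # [0, 1, 2, 3, 4]
```

With these five points every query is flagged, since they all lie within about 2.2 units of one another, which matches the len(neg_indices)=0 reported above.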
@yxgeee I have loaded almost 50k images with pose information. When I try to train, my system freezes. These are the settings I use:
Use GPU: 0 for training, rank no.0 of world_size 1
Args:Namespace(arch='vgg16', cache_size=1000, data_dir='/home/workspace/OpenIBL/examples/data', dataset='dummy', deterministic=False, epochs=5, eval_step=1, features=4096, gpu=0, height=480, init_dir='/home/workspace/OpenIBL/examples/../logs', iters=0, launcher='pytorch', layers='conv5', logs_dir='logs/netVLAD/dummy30k-vgg16/conv5-triplet-lr0.001-tuple1', loss_type='triplet', lr=0.001, margin=0.1, momentum=0.9, neg_num=10, neg_pool=1000, ngpus_per_node=1, nowhiten=False, num_clusters=64, print_freq=10, rank=0, rerank=False, resume='', scale='30k', seed=43, step_size=5, sync_gather=True, syncbn=True, tcp_port='10000', test_batch_size=1, tuple_size=1, vlad=True, weight_decay=0.001, width=640, workers=1, world_size=1)
Could you provide your training command?
I ran the ./train_baseline_dist.sh triplet command.
Since I can't upload files of type .sh and .py, I'm pasting the contents of my .sh file:
#!/bin/bash
# bash (not plain sh) is needed for $RANDOM and the &> redirection below
PYTHON=${PYTHON:-"python"}
GPUS=1

DATASET=dummy
SCALE=30k
ARCH=vgg16
LAYERS=conv5
LOSS=$1
LR=0.001

if [ $# -ne 1 ]
then
    echo "Arguments error: <LOSS_TYPE (triplet|sare_ind|sare_joint)>"
    exit 1
fi

# pick random ports until an unused one is found
while true
do
    PORT=$(( ((RANDOM<<15)|RANDOM) % 49152 + 10000 ))
    status="$(nc -z 127.0.0.1 $PORT < /dev/null &>/dev/null; echo $?)"
    if [ "${status}" != "0" ]; then
        break;
    fi
done
echo $PORT

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT --use_env \
    examples/netvlad_img.py --launcher pytorch --tcp-port ${PORT} \
    -d ${DATASET} --scale ${SCALE} \
    -a ${ARCH} --layers ${LAYERS} --vlad --syncbn --sync-gather \
    --width 640 --height 480 --tuple-size 1 -j 1 --test-batch-size 1 \
    --margin 0.1 --lr ${LR} --weight-decay 0.001 --loss-type ${LOSS} \
    --eval-step 1 --epochs 5 --step-size 5 --cache-size 200 \
    --logs-dir logs/netVLAD/${DATASET}${SCALE}-${ARCH}/${LAYERS}-${LOSS}-lr${LR}-tuple${GPUS}
Settings in netvlad_img.py:
# Excerpt from examples/netvlad_img.py; argparse, osp, datasets, models and main()
# are imported/defined earlier in that file.
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="NetVLAD/SARE training")
    parser.add_argument('--launcher', type=str,
                        choices=['none', 'pytorch', 'slurm'],
                        default='none', help='job launcher')
    parser.add_argument('--tcp-port', type=str, default='5017')
    # data
    parser.add_argument('-d', '--dataset', type=str, default='pitts',
                        choices=datasets.names())
    parser.add_argument('--scale', type=str, default='30k')
    parser.add_argument('--tuple-size', type=int, default=1,
                        help="tuple numbers in a batch")
    parser.add_argument('--test-batch-size', type=int, default=1,
                        help="tuple numbers in a batch")
    parser.add_argument('--cache-size', type=int, default=200)
    parser.add_argument('-j', '--workers', type=int, default=1)
    parser.add_argument('--height', type=int, default=480, help="input height")
    parser.add_argument('--width', type=int, default=640, help="input width")
    parser.add_argument('--neg-num', type=int, default=2,
                        help="negative instances for one anchor in a tuple")
    parser.add_argument('--num-clusters', type=int, default=64)
    parser.add_argument('--neg-pool', type=int, default=200)
    # model
    parser.add_argument('-a', '--arch', type=str, default='vgg16',
                        choices=models.names())
    parser.add_argument('--layers', type=str, default='conv5')
    parser.add_argument('--nowhiten', action='store_true')
    parser.add_argument('--syncbn', action='store_true')
    parser.add_argument('--sync-gather', action='store_true')
    parser.add_argument('--features', type=int, default=4096)
    # optimizer
    parser.add_argument('--lr', type=float, default=0.001,
                        help="learning rate of new parameters, for pretrained ")
    parser.add_argument('--momentum', type=float, default=0.9)
    parser.add_argument('--weight-decay', type=float, default=0.001)
    parser.add_argument('--loss-type', type=str, default='triplet', help="[triplet|sare_ind|sare_joint]")
    parser.add_argument('--step-size', type=int, default=5)
    # training configs
    parser.add_argument('--resume', type=str, default='', metavar='PATH')
    parser.add_argument('--vlad', action='store_true')
    parser.add_argument('--eval-step', type=int, default=1)
    parser.add_argument('--rerank', action='store_true',
                        help="evaluation only")
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--iters', type=int, default=0)
    parser.add_argument('--seed', type=int, default=43)
    parser.add_argument('--deterministic', action='store_true')
    parser.add_argument('--print-freq', type=int, default=10)
    parser.add_argument('--margin', type=float, default=0.1, help='margin for the triplet loss with batch hard')
    # path
    working_dir = osp.dirname(osp.abspath(__file__))
    parser.add_argument('--data-dir', type=str, metavar='PATH',
                        default=osp.join(working_dir, 'data'))
    parser.add_argument('--logs-dir', type=str, metavar='PATH',
                        default=osp.join(working_dir, 'logs'))
    parser.add_argument('--init-dir', type=str, metavar='PATH',
                        default=osp.join(working_dir, '..', 'logs'))
    main()
How do I load a single image and extract the descriptor?