Understanding the results

silvaphl commented 3 years ago

Hi, thanks for sharing the code!

I've run your code

python experiment.py with dataset.cars model.resnet50 epochs=100 lr=0.05 model.norm_layer=batch -F results/cars/

and I'm getting the following result:

Validation [099] {'cosine': {1: 1.03, 2: 1.03, 4: 1.03, 8: 1.03, 16: 1.03, 32: 2.0}, 'l2': {1: 1.17, 2: 2.12, 4: 4.19, 8: 7.4, 16: 12.73, 32: 18.71}}
INFO - Metric Learning - Result: 1.03
INFO - Metric Learning - Completed after 0:54:25

I do not know if I am doing something wrong, but how is the result of 1.03 related to the 89.3 reported in your paper?

Best regards.

jeromerony commented 3 years ago

Hi,

This is a weird result: the recall@1 is already 45.28 with the ImageNet-pretrained ResNet-50, so a trained model should not report anything lower than that.

Can you give a few more details about your setup? If possible, post the output of

python experiment.py print_config with dataset.cars model.resnet50 epochs=100 lr=0.05 model.norm_layer=batch

If you can also give the content of the run.json file in the result directory that would help.

silvaphl commented 3 years ago

Hi, @jeromerony,

Thank you very much for your response.

The output is:

WARNING - dataset - Added new config entry: "color_jitter"
WARNING - dataset - Added new config entry: "data_path"
WARNING - dataset - Added new config entry: "name"
WARNING - dataset - Added new config entry: "resize"
INFO - Metric Learning - Running command 'print_config'
INFO - Metric Learning - Started
Configuration (modified, added, typechanged, doc):
  cpu = False                        # Force training on CPU
  cudnn_flag = 'benchmark'
  epochs = 100
  label_smoothing = 0.1
  lr = 0.05
  momentum = 0.0
  nesterov = False
  no_bias_decay = True
  scheduler = 'warmcos'
  seed = 483438777                   # the random seed for this experiment
  temp_dir = '/tmp'
  visdom_freq = 20
  visdom_port = None
  weight_decay = 0.0005
  dataset:
    batch_size = 128
    color_jitter = [0.3, 0.3, 0.3, 0.1]
    crop_size = 224
    data_path = 'data/CARS_196'
    name = 'cars'
    num_workers = 8                  # number of workers used ot load the data
    pin_memory = False               # use the pin_memory option of DataLoader
    preload = False                  # load all images into RAM to avoid IO
    ratio = [1, 1]
    recalls = [1, 2, 4, 8, 16, 32]
    resize = [256, 256]
    sampler = 'random'
    scale = [0.16, 1]
    test_batch_size = 256
    test_file = 'test.txt'
    train_file = 'train.txt'
  model:
    arch = 'resnet50'
    detach = False                   # detach features before feeding to the classification layer. Prevents training of the feature extractor with cross-entropy.
    dropout = 0.5
    norm_layer = 'batch'             # use a normalization layer (batchnorm or layernorm) for the features
    normalize = False                # normalize the features
    normalize_weight = False         # normalize the weights of the classification layer
    num_features = 2048              # dimensionality of the features produced by the feature extractor
    pretrained = True                # use a pretrained model from torchvision
    remap = False                    # remap features through a linear layer
    set_bn_eval = True               # set bn in eval mode even in training
INFO - Metric Learning - Completed after 0:00:00

The content of the run.json is:

{
  "artifacts": [
    "resnet50_cars.pt"
  ],
  "command": "main",
  "experiment": {
    "base_dir": "/media/work/pedro/2021-Dloss/dml_cross_entropy",
    "dependencies": [
      "munch==2.5.0",
      "numpy==1.20.0",
      "sacred==0.8.2",
      "torch==1.8.0.dev20210207+cu110",
      "torchvision==0.9.0.dev20210207+cu110",
      "visdom-logger==0.1"
    ],
    "mainfile": "experiment.py",
    "name": "Metric Learning",
    "repositories": [
      {
        "commit": "f0dbc12715e205c1fe5e0105126b51d27ed7525e",
        "dirty": true,
        "url": "https://github.com/jeromerony/dml_cross_entropy.git"
      },
      {
        "commit": "f0dbc12715e205c1fe5e0105126b51d27ed7525e",
        "dirty": true,
        "url": "https://github.com/jeromerony/dml_cross_entropy.git"
      },
      {
        "commit": "f0dbc12715e205c1fe5e0105126b51d27ed7525e",
        "dirty": true,
        "url": "https://github.com/jeromerony/dml_cross_entropy.git"
      },
      {
        "commit": "f0dbc12715e205c1fe5e0105126b51d27ed7525e",
        "dirty": true,
        "url": "https://github.com/jeromerony/dml_cross_entropy.git"
      },
      {
        "commit": "f0dbc12715e205c1fe5e0105126b51d27ed7525e",
        "dirty": true,
        "url": "https://github.com/jeromerony/dml_cross_entropy.git"
      },
      {
        "commit": "f0dbc12715e205c1fe5e0105126b51d27ed7525e",
        "dirty": true,
        "url": "https://github.com/jeromerony/dml_cross_entropy.git"
      },
      {
        "commit": "f0dbc12715e205c1fe5e0105126b51d27ed7525e",
        "dirty": true,
        "url": "https://github.com/jeromerony/dml_cross_entropy.git"
      },
      {
        "commit": "f0dbc12715e205c1fe5e0105126b51d27ed7525e",
        "dirty": true,
        "url": "https://github.com/jeromerony/dml_cross_entropy.git"
      },
      {
        "commit": "f0dbc12715e205c1fe5e0105126b51d27ed7525e",
        "dirty": true,
        "url": "https://github.com/jeromerony/dml_cross_entropy.git"
      },
      {
        "commit": "f0dbc12715e205c1fe5e0105126b51d27ed7525e",
        "dirty": true,
        "url": "https://github.com/jeromerony/dml_cross_entropy.git"
      }
    ],
    "sources": [
      [
        "experiment.py",
        "_sources/experiment_0e74b4ce8bb306e7655b4315eacfafac.py"
      ],
      [
        "models/__init__.py",
        "_sources/__init___d41d8cd98f00b204e9800998ecf8427e.py"
      ],
      [
        "models/ingredient.py",
        "_sources/ingredient_a93f5bdde83e4261d7afbffe8a35a859.py"
      ],
      [
        "utils/__init__.py",
        "_sources/__init___57a09b82bba70fdeb7c7d94eb70f5aaf.py"
      ],
      [
        "utils/data/__init__.py",
        "_sources/__init___d41d8cd98f00b204e9800998ecf8427e.py"
      ],
      [
        "utils/data/dataset_ingredient.py",
        "_sources/dataset_ingredient_194c17586f80d41c785390b0ef44e113.py"
      ],
      [
        "utils/data/image_dataset.py",
        "_sources/image_dataset_66740efc210576efb1cb8f8984a9ffa8.py"
      ],
      [
        "utils/data/utils.py",
        "_sources/utils_ff919360bd739bb1d8c1b440bc2ddb84.py"
      ],
      [
        "utils/training.py",
        "_sources/training_2bd15bf9836f6059a4709bf2899dae59.py"
      ],
      [
        "utils/utils.py",
        "_sources/utils_1d076e19a25fb3cee29bd45f5cd58410.py"
      ]
    ]
  },
  "heartbeat": "2021-02-07T21:42:20.925972",
  "host": {
    "ENV": {},
    "cpu": "Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz",
    "gpus": {
      "driver_version": "455.32.00",
      "gpus": [
        {
          "model": "GeForce RTX 3090",
          "persistence_mode": false,
          "total_memory": 24268
        }
      ]
    },
    "hostname": "cisco",
    "os": [
      "Linux",
      "Linux-5.4.0-26-generic-x86_64-with-glibc2.29"
    ],
    "python_version": "3.8.5"
  },
  "meta": {
    "command": "main",
    "options": {
      "--beat-interval": null,
      "--capture": null,
      "--comment": null,
      "--debug": false,
      "--enforce_clean": false,
      "--file_storage": "results/cars/",
      "--force": false,
      "--help": false,
      "--loglevel": null,
      "--mongo_db": null,
      "--name": null,
      "--pdb": false,
      "--print-config": false,
      "--priority": null,
      "--queue": false,
      "--s3": null,
      "--sql": null,
      "--tiny_db": null,
      "--unobserved": false,
      "COMMAND": null,
      "UPDATE": [
        "dataset.cars",
        "model.resnet50",
        "epochs=100",
        "lr=0.05",
        "model.norm_layer=batch"
      ],
      "help": false,
      "with": true
    }
  },
  "resources": [],
  "result": {
    "dtype": "float64",
    "py/object": "numpy.float64",
    "value": 1.03
  },
  "start_time": "2021-02-07T20:47:57.078268",
  "status": "COMPLETED",
  "stop_time": "2021-02-07T21:42:20.901349"
}
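
For a run.json like the one above, the fields that matter most when debugging are the dependency versions, the GPU and driver, and the recorded result. A minimal sketch for pulling them out follows. The helper name `summarize_run` and the path `results/cars/1/run.json` are hypothetical and not part of the repository; the keys are the ones visible in the file above.

```python
import json
from pathlib import Path


def summarize_run(run: dict) -> dict:
    """Pick the debugging-relevant fields out of a parsed run.json dict."""
    gpus = run.get("host", {}).get("gpus", {})
    result = run.get("result")
    return {
        "status": run.get("status"),
        # In the file above the result is a dict with a "value" key;
        # fall back to the raw entry for any other shape.
        "result": result.get("value") if isinstance(result, dict) else result,
        "dependencies": run.get("experiment", {}).get("dependencies", []),
        "driver_version": gpus.get("driver_version"),
        "gpu_models": [gpu.get("model") for gpu in gpus.get("gpus", [])],
    }


if __name__ == "__main__":
    # Hypothetical path: point this at your own run directory.
    path = Path("results/cars/1/run.json")
    if path.exists():
        summary = summarize_run(json.loads(path.read_text()))
        print(json.dumps(summary, indent=2))
```

On the file above this would surface the nightly torch build (`1.8.0.dev20210207+cu110`) next to the RTX 3090 and the result of 1.03.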

jeromerony commented 3 years ago

Can you try in a new environment with torch 1.7 and cudatoolkit=10.2?
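
To confirm what a fresh environment actually contains, a small standard-library check along these lines can help. This is only a sketch: the package names in `PACKAGES` are assumptions (faiss in particular is distributed under more than one name, and a conda install may not expose metadata under any of them), and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata

# Assumed distribution names; adjust them to match your environment.
PACKAGES = ("torch", "torchvision", "faiss-gpu", "faiss-cpu")


def installed_versions(packages=PACKAGES) -> dict:
    """Map each package name to its installed version, or None if not found."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not found'}")
```

The conda `cudatoolkit` package is not a Python distribution, so it will not show up here. If torch imports correctly, `torch.version.cuda` reports the CUDA version it was built against.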

silvaphl commented 3 years ago

Hi, @jeromerony,

I'll try an environment like the one you suggested and report the results.

Thanks.

jeromerony commented 3 years ago

You don't need to train at all to check if the issue is resolved. The recall @ 1 before training should be higher than 40.

silvaphl commented 3 years ago

Ok, @jeromerony.

Thanks

silvaphl commented 3 years ago

@jeromerony

With torch 1.7, I get a recall@1 higher than 40:

{'cosine': {1: 45.28, 2: 57.21, 4: 69.0, 8: 79.12, 16: 87.75, 32: 94.07},
 'l2': {1: 43.39, 2: 54.99, 4: 67.22, 8: 77.53, 16: 86.74, 32: 93.48}}

The problem seems to be with the newer version of torch.

Thanks again.

jeromerony commented 3 years ago

Glad the issue is resolved. I suspect the problem is with faiss-gpu, which is not compatible with cudatoolkit 11 yet. There are several related issues here: https://github.com/facebookresearch/faiss/issues.
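
If the nearest-neighbour search is the suspect rather than the model, one way to cross-check is to recompute Recall@K by brute force on a subset of the extracted features, without faiss. The sketch below is a NumPy illustration, not the repository's evaluation code. It assumes each sample queries all the other samples and counts a hit at K when any of the K most cosine-similar neighbours shares its label; `recall_at_k` is a hypothetical helper.

```python
import numpy as np


def recall_at_k(features: np.ndarray, labels: np.ndarray, ks=(1, 2, 4, 8)) -> dict:
    """Brute-force Recall@K in percent, using cosine similarity.

    Assumes non-zero feature vectors. Memory grows with the square of the
    number of samples, so use a subset for large test sets.
    """
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = feats @ feats.T
    np.fill_diagonal(sims, -np.inf)  # a sample must not retrieve itself
    # Indices of the max(ks) most similar samples, most similar first.
    order = np.argsort(-sims, axis=1)[:, : max(ks)]
    hits = labels[order] == labels[:, None]
    return {k: round(100.0 * hits[:, :k].any(axis=1).mean(), 2) for k in ks}


if __name__ == "__main__":
    # Toy data: four well-separated classes with five samples each.
    rng = np.random.default_rng(0)
    toy_labels = np.repeat(np.arange(4), 5)
    toy_features = np.eye(4)[toy_labels] + 0.01 * rng.standard_normal((20, 4))
    print(recall_at_k(toy_features, toy_labels))
```

If the brute-force numbers on the pretrained features look reasonable while the reported ones sit near 1, that points at the search library rather than the model or the data.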

Understanding the results #5