An error in data/dataset/eval_reid.py?

iGuaZi commented 4 years ago

Hi, I believe there is an error. The lines marked NOTE1 and NOTE2 may cause an error at NOTE3, especially when num_g < max_rank.

import numpy as np

def eval_func(distmat, q_pids, g_pids, q_camids, g_camids, max_rank=50):
    """Evaluation with the Market-1501 metric.

    Key: for each query identity, its gallery images from the same camera view are discarded.
    """
    num_q, num_g = distmat.shape
    if num_g < max_rank:
        max_rank = num_g
        print("Note: number of gallery samples is quite small, got {}".format(num_g))
    indices = np.argsort(distmat, axis=1)
    matches = (g_pids[indices] == q_pids[:, np.newaxis]).astype(np.int32)

    # compute cmc curve for each query
    all_cmc = []
    all_AP = []
    num_valid_q = 0.  # number of valid query
    for q_idx in range(num_q):
        # get query pid and camid
        q_pid = q_pids[q_idx]
        q_camid = q_camids[q_idx]

        # remove gallery samples that have the same pid and camid with query
        order = indices[q_idx]
        remove = (g_pids[order] == q_pid) & (g_camids[order] == q_camid)
        keep = np.invert(remove)

        # compute cmc curve
        # binary vector, positions with value 1 are correct matches
        orig_cmc = matches[q_idx][keep]  # NOTE1: variable length

        if not np.any(orig_cmc):
            # this condition is true when query identity does not appear in gallery
            continue

        cmc = orig_cmc.cumsum()
        cmc[cmc > 1] = 1

        all_cmc.append(cmc[:max_rank])  # NOTE2: may be variable length
        num_valid_q += 1.

        # compute average precision
        # reference: https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Average_precision
        num_rel = orig_cmc.sum()
        tmp_cmc = orig_cmc.cumsum()
        tmp_cmc = [x / (i + 1.) for i, x in enumerate(tmp_cmc)]
        tmp_cmc = np.asarray(tmp_cmc) * orig_cmc
        AP = tmp_cmc.sum() / num_rel
        all_AP.append(AP)

    assert num_valid_q > 0, "Error: all query identities do not appear in gallery"

    all_cmc = np.asarray(all_cmc).astype(np.float32)  # NOTE3: may cause error!
    all_cmc = all_cmc.sum(0) / num_valid_q
    mAP = np.mean(all_AP)

    return all_cmc, mAP
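
A minimal sketch of how NOTE1 and NOTE2 can lead to the failure at NOTE3, assuming only NumPy. The toy arrays and the names `cmc_a`, `cmc_b` and `stacked_ok` are illustrative and not taken from the repository; the exact error message depends on the NumPy version.

```python
import numpy as np

max_rank = 5  # e.g. already clamped to num_g = 5

# Two queries whose filtered match vectors have different lengths (NOTE1),
# because a different number of same-camera gallery images was removed.
cmc_a = np.array([0, 1, 1, 1])  # 4 gallery entries kept
cmc_b = np.array([1, 1, 1])     # 3 gallery entries kept

# Both are shorter than max_rank, so slicing does not equalize them (NOTE2).
all_cmc = [cmc_a[:max_rank], cmc_b[:max_rank]]
lengths = [len(c) for c in all_cmc]

# Stacking ragged rows into a float array fails (NOTE3).
try:
    np.asarray(all_cmc).astype(np.float32)
    stacked_ok = True
except ValueError:
    stacked_ok = False

print(lengths, stacked_ok)
```

The rows only stack cleanly when every query keeps at least max_rank gallery entries after filtering, which is why the problem shows up mainly with small galleries.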

michuanhaohao commented 4 years ago

Thanks! Where are Note1, Note2 and Note3?

iGuaZi commented 4 years ago

In the comments, e.g. at the line: all_cmc.append(cmc[:max_rank])

michuanhaohao commented 4 years ago

I think you are right. Logically, though, num_g should not be less than max_rank. If you encounter a bug, you can set max_rank < num_q, or modify the code with 'try & except'.
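
As an alternative to 'try & except', one possible workaround is to pad each CMC vector to exactly max_rank entries before stacking. This is only a sketch: `pad_cmc` is a hypothetical helper, not part of the repository, and it assumes each vector is non-empty (which holds for valid queries, since they contain at least one match).

```python
import numpy as np

def pad_cmc(cmc, max_rank):
    """Hypothetical helper: return a CMC vector of length exactly max_rank."""
    cmc = np.asarray(cmc)[:max_rank]
    if len(cmc) < max_rank:
        # A CMC curve is non-decreasing, so repeating its last value
        # extends it without changing any existing rank.
        cmc = np.pad(cmc, (0, max_rank - len(cmc)), mode="edge")
    return cmc

rows = [pad_cmc([0, 1, 1, 1], 5), pad_cmc([1, 1, 1], 5)]
all_cmc = np.asarray(rows).astype(np.float32)  # rows now have equal length
mean_cmc = all_cmc.sum(0) / len(rows)
print(all_cmc.shape, mean_cmc)
```

Padding with the last value treats ranks beyond the filtered gallery as "already matched", which seems reasonable for a cumulative curve but is a design choice rather than something the thread settles.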

iGuaZi commented 4 years ago

The issue is that I don't think the eval code is correct, because it could change the rank: orig_cmc[0] may not be the top-1 rank.

orig_cmc = matches[q_idx][keep]  # NOTE1: variable length

I think the following is correct

matches[q_idx][remove] = 0
orig_cmc = matches[q_idx]
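
To make the difference between the two variants concrete, here is a small sketch on one toy query, assuming only NumPy. The arrays and the names `filtered`, `zeroed`, `first_hit_filtered` and `first_hit_zeroed` are illustrative and not taken from the repository.

```python
import numpy as np

# Gallery already sorted by distance to one query (pid 1, camera 0).
q_pid, q_camid = 1, 0
g_pids_sorted = np.array([1, 2, 1, 3])
g_camids_sorted = np.array([0, 1, 1, 1])

matches_row = (g_pids_sorted == q_pid).astype(np.int32)
remove = (g_pids_sorted == q_pid) & (g_camids_sorted == q_camid)

# Existing code: drop removed entries, so later entries move up in rank.
filtered = matches_row[~remove]

# Proposed change: keep every position and mark removed entries as non-matches.
zeroed = matches_row.copy()
zeroed[remove] = 0

# 1-based rank of the first correct match under each variant.
first_hit_filtered = int(np.argmax(filtered)) + 1
first_hit_zeroed = int(np.argmax(zeroed)) + 1
print(filtered, zeroed, first_hit_filtered, first_hit_zeroed)
```

In this example the first correct match lands at rank 2 with filtering and rank 3 with zeroing, so the two variants report different CMC and AP values. Which one is intended depends on the protocol: the docstring says same-camera gallery images are discarded, which the filtering variant does literally, while the zeroing variant keeps every row at length num_g.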

michuanhaohao / reid-strong-baseline

An error in data/dataset/eval_reid.py? #92