faster eval - Githubissues

ahangchen commented 7 years ago

I found that the evaluation code runs very slowly when prediction accuracy is low, and I managed to speed it up.

The key point is that, while building the match and junk arrays, we can also record the positions (ranks) of those person IDs in result_argsort. With these positions, rank_1 and mAP can be computed in a much smaller nested loop.

def map_rank_quick_eval(query_info, test_info, result_argsort):
    # about 10% lower than matlab result
    # for evaluate rank1 and map
    match = []
    junk = []

    for q_index, (qp, qc) in enumerate(query_info):
        tmp_match = []
        tmp_junk = []
        for t_index in range(len(test_info)):
            p_t_idx = result_argsort[q_index][t_index]
            p_info = test_info[int(p_t_idx)]

            tp = p_info[0]
            tc = p_info[1]
            if tp == qp and qc != tc:
                tmp_match.append(t_index)
            elif tp == qp or tp == -1:
                tmp_junk.append(t_index)
        match.append(tmp_match)
        junk.append(tmp_junk)

    rank_1 = 0.0
    mAP = 0.0
    for idx in range(len(query_info)):
        if idx % 100 == 0:
            print('evaluate img %d' % idx)
        recall = 0.0
        precision = 1.0
        ap = 0.0
        YES = match[idx]
        IGNORE = junk[idx]
        ig_cnt = 0
        for ig in IGNORE:
            if ig < YES[0]:
                ig_cnt += 1
            else:
                break
        if ig_cnt >= YES[0]:
            rank_1 += 1

        for i, k in enumerate(YES):
            ig_cnt = 0
            for ig in IGNORE:
                if ig < k:
                    ig_cnt += 1
                else:
                    break
            cnt = k + 1 - ig_cnt
            hit = i + 1
            tmp_recall = hit / len(YES)
            tmp_precision = hit / cnt
            ap = ap + (tmp_recall - recall) * ((precision + tmp_precision) / 2)
            recall = tmp_recall
            precision = tmp_precision

        mAP += ap
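    # average over all queries (QUERY_NUM in the original script, assumed equal to len(query_info))
    rank1_acc = rank_1 / len(query_info)
    mAP = mAP / len(query_info)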
    print('Rank 1:\t%f' % rank1_acc)
    print('mAP:\t%f' % mAP)
    return rank1_acc, mAP

This code computes rank_1_acc and mAP very quickly (in about 3 minutes), and rank_1_acc matches the original code, but mAP is slightly larger than the original result.

Is there anything wrong with this code?

hehefan commented 7 years ago

Hi @ahangchen, sorry, I have been very busy recently. I think the problem is the following code:

    for ig in IGNORE:
        if ig < k:
            ig_cnt += 1
        else:
            break

The original code takes at most len(query_info) * TEST_NUM iterations. Your code, however, takes at most len(query_info) * len(YES) * len(IGNORE).

ahangchen commented 7 years ago

@hehefan The original code costs 2 * len(query_info) * TEST_NUM, while this code costs len(query_info) * TEST_NUM + len(query_info) * len(YES) * len(YES). The third factor is at most len(YES), not len(IGNORE), because of the break statement. len(YES) is very small, so this code is considerably faster than the original.
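The cost of the inner scan over IGNORE can be sidestepped regardless of how many junk entries precede a match. Because tmp_junk is appended in ascending rank order, each junk list is already sorted, so a binary search returns the same count as the loop with its break. The sketch below assumes Python 3; junk_before is a hypothetical helper name, not part of the original script.

```python
from bisect import bisect_left


def junk_before(junk, k):
    """Count junk ranks strictly smaller than k.

    Assumes `junk` is sorted ascending, which holds when it is built by
    appending t_index in increasing order as in the first loop above.
    """
    return bisect_left(junk, k)


# Same result as the manual scan with `break`:
junk = [1, 4, 7]
print(junk_before(junk, 5))
```

With this helper, `ig_cnt = junk_before(IGNORE, k)` would replace both inner loops, making the per-match cost logarithmic in len(IGNORE).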

ahangchen commented 7 years ago

My question is about the mAP value, not the running time. I can't work out the answer from your reply (you're talking about running time, right?).
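One possible explanation for the larger mAP, offered as a sketch and not a confirmed diagnosis: if the original loop updates precision at every non-junk rank (an assumption, since that code is not shown in this thread), then at each hit the trapezoid uses the precision of the immediately preceding rank. The quick version instead carries over the precision from the previous hit, which is never lower, so the AP comes out equal or larger. The functions below are hypothetical stand-ins written for Python 3; ap_per_rank models the assumed original behaviour, and ap_hits_only mirrors the quick loop with an optional fix.

```python
def ap_per_rank(match, junk, n):
    """Trapezoid AP, updating precision at every non-junk rank (assumed reference)."""
    match_set, junk_set = set(match), set(junk)
    recall, precision, ap = 0.0, 1.0, 0.0
    hit = cnt = 0
    for pos in range(n):
        if pos in junk_set:
            continue
        cnt += 1
        if pos in match_set:
            hit += 1
        tmp_recall = hit / len(match)
        tmp_precision = hit / cnt
        ap += (tmp_recall - recall) * (precision + tmp_precision) / 2
        recall, precision = tmp_recall, tmp_precision
        if hit == len(match):
            break
    return ap


def ap_hits_only(match, junk, fix=False):
    """Loop over hits only, as in the quick version.

    With fix=True, the precision of the previous non-junk rank is rebuilt
    as (hit - 1) / (cnt - 1) instead of reusing the previous hit's precision.
    """
    recall, precision, ap = 0.0, 1.0, 0.0
    for i, k in enumerate(match):
        ig_cnt = sum(1 for ig in junk if ig < k)
        cnt = k + 1 - ig_cnt
        hit = i + 1
        if fix:
            precision = (hit - 1) / (cnt - 1) if cnt > 1 else 1.0
        tmp_recall = hit / len(match)
        tmp_precision = hit / cnt
        ap += (tmp_recall - recall) * (precision + tmp_precision) / 2
        recall, precision = tmp_recall, tmp_precision
    return ap


# Toy ranking of 5 gallery items: matches at ranks 0 and 3, junk at rank 1.
match, junk = [0, 3], [1]
print(ap_per_rank(match, junk, 5))
print(ap_hits_only(match, junk))
print(ap_hits_only(match, junk, fix=True))
```

On this toy input the unfixed hits-only value is the largest of the three, while the fixed variant agrees with the per-rank loop. Whether this accounts for the full gap on the real data would need checking against the original script.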

ericxian1997 commented 6 years ago

I think the evaluation code is slow too, especially when testing models on Market, which takes about 90 minutes, whereas testing Duke models takes only a few minutes. Is your code much faster than the original? @ahangchen

ahangchen commented 6 years ago

@ericxian1997 Yes, much faster, try it.

hehefan / Unsupervised-Person-Re-identification-Clustering-and-Fine-tuning

faster eval #4