yangyanli / PointCNN

PointCNN: Convolution On X-Transformed Points (NeurIPS 2018)
https://arxiv.org/abs/1801.07791

evaluation for custom data #205

Open tiger-bug opened 4 years ago

tiger-bug commented 4 years ago

Good morning,

I'm trying to run the network on some of my own data and I'm at the evaluation stage. I'm calculating a confusion matrix, but the numbers look a bit strange, so I want to make sure I'm doing this correctly (or figure out what I'm doing wrong). Here is a snippet of my code.

import os
import h5py
import numpy as np

# data_folder is where my predicted and test h5 files are located
pred_list = [pred for pred in os.listdir(data_folder)
             if pred.split('.')[0].split('_')[-1] == 'pred']

acc, tot = 0, 0
result = np.zeros((NUM_CLASSES, NUM_CLASSES), dtype=int)  # result is the confusion matrix
max_ind = 0  # track the maximum full-cloud index

for pred in pred_list:

    data = h5py.File(os.path.join(data_folder, pred), 'r')

    f = '_'.join(pred.split('_')[:-1]) + '.h5'  # corresponding test h5 file
    data_test = h5py.File(os.path.join(data_folder, f), 'r')

    # Read from the predicted h5 file
    labels_seg = data['label_seg'][...].astype(np.int64)
    indices = data['indices_split_to_full'][...].astype(np.int64)
    if indices.max() > max_ind:
        max_ind = indices.max()
    confidence = data['confidence'][...].astype(np.float32)
    data_num = data['data_num'][...].astype(np.int64)

    # Read from the test h5 file
    t_labels_seg = data_test['label_seg'][...].astype(np.int64)
    t_indices = data_test['indices_split_to_full'][...].astype(np.int64)
    t_data_num = data_test['data_num'][...].astype(np.int64)

    # Loop through the blocks and accumulate accuracy and the confusion matrix
    for i in range(labels_seg.shape[0]):
        test = t_labels_seg[i][:t_data_num[i]]
        predicted = labels_seg[i][:data_num[i]]
        test_ind = t_indices[i][:t_data_num[i]]
        ind = indices[i][:data_num[i]]
        if not np.array_equal(test_ind, ind):
            print("Indices don't match!")  # sanity check: the two files should line up

        tot += test.shape[0]
        acc += (test == predicted).sum()

        # confusion matrix: rows are true labels, columns are predictions
        for j in range(len(predicted)):
            result[test[j]][predicted[j]] += 1

    data.close()
    data_test.close()

Now, when I look at tot, max_ind, and np.sum(result), I would have expected tot == max_ind == np.sum(result); however, tot > max_ind. Is this to be expected? Since max_ind is the maximum point index in the test set, I don't see how the total number of points tested can be greater than that unless there are repeats.

NicolaiLolansen commented 4 years ago

Not sure if this helps you at all, and I would like to know if you figure this out, but here is a snippet from the article:

"Each testing point cloud is sampled multiple times to make sure all the points are evaluated at least r (r = 10 in our experiments) times at testing time"

Can this maybe explain why there are repeats?
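
If that is the cause, a quick check like this might confirm it (just a sketch, untested; I'm assuming indices_split_to_full holds the index of each point in the original cloud, and the path is only an example):

import h5py
import numpy as np

# hypothetical path to one of the *_pred.h5 files
with h5py.File('/path/to/h5-files/file_pred.h5', 'r') as data:
    indices = data['indices_split_to_full'][...].astype(np.int64)
    data_num = data['data_num'][...].astype(np.int64)

# full-cloud index of every evaluated point, over all sampled blocks
all_idx = np.concatenate([indices[i][:data_num[i]] for i in range(indices.shape[0])])

print('points evaluated:', all_idx.size)           # should match your tot for this file
print('unique points:', np.unique(all_idx).size)   # repeats make this smaller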

ninasinger commented 4 years ago

I am having a similar issue, have you made any progress on this?

tiger-bug commented 4 years ago

Good morning everyone!

Thank you for the comments. Nicolai Mogensen, I did not remember that part of the article, but it does explain why there are repeats. It seems I will have to use indices_split_to_full instead of data_num (rough sketch of the idea below). I went back and worked on some training code, so I stepped away from this for a while, but I will keep working on it and post more once I figure it out. Thanks!
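
For now, the idea I have in mind looks roughly like this (just a sketch, untested; it assumes each _pred.h5 file has a matching test file with the same block layout): flatten every block, keep only the first prediction for each full-cloud index, and compute accuracy on those unique points.

import h5py
import numpy as np

# hypothetical prediction/test pair, to illustrate the idea for a single file
with h5py.File('/path/to/h5-files/file_pred.h5', 'r') as pred, \
     h5py.File('/path/to/h5-files/file.h5', 'r') as test:
    labels_pred = pred['label_seg'][...].astype(np.int64)
    labels_true = test['label_seg'][...].astype(np.int64)
    indices = pred['indices_split_to_full'][...].astype(np.int64)
    data_num = pred['data_num'][...].astype(np.int64)

# flatten all blocks into 1-D arrays
flat_idx = np.concatenate([indices[i][:data_num[i]] for i in range(indices.shape[0])])
flat_pred = np.concatenate([labels_pred[i][:data_num[i]] for i in range(indices.shape[0])])
flat_true = np.concatenate([labels_true[i][:data_num[i]] for i in range(indices.shape[0])])

# keep only the first prediction seen for each original point
_, first = np.unique(flat_idx, return_index=True)
print('accuracy on unique points:', (flat_pred[first] == flat_true[first]).mean())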

ninasinger commented 4 years ago

Please let me know if you are able to recover the indices of the original data with "indices_split_to_full". I am finding that the normalized blocks stored in "data" are correct, but when I try to map them back to the original data in the merge using "indices_split_to_full", the blocks do not make sense.

tiger-bug commented 4 years ago

Sorry for the late response. Here's something I've come up with that may work (I may need to double-check it, though):

# Import modules
import h5py
import numpy as np

# Load one prediction h5 file (just as an example)
pred_file = '/path/to/h5-files/file_pred.h5'
data = h5py.File(pred_file, 'r')

img = data['data'][...]
data_num = data['data_num'][...]
indices = data['indices_split_to_full'][...]
label_seg = data['label_seg'][...]
confidence = data['confidence'][...]

max_ind = np.max(indices)  # highest index into the full point cloud

# indices_split_to_full is 0-based, so the full cloud has max_ind + 1 points.
# Initialise everything to -1, since a label of 0 is an actual class.
label_flat = -1 * np.ones(max_ind + 1, dtype=np.int32)

# Scatter the block predictions back to their original positions
# (repeated points simply overwrite each other here)
label_flat[indices.flatten()] = label_seg.flatten()

I'm not sure how to loop through the rest of the files, and I think this is right, but I'm not certain. I'm also not sure how to factor in confidence.
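
One possibility for both problems might be to merge per prediction file and let every repeated prediction vote with its confidence, then take the class with the highest accumulated confidence for each original point. This is only a sketch and not tested on real data; the folder path and NUM_CLASSES value are placeholders, and it assumes indices_split_to_full is 0-based.

import os
import h5py
import numpy as np

data_folder = '/path/to/h5-files'  # hypothetical folder containing the *_pred.h5 files
NUM_CLASSES = 8                    # placeholder; use the actual number of classes

pred_list = [f for f in os.listdir(data_folder) if f.endswith('_pred.h5')]

for name in pred_list:
    with h5py.File(os.path.join(data_folder, name), 'r') as data:
        label_seg = data['label_seg'][...].astype(np.int64)
        confidence = data['confidence'][...].astype(np.float32)
        indices = data['indices_split_to_full'][...].astype(np.int64)
        data_num = data['data_num'][...].astype(np.int64)

    # accumulate confidence per (original point, predicted class)
    votes = np.zeros((indices.max() + 1, NUM_CLASSES), dtype=np.float32)
    for i in range(label_seg.shape[0]):
        n = data_num[i]
        np.add.at(votes, (indices[i][:n], label_seg[i][:n]), confidence[i][:n])

    # final label per original point = class with the highest accumulated confidence
    label_full = votes.argmax(axis=1)
    label_full[votes.sum(axis=1) == 0] = -1  # points that were never sampled stay -1

    # label_full can now be compared against the labels of the original
    # (pre-split) point cloud that this prediction file came from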