Difference between paper and code in calculating distance function

I'm currently working on text classification tasks with an LSTM and would like to use LIME to help explain the results.

I have two questions.

(1) I think there is a difference between the paper and the code in how the distance function D(x, z) (Eq. 2 in the paper) is calculated. In the paper, the distance is computed over the original representations x and z in R^d. In the code (lime/lime/lime_text.py), however, it is computed over the interpretable (binary) representations x' and z' in {0,1}^d', as follows:

class LimeTextExplainer(object):
    ...
    def __data_labels_distances(self,
                                indexed_string,
                                classifier_fn,
                                num_samples,
                                distance_metric='cosine'):
        ...
        data = np.ones((num_samples, doc_size))
        ...
        distances = distance_fn(sp.sparse.csr_matrix(data))
        ...
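
To make the point concrete, here is a minimal standalone sketch of what a cosine distance over binary masks looks like. It is not the library's code: `sample_masks` and `cosine_distance` are hypothetical helpers written for illustration, and the sketch assumes that row 0 is the unperturbed instance (all ones) and that every other row is compared against it.

```python
import math
import random


def cosine_distance(u, v):
    """Cosine distance 1 - u.v / (|u| |v|) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def sample_masks(doc_size, num_samples, seed=0):
    """Hypothetical helper (assumes doc_size >= 2).

    Row 0 is the unperturbed instance (all ones); every other row zeroes
    out a random, non-empty proper subset of token positions.
    """
    rng = random.Random(seed)
    masks = [[1] * doc_size]
    for _ in range(num_samples - 1):
        n_removed = rng.randint(1, doc_size - 1)
        removed = set(rng.sample(range(doc_size), n_removed))
        masks.append([0 if i in removed else 1 for i in range(doc_size)])
    return masks


masks = sample_masks(doc_size=6, num_samples=5)
# Distance of every binary row z' to the all-ones row x'.
distances = [cosine_distance(row, masks[0]) for row in masks]
```

Under these assumptions, the distance of a mask with k of d tokens kept works out to 1 - sqrt(k / d), so it depends only on how many tokens were removed, not on which ones or on their embeddings.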

(2) If the calculation method in the paper is correct, I couldn't figure out how to calculate the distance function D(x, z) when the input is a sequence of token vectors {x1, ..., xn}.
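
As an illustration of the kind of computation in question, here is a minimal sketch of one possible way to get a single distance from two token-vector sequences: mean-pool each sequence into one vector, then take the cosine distance. This pooling choice is purely an assumption for illustration; it is not taken from the paper or the code, and the embeddings below are made-up toy values.

```python
import math


def mean_pool(token_vectors):
    """Average a non-empty sequence of equal-length token vectors."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[j] for vec in token_vectors) / n for j in range(dim)]


def cosine_distance(u, v):
    """Cosine distance 1 - u.v / (|u| |v|); undefined if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Toy 2-dimensional token embeddings (made-up values).
x_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # original sequence x
z_tokens = [[1.0, 0.0], [1.0, 1.0]]              # perturbed z: middle token removed

# Assumed definition: D(x, z) = cosine distance between the pooled vectors.
d_xz = cosine_distance(mean_pool(x_tokens), mean_pool(z_tokens))
```

Mean pooling discards token order, which matters for an LSTM, so this is only one candidate; a distance over the model's own hidden states would be another.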

I would really appreciate it if you could respond to the questions.
