idiap / importance-sampling

Code for experiments regarding importance sampling for training neural networks

Creating a New Sampler #8

Closed kris-singh closed 6 years ago

kris-singh commented 7 years ago

Hi! First of all, great work by you guys. I am trying to do something similar in my research: implementing entropy-based sampling. I came up with the following:

import numpy as np

from importance_sampling.samplers import ModelSampler  # assumed import path


class EntropySampler(ModelSampler):
    """EntropySampler uses the entropy of the samples to do importance sampling.

    Arguments
    ---------
    dataset: The dataset to sample from
    reweighting: The reweighting scheme
    model: The model to be used for scoring
    recompute: Compute the loss for the whole dataset every recompute batches
    """
    def __init__(self, dataset, reweighting, model, forward_batch_size=128,
                 recompute=2):
        super(EntropySampler, self).__init__(
            dataset,
            reweighting,
            model,
            forward_batch_size=forward_batch_size
        )

        # The configuration of EntropySampler
        self.recompute = recompute

        # Mutable variables holding the state of the sampler
        self._batch = 0
        self._scores = np.ones((len(dataset.train_data),))
        self._unseen = np.ones(len(dataset.train_data), dtype=bool)
        self._seen = np.zeros_like(self._unseen)

    def _entropy(self, x, px):
        assert len(x) == len(px)
        # Normalize the scores into a distribution and return the
        # per-sample entropy terms -p * log(p)
        px = px / px.sum()
        return -px * np.log(np.maximum(px, 1e-12))

    def _get_samples_with_scores(self, batch_size):
        return (
            np.arange(len(self._scores)),
            self._scores,
            None
        )

    def update(self, idxs, results):
        # Update the scores of the seen samples
        self._scores[idxs] = results.ravel()
        self._unseen[idxs] = False
        self._seen[idxs] = True
        self._scores[self._unseen] = self._scores[self._seen].mean()

        # Recompute all the scores if needed
        self._batch += 1
        if self._batch % self.recompute == 0:
            for i in range(0, len(self.dataset.train_data), 1024*64):
                x, y = self.dataset.train_data[i:i+1024*64]
                scores = self.model.score(
                    x, y,
                    batch_size=self.forward_batch_size
                ).ravel()
                self._scores[i:i+1024*64] = self._entropy(x, scores).ravel()
            self._seen[:] = True
            self._unseen[:] = False

But I am not sure how to create the BaseImportanceTraining class for this sampler. In particular, I do not understand the partial function. Could you help me with this?

angeloskath commented 7 years ago

Ok, so judging by your code above you want to sample according to the entropy instead of the loss. I don't think you need to implement another sampler because what you want to change is the score.

I would suggest you look into score_layers.py and model_wrappers.py, specifically _get_scoring_layer(...). What you need is an entry similar to:

...
elif score == "entropy":
    return LossLayer(entropy_function)([y_true, y_pred])

After that you can create your own class inheriting from _BaseImportanceTraining and have self.model = OracleWrapper(model, self.reweighting, score="entropy") in the constructor.
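
For concreteness, a minimal sketch of such a subclass could look like the one below; everything beyond the names mentioned above (the import paths, the constructor arguments, the reweighting property) is an assumption rather than code from the repository, so check it against the source.

# Hypothetical sketch; import paths and the exact base-class constructor
# are assumptions, not taken verbatim from the repository.
from importance_sampling.training import _BaseImportanceTraining
from importance_sampling.model_wrappers import OracleWrapper


class EntropyImportanceTraining(_BaseImportanceTraining):
    def __init__(self, model, reweighting):
        # Keep the reweighting policy around before the base class needs it
        self._reweighting = reweighting
        super(EntropyImportanceTraining, self).__init__(model)

        # Score samples with the new "entropy" scoring layer instead of the loss
        self.model = OracleWrapper(model, self.reweighting, score="entropy")

    @property
    def reweighting(self):
        return self._reweighting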

I know I did not answer the partial question, so I will try to answer it now. The sampler requires a dataset, which we do not have in the constructor. Instead, we create a function with every argument given except the dataset (the arguments are partially given). If you prefer, you can implement it differently by keeping the arguments as member variables and doing the work in the sample() member function.
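
As an illustration of that pattern (placeholder names, not code from the repository): the trainer can hold a "sampler factory" with everything bound except the dataset and only call it once the dataset is known.

from functools import partial

# Illustrative only; `reweighting` and `wrapped_model` are placeholders.
# Bind every EntropySampler argument except the dataset...
sampler_factory = partial(
    EntropySampler,
    reweighting=reweighting,
    model=wrapped_model,
    recompute=2
)

# ...and call the factory later, once the dataset actually exists
sampler = sampler_factory(dataset)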

If you want any more help, send me an email (it can be found in the code); I like discussing and uncovering bad/confusing designs (like the partial function). Also, please do not hesitate to open a pull request if you have implemented something and want to share it.

I will keep the issue open for a while in case you want to add something here (if you want you can close it).

kris-singh commented 6 years ago

Hi, thanks for the response.

I was going through your paper and could not follow some of the proofs; could you help me out? I was not able to understand Eq. (5): starting from Var(X) = E[X^2] - (E[X])^2, I did not understand how you get (E[X])^2 + tr(Var(X)) = E[X^2]; the Wang et al. paper does not have the trace term. Second, I also did not get Eq. (25). Could you explain how you got it?

P.S.: Sorry if these are trivial doubts that I should have been able to figure out myself.

angeloskath commented 6 years ago

Sure, maybe the definition of Var() is not very clear so I will clarify everything below.

x \in R^d ~ D // all the expectations are taken w.r.t. D
Var(x) = E[x x^T] - E[x] E[x]^T
E[x^T x] = E[Tr(x x^T)]  // since x^T x = Tr(x x^T)
         = Tr(E[x x^T])  // by linearity of Tr
         = Tr(Var(x) + E[x] E[x]^T)
         = Tr(Var(x)) + Tr(E[x] E[x]^T)
         = Tr(Var(x)) + E[x]^T E[x]
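
A quick numerical sanity check of that identity (just an illustration with random data, not from the paper):

import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(100000, 5) * 2.0 + 1.0      # samples of a 5-dimensional x ~ D

lhs = np.mean(np.sum(x * x, axis=1))       # E[x^T x]
mu = x.mean(axis=0)                        # E[x]
cov = np.cov(x, rowvar=False, bias=True)   # Var(x), normalized by 1/N
rhs = np.trace(cov) + np.dot(mu, mu)       # Tr(Var(x)) + E[x]^T E[x]

print(lhs, rhs)                            # the two values agree up to sampling noise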

I will close the issue but feel free to re-open it if you have more problems.