robertu94 / libpressio-predict

High Fidelity Proxy Models for Compression

Transforming the Datasets #1

Open robertu94 opened 3 years ago

robertu94 commented 3 years ago

The ideal solution is composable and can perform the following transformations (names are illustrative, not final):

struct pressio_datapreparer : public pressio_errorable, public pressio_configurable {
  virtual std::vector<pressio_data> prepare_data(std::vector<pressio_data> const& samples)=0;
};

struct sample_blockswithin_dataset: public pressio_datapreparer {

   /**
    - first, divide each dataset into n-d chunks of some user-defined size {x_1, x_2, ... x_n}
      then, randomly select some user-defined k of them for each pressio_data
      return them in the order { data_1_block_1, data_1_block_2, data_2_block_1, data_2_block_2 ...}

      i.e. there should be k * samples.size() elements in the vector that is returned.
   */
   std::vector<pressio_data> prepare_data(std::vector<pressio_data> const& samples) override;

   //.... other required methods
};

struct sample_data: public pressio_datapreparer {

   /**
      of the passed `pressio_data`, randomly select a user defined k of them
   */
   std::vector<pressio_data> prepare_data(std::vector<pressio_data> const& samples) override;

  //.... other required methods
};

struct composite_datapreparer: public pressio_datapreparer {

  /**
    we can use this class to combine the previous 2 definitions
  */
  std::vector<pressio_data> prepare_data(std::vector<pressio_data> const& samples) override {
    // pseudocode, types might not check; std::accumulate requires <numeric>
    return std::accumulate(
      transformers.begin(),
      transformers.end(),
      samples,
      [](std::vector<pressio_data> const& current, std::unique_ptr<pressio_datapreparer> const& trans) {
        return trans->prepare_data(current);
      });
  }

  std::vector<std::unique_ptr<pressio_datapreparer>> transformers;

 // .... other required methods
};
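
To make the composition concrete, here is a minimal, self-contained sketch of the same three pieces. It rests on several assumptions that are not part of the proposal above: `buffer` is a hypothetical stand-in for `pressio_data` (a flat 1-d array of doubles, so blocks are 1-d rather than n-d), the base class drops `pressio_errorable` and `pressio_configurable`, the parameters `k`, `block_size`, and `seed` are passed through constructors rather than options, and the composite uses a plain loop in place of `std::accumulate`. None of this is the LibPressio API; it only illustrates how the stages could chain.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <memory>
#include <random>
#include <vector>

// Hypothetical stand-in for pressio_data: a flat 1-d buffer of doubles.
using buffer = std::vector<double>;

// Simplified version of the proposed interface (no error/config base classes).
struct datapreparer {
  virtual ~datapreparer() = default;
  virtual std::vector<buffer> prepare_data(std::vector<buffer> const& samples) = 0;
};

// Splits each buffer into contiguous blocks and keeps up to k random blocks per buffer.
struct sample_blocks_within_dataset : datapreparer {
  sample_blocks_within_dataset(std::size_t block_size_, std::size_t k_, unsigned seed)
    : block_size(block_size_), k(k_), gen(seed) {}

  std::vector<buffer> prepare_data(std::vector<buffer> const& samples) override {
    std::vector<buffer> out;
    if (block_size == 0) return out;  // nothing sensible to do with empty blocks
    for (auto const& s : samples) {
      // Candidate block offsets; a trailing partial block is dropped.
      std::vector<std::size_t> starts;
      for (std::size_t i = 0; i + block_size <= s.size(); i += block_size) starts.push_back(i);
      // std::sample keeps the relative order of the chosen offsets.
      std::vector<std::size_t> chosen;
      std::sample(starts.begin(), starts.end(), std::back_inserter(chosen), k, gen);
      for (auto b : chosen) {
        auto first = s.begin() + static_cast<std::ptrdiff_t>(b);
        out.emplace_back(first, first + static_cast<std::ptrdiff_t>(block_size));
      }
    }
    return out;
  }

  std::size_t block_size;
  std::size_t k;
  std::mt19937 gen;
};

// Keeps up to k randomly chosen buffers from the input.
struct sample_data : datapreparer {
  sample_data(std::size_t k_, unsigned seed) : k(k_), gen(seed) {}

  std::vector<buffer> prepare_data(std::vector<buffer> const& samples) override {
    std::vector<buffer> out;
    std::sample(samples.begin(), samples.end(), std::back_inserter(out), k, gen);
    return out;
  }

  std::size_t k;
  std::mt19937 gen;
};

// Applies each stage in order, feeding the output of one into the next.
struct composite_datapreparer : datapreparer {
  std::vector<buffer> prepare_data(std::vector<buffer> const& samples) override {
    std::vector<buffer> current = samples;
    for (auto const& t : transformers) current = t->prepare_data(current);
    return current;
  }

  std::vector<std::unique_ptr<datapreparer>> transformers;
};
```

Under these assumptions, pushing a `sample_data` with k = 2 followed by a `sample_blocks_within_dataset` with block size 2 and k = 3 into `transformers` would turn three 8-element buffers into 2 * 3 = 6 two-element blocks, grouped by source buffer in the order described in the comments above.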