rhayes777 / PyAutoFit

PyAutoFit: Classy Probabilistic Programming
https://pyautofit.readthedocs.io/
MIT License

Simplify Samples calculation via search #1002

Open Jammy2211 opened 4 months ago

Jammy2211 commented 4 months ago

search.py has a lot of annoyingly similar methods to do with samples:

    @property
    def samples_cls(self):
        raise NotImplementedError()

    def samples_from(self, model: AbstractPriorModel, search_internal=None) -> Samples:
        """
        Loads the samples of a non-linear search from its output files.

        The samples are loaded from one of two sources, which are tried in the following order:

        1) Load via the internal results of the non-linear search, which are specific to that search's outputs
           (e.g. the .hdf file output by the MCMC method `emcee`).

        2) Load via the `samples.csv` and `samples_info.json` files of the search, which are outputs that are the
           same for all non-linear searches as they are homogenized by autofit.

        Parameters
        ----------
        model
            The model which generates instances for different points in parameter space.
        """
        try:
            return self.samples_via_internal_from(
                model=model, search_internal=search_internal
            )
        except (FileNotFoundError, NotImplementedError, AttributeError):
            return self.samples_via_csv_from(model=model)

    def samples_via_internal_from(
        self, model: AbstractPriorModel, search_internal=None
    ):
        raise NotImplementedError

    def samples_via_csv_from(self, model: AbstractPriorModel) -> Samples:
        """
        Returns a `Samples` object from the `samples.csv` and `samples_info.json` files.

        The samples contain all information on the parameter space sampling (e.g. the parameters,
        log likelihoods, etc.).

        The samples in csv format are already converted to the autofit format, where samples are lists of values
        (e.g. `parameter_lists`, `log_likelihood_list`).

        Parameters
        ----------
        model
            Maps input vectors of unit parameter values to physical values and model instances via priors.
        """

        return self.samples_cls.from_csv(
            paths=self.paths,
            model=model,
        )

samples_cls is particularly annoying: it is tied to each search based on the search's type.

We can simplify this as follows:

1) Remove samples_via_csv_from and move its functionality to samples_from.

2) Add a factory to Samples which contains all of the functionality in samples_from. It will inspect the type of the input search_internal and return the matching Samples object (e.g. if search_internal is a NestSearch, return SamplesNest).

3) The function samples_via_internal_from must stay in search, because we commonly override it to map the internal search's parameter / likelihood values to autofit values and layout.
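
A rough sketch of how the three points above could fit together, not the actual autofit code. Everything here is a simplified stand-in: `NestInternal`, `MCMCInternal`, `cls_from`, `from_search`, `via_internal` and `via_csv` are hypothetical names, and the `Samples` classes only hold the two lists mentioned in the docstrings.

```python
# Hypothetical stand-ins for the internal result objects of two search families.
class NestInternal:
    def __init__(self, points, logl):
        self.points = points
        self.logl = logl


class MCMCInternal:
    def __init__(self, chain, log_prob):
        self.chain = chain
        self.log_prob = log_prob


class Samples:
    """Simplified stand-in for autofit's Samples (not the real class)."""

    def __init__(self, parameter_lists, log_likelihood_list):
        self.parameter_lists = parameter_lists
        self.log_likelihood_list = log_likelihood_list

    @classmethod
    def cls_from(cls, search_internal):
        # Pick the Samples subclass from the type of search_internal,
        # replacing the per-search samples_cls property.
        for internal_type, samples_cls in _SAMPLES_BY_INTERNAL_TYPE.items():
            if isinstance(search_internal, internal_type):
                return samples_cls
        return cls

    @classmethod
    def from_search(cls, search_internal=None, via_internal=None, via_csv=None):
        # Same order as samples_from: internal results first, then the csv files.
        samples_cls = cls.cls_from(search_internal)
        try:
            if via_internal is None:
                raise NotImplementedError
            parameter_lists, log_likelihood_list = via_internal(search_internal)
        except (FileNotFoundError, NotImplementedError, AttributeError):
            parameter_lists, log_likelihood_list = via_csv()
        return samples_cls(parameter_lists, log_likelihood_list)


class SamplesNest(Samples):
    pass


class SamplesMCMC(Samples):
    pass


# Assumed mapping from internal result type to Samples subclass.
_SAMPLES_BY_INTERNAL_TYPE = {NestInternal: SamplesNest, MCMCInternal: SamplesMCMC}


def nest_via_internal(search_internal):
    # Plays the role of samples_via_internal_from (point 3): the search-specific
    # mapping from internal values to the autofit layout stays with the search.
    return search_internal.points, search_internal.logl


def csv_stub():
    # Stands in for reading samples.csv / samples_info.json.
    return [[0.0, 0.0]], [-1.0]


samples = Samples.from_search(
    NestInternal([[1.0, 2.0]], [-0.5]), nest_via_internal, csv_stub
)
fallback = Samples.from_search(None, nest_via_internal, csv_stub)
```

In this sketch `samples` is a `SamplesNest` built from the internal results, while `fallback` comes from the csv path because there is no internal object to read. It is then a plain `Samples`, so how the csv route should pick a subclass without a search_internal is left open here.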