mala-project / mala

Materials Learning Algorithms. A framework for machine learning materials properties from first-principles data.
https://mala-project.github.io/mala/
BSD 3-Clause "New" or "Revised" License

Add convenience functions for tester class #378

Closed RandomDefaultUser closed 1 year ago

RandomDefaultUser commented 1 year ago

The Tester class is mostly used to verify a model before using it in production. It currently returns full volumetric data. It would be nice to add a convenience function that returns just the energies, since those are what one is mostly interested in.

sarkarghya commented 1 year ago

Hi Lenz, I just wanted to let you know that I have been looking at this issue but haven't made a lot of progress. I need one clarification: what do you mean by "convenience function"? Is this a scientific term for some constant, or is it an actual program function?

RandomDefaultUser commented 1 year ago

Hi Arghya, sorry if the description was a bit imprecise. What I mean by "convenience function" is wrapping functionality that users currently find themselves coding up manually into the official API, to reduce the amount of code users have to write. One such example is the Tester class, which is used to judge the accuracy of a model. The idea of this class is to take a trained network and some testing snapshots, and then return both the actual and the predicted LDOS. The LDOS itself may not hold much relevant information for determining whether or not a network is accurate. What is usually needed instead are energy errors. Therefore, users find themselves constantly writing scripts like the following:

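import mala
from mala import printout
import numpy as np

# Loading a trained network.
# `run` is assumed to hold the name prefix of the saved model files.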
parameters = mala.Parameters.load_from_file(run+".params.json")
network = mala.Network.load_from_file(parameters, run+".network.pth")
iscaler = mala.DataScaler.load_from_file(run+".iscaler.pkl")
oscaler = mala.DataScaler.load_from_file(run+".oscaler.pkl")
parameters.data.use_lazy_loading = True
parameters.targets.pseudopotential_path = "/some/path/"

# Build a data handler and fill it with data.
inference_data_handler = mala.DataHandler(parameters, input_data_scaler=iscaler, output_data_scaler=oscaler)
inference_data_handler.clear_data()
inference_data_handler.add_snapshot(...)
inference_data_handler.prepare_data()

# Creating a Tester object. 
tester = mala.Tester(parameters, network, inference_data_handler)
results_array = []

# Now, for all the snapshots, perform a network pass,
# then calculate some quantities and the errors in those. 
# THIS is the part that I would like to wrap in a function. 
for i in range(0, inference_data_handler.nr_snapshots):
    actual_ldos, predicted_ldos = tester.test_snapshot(i)

    # Use the LDOS object to do postprocessing.
    ldos_calculator: mala.LDOS
    ldos_calculator = inference_data_handler.target_calculator
    ldos_calculator.read_additional_calculation_data("qe.out", inference_data_handler.get_snapshot_calculation_output(i))

    # Calculating energy
    ldos_calculator.read_from_array(actual_ldos)
    band_energy_actual = ldos_calculator.band_energy
    total_energy_actual = ldos_calculator.total_energy

    ldos_calculator.read_from_array(predicted_ldos)
    band_energy_predicted = ldos_calculator.band_energy
    total_energy_predicted = ldos_calculator.total_energy

    printout("Snapshot #"+str(i))
    results_array.append([band_energy_actual, total_energy_actual,
                          band_energy_predicted, total_energy_predicted,
                          ldos_calculator.band_energy_dft_calculation,
                          ldos_calculator.total_energy_dft_calculation])

# Convert once after the loop and write all results to a CSV file.
out_array = np.array(results_array)
np.savetxt("results_"+run+".csv", out_array, delimiter=",")

Now I would argue that most of this for loop, maybe even the for loop itself, could be handled internally by MALA. This would add a lot of convenience when performing ML experiments. Example 05 shows how a network can be trained and then tested, although its testing part still uses the old API (the script I shared here uses the new one). Does this help?