mcmahon-lab / Physics-Aware-Training

Instructional implementation of Physics-Aware Training (PAT) with demonstrations on simulated experiments.
Creative Commons Attribution 4.0 International

Questions #1

Closed: emilyxia closed this issue 2 years ago

emilyxia commented 3 years ago

First of all, I want to thank you for the excellent documentation of this work; it's very impressive. Regarding the repo and the paper, I have several questions, listed below.

  1. Regarding the machine learning tasks presented in the paper, are there specific tasks that would be most appropriate for particular physical systems, for example the SHG system? Did the same system show any limitations when trained on other ML tasks?

  2. In the example code of the multilayer net for coupled pendula, I noticed that in the forward model the PNN is constructed by propagating the input through pendulum 1, pendulum 2, and pendulum 3 using the same underlying neural networks. I was wondering why a loop that applies one pendulum three times is not used in cell In [30]. If I wanted to simulate a relatively deeper network, could I use a loop?

  3. For the simulation of coupled pendula and the other tasks, I noticed that the user needs to define a way to extract the output in order to have a meaningful interpretation of the ML result. For instance, in the coupled-pendula example, the output (final layer) is chosen to be the angle of the middle pendulum. May I ask whether this is a tricky part of training a PNN? Does the definition of the last layer matter a lot for training? Do you have any suggestions on how to define the last layer for a given physical system?

Thanks for your time and patience!

ms3452 commented 3 years ago

Hi, thank you for the interest and the questions:

  1. As far as we know, a particular physical system might be good at some machine learning tasks and bad at others. When we trained the SHG system to perform MNIST handwritten-digit classification, we struggled with the inability of the SHG process to implement a linear operation, while a linear layer alone can achieve around 90% on MNIST (a minimal sketch of such a baseline appears after this list). This was a clear limitation of the SHG PNN: it has to do MNIST classification without that 90% "baseline", which is why it was well suited to a demonstration of digital layers and SHG working in conjunction. Almost trivially, the closer the mathematical process that generated the ML data is to the physical system, the more suited the system will be to the task, taken to the extreme in the self-simulation analysis we did in the appendix. As to whether there are relevant machine learning tasks that are particularly appropriate to the SHG system, we are unsure.
  2. You could use a for loop to call identical pendulum layers 1, 2, and 3, but we found that by introducing heterogeneity in the parameters of the pendula, we could get slightly better performance: pendula 1, 2, and 3 have slightly different frequencies and coupling constants. This is a good point though--our syntax is quite clumsy for an operation that, conceptually, is just repeated three times (see the loop-based sketch after this list). I will think about how to make the code more legible; please let me know if you have ideas. I had in mind the notation commonly used in PyTorch, where different layers of a NN have different names (e.g. fc1, fc2, ...):
    def forward(self, x):
        x = F.relu(self.fc1(x))  # first fully connected layer + ReLU
        x = self.fc2(x)          # second fully connected layer
        return x
  3. When designing the output layer of a physical system, we were cognizant that the output of one layer could serve as the input to another layer without too much postprocessing, i.e., the output angle can relatively easily be translated into an input angle. We put the output pendulum in the middle so that it would couple equally to the inputs on its left and right. In principle, the final output measurement can give you a powerful nonlinearity. For example, in http://doi.org/10.1126/science.aat8084, Lin & Rivenson et al. use an intensity output distribution, I ~ E^2, while the physics is linear in the electric field. For most of the PNNs demonstrated in the paper, we tried not to use nonlinear operations, to show that the physical evolution is performing the required computation. Thus, we relied mostly on binning the measured signal (a linear operation) and applying a one-hot encoding, the standard setup for conventional NNs (a sketch of such a binned readout follows this list).
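
To make the ~90% linear baseline mentioned in answer 1 concrete, here is a minimal PyTorch sketch (not code from this repo; the name `linear_baseline` and the training-step helper are illustrative assumptions): a single linear layer trained with cross-entropy on flattened MNIST images.

    import torch
    import torch.nn as nn

    # One linear layer: flattened 28x28 image -> 10 class logits.
    linear_baseline = nn.Linear(28 * 28, 10)
    optimizer = torch.optim.SGD(linear_baseline.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    def train_step(images, labels):
        # images: (batch, 1, 28, 28) tensor; labels: (batch,) int tensor
        logits = linear_baseline(images.view(images.size(0), -1))
        loss = loss_fn(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()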
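
For answer 2, here is a rough sketch of how a deeper network could be built with a loop while keeping per-layer heterogeneity. `PendulumLayer` is a hypothetical stand-in for the repo's pendulum module, and its frequency/coupling arguments are assumptions for illustration:

    import torch.nn as nn

    class DeepPendulaPNN(nn.Module):
        def __init__(self, n_layers=3):
            super().__init__()
            # Each copy gets slightly different parameters, mirroring the
            # heterogeneous pendula 1, 2, and 3 described above.
            # PendulumLayer is a hypothetical placeholder module.
            self.layers = nn.ModuleList(
                PendulumLayer(frequency=1.0 + 0.1 * i, coupling=0.5 + 0.05 * i)
                for i in range(n_layers)
            )

        def forward(self, x):
            for layer in self.layers:  # repeated application, as in In [30]
                x = layer(x)
            return x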
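
And for answer 3, a sketch of a binned, linear readout: average a measured time trace (e.g. the middle pendulum's angle) within ten equal segments and treat the segment means as class scores. The function name and tensor shapes are assumptions for illustration:

    import torch

    def binned_readout(signal, n_classes=10):
        # signal: (batch, time_steps) tensor of measured outputs.
        # Binning is linear: average within n_classes equal segments.
        batch, steps = signal.shape
        trimmed = signal[:, : steps - steps % n_classes]  # drop remainder
        scores = trimmed.view(batch, n_classes, -1).mean(dim=2)
        return scores  # (batch, n_classes); argmax gives the prediction

    # prediction = binned_readout(measured_angles).argmax(dim=1)
    # compared against a one-hot target during training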

I hope these answers are helpful and we are happy to keep discussing!