schreiber-lab / reflectorch

Machine learning package for X-ray and neutron reflectometry incorporating prior knowledge
https://www.soft-matter.uni-tuebingen.de/
MIT License

`reflectorch.data_generation.reflectivity.abeles.abeles` doesn't allow you to change the SLD of the ambient medium #13

Open andyfaff opened 1 month ago

andyfaff commented 1 month ago

I've been experimenting with the reflectorch implementation of the Abeles calculation, `reflectorch.data_generation.reflectivity.abeles.abeles`.

After trial and error (#12 does not outline what the shapes of the input tensors should be) I eventually figured out how to use the `abeles` function from the examples given in the unit tests.

In general, for an N layer system I would expect (give or take extra dimensions for batching) the sld array to have N + 2 entries: one for the ambient/fronting medium, N for the layers, and one for the substrate.

When I look at the examples, the array expected for sld has shape (N+1,). This must mean that there is no way of specifying the ambient SLD.

Whilst this is fine for most air-solid measurements (where SLD_{ambient} == 0), it means the code is not applicable to most solid-liquid and liquid-liquid systems, where the SLD of the ambient/fronting medium is non-zero.

This is straightforward to fix; see the refnx implementation for a guide. All it involves is subtracting the ambient SLD from all the other SLDs (i.e. from SLD[1:], if SLD_{ambient} = SLD[0]).
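To make the subtraction step concrete, here is a minimal, hypothetical NumPy sketch of a Parratt/Abeles recursion in which `sld[0]` is the ambient medium. The function name and shapes are my own illustration, not reflectorch's or refnx's actual API; the only point is that subtracting the ambient SLD from every medium makes non-vacuum fronting media work with the same kernel:

```python
import numpy as np

def abeles_with_ambient(q, thickness, roughness, sld):
    """Hypothetical sketch of an Abeles/Parratt reflectivity calculation.

    q         : (Q,) momentum transfer values, 1/Angstrom
    thickness : (N,) layer thicknesses, Angstrom
    roughness : (N+1,) interfacial roughnesses, Angstrom
    sld       : (N+2,) SLDs in 1e-6 / Angstrom^2; sld[0] is the ambient medium
    """
    q = np.asarray(q, dtype=float)
    sld = np.asarray(sld, dtype=complex) * 1e-6
    # the key fix: subtract the ambient SLD so the fronting medium
    # need not be vacuum
    sld = sld - sld[0]
    # wavevector normal to the surface in every medium: (Q, N+2)
    k = np.sqrt((q[:, None] / 2) ** 2 - 4 * np.pi * sld[None, :])
    r = None
    # Parratt recursion from the substrate interface upwards
    for i in reversed(range(len(sld) - 1)):
        # Fresnel coefficient at interface i, with Nevot-Croce roughness
        rf = (k[:, i] - k[:, i + 1]) / (k[:, i] + k[:, i + 1])
        rf = rf * np.exp(-2 * k[:, i] * k[:, i + 1] * roughness[i] ** 2)
        if r is None:
            r = rf  # bottom-most interface: no film below it
        else:
            # phase accumulated crossing layer i+1
            beta = np.exp(2j * k[:, i + 1] * thickness[i])
            r = (rf + r * beta) / (1 + rf * r * beta)
    return np.abs(r) ** 2
```

A quick sanity check of the fix is that adding a constant to *all* SLDs (ambient included) leaves the reflectivity unchanged, since only contrasts against the fronting medium matter.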

StarostinV commented 1 month ago

Thank you for your comment, Andrew! Indeed, it is very straightforward. I have updated the `abeles` function in the dev branch (442229e), and Valentin will soon push it to main with the corresponding changes in the docstrings etc.

andyfaff commented 1 month ago

I was experimenting with the torch implementation. I found that quite large batch values are needed before the GPU implementation becomes faster than the CPU calc in refnx. What kind of batch sizes do you use during training?

StarostinV commented 1 month ago

I typically use batch sizes ranging from 4096 to 16384. For tasks like importance sampling or MCMC with PyTorch, it can be increased even further. In general, the size is a power of 2:

$$N = 2^n, n \in [10, 16]$$

Of course, the degree of acceleration depends on the GPU. In my usual setting of two layers on top of a substrate and 128 q points, the GPU-accelerated code produces around 1 million curves per second on an NVIDIA RTX 2080 Ti.
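The throughput at these batch sizes comes from vectorising the recursion over the batch dimension rather than looping over parameter sets. As a rough, hypothetical NumPy sketch (the shapes are my assumption: `thickness` (B, N), `roughness` (B, N+1), `sld` (B, N+2) with the ambient medium first; this is not reflectorch's actual API), the batched calculation looks like:

```python
import numpy as np

def abeles_batch(q, thickness, roughness, sld):
    """Hypothetical batched Parratt recursion over B parameter sets.

    q         : (Q,)      shared momentum-transfer grid
    thickness : (B, N)    layer thicknesses
    roughness : (B, N+1)  interfacial roughnesses
    sld       : (B, N+2)  SLDs in 1e-6 / A^2, sld[:, 0] is the ambient
    Returns reflectivity of shape (B, Q).
    """
    q = np.asarray(q, dtype=float)
    thickness = np.asarray(thickness, dtype=float)
    roughness = np.asarray(roughness, dtype=float)
    sld = np.asarray(sld, dtype=complex) * 1e-6
    sld = sld - sld[:, :1]  # ambient subtraction, per batch element
    # k_z in every medium, broadcast to (B, Q, N+2)
    k = np.sqrt((q[None, :, None] / 2) ** 2 - 4 * np.pi * sld[:, None, :])
    r = None
    for i in reversed(range(sld.shape[1] - 1)):
        rf = (k[..., i] - k[..., i + 1]) / (k[..., i] + k[..., i + 1])
        rf = rf * np.exp(-2 * k[..., i] * k[..., i + 1]
                         * roughness[:, None, i] ** 2)
        if r is None:
            r = rf
        else:
            beta = np.exp(2j * k[..., i + 1] * thickness[:, None, i])
            r = (rf + r * beta) / (1 + rf * r * beta)
    return np.abs(r) ** 2
```

The same broadcasting pattern carries over to a GPU tensor library essentially unchanged, which is why large batches amortise the kernel-launch and transfer overhead.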

andyfaff commented 1 month ago

That's pretty cool. I think the fastest batch setting refnx could offer at the moment is about a third of that speed (i.e. ~3 µs per curve calculation). That would be on CPU in double precision.

StarostinV commented 1 month ago

That's pretty good; I don't remember achieving that speed with refnx. Is that with the standard API? For instance, the acceleration I get with MCMC is orders of magnitude compared to refnx, and the main cost there is the reflectivity calculation.

Of course, apart from the simulation time, there are additional benefits to a PyTorch implementation, such as autograd (e.g. fast score-function calculation) and integration with the ML pipeline (sending data to the GPU takes quite some time).
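To illustrate the autograd point, here is a toy, hypothetical PyTorch version of the same Parratt recursion (with the ambient-SLD subtraction discussed above; it is not reflectorch's API). Gradients of the whole curve with respect to the SLD parameters come for free from `backward()`, which is what makes score-based methods cheap:

```python
import torch

def abeles_torch(q, thickness, roughness, sld):
    """Hypothetical differentiable Abeles/Parratt sketch (not reflectorch's
    API). sld[0] is the ambient medium; SLDs in 1e-6 / A^2."""
    sld_c = (sld - sld[0]).to(torch.complex128) * 1e-6
    k = torch.sqrt((q[:, None] / 2) ** 2 - 4 * torch.pi * sld_c[None, :])
    r = None
    for i in reversed(range(len(sld) - 1)):
        rf = (k[:, i] - k[:, i + 1]) / (k[:, i] + k[:, i + 1])
        rf = rf * torch.exp(-2 * k[:, i] * k[:, i + 1] * roughness[i] ** 2)
        if r is None:
            r = rf
        else:
            beta = torch.exp(2j * k[:, i + 1] * thickness[i])
            r = (rf + r * beta) / (1 + rf * r * beta)
    return torch.abs(r) ** 2

# gradients of the curve w.r.t. every SLD, via autograd
q = torch.linspace(0.01, 0.3, 64, dtype=torch.float64)
sld = torch.tensor([0.0, 4.0, 2.07], dtype=torch.float64, requires_grad=True)
R = abeles_torch(q,
                 torch.tensor([100.0], dtype=torch.float64),
                 torch.tensor([3.0, 3.0], dtype=torch.float64),
                 sld)
R.sum().backward()
# sld.grad now holds d(sum R)/d(sld), with no hand-written derivative
```

A hand-coded CPU kernel like refnx's would need finite differences or analytic derivatives for the same gradient information.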