lab-cosmo / librascal

A scalable and versatile library to generate representations for atomic-scale learning
https://lab-cosmo.github.io/librascal/
GNU Lesser General Public License v2.1

Why does librascal use so much memory? #324

Open Luthaf opened 3 years ago

Luthaf commented 3 years ago

For the sample code below (I can share the full thing with structures if required, there are currently 10 structures with around 40 atoms each), I get this output:

Sample Python code

```py
def format_mem(nbytes):
    mem_mb = nbytes / 1024 / 1024
    if mem_mb > 1024:
        return f"{mem_mb / 1024:.4} GiB"
    else:
        return f"{mem_mb:.4} MiB"

print("memory before: ", format_mem(psutil.Process(os.getpid()).memory_info().rss))

soap = SphericalInvariants(
    soap_type="PowerSpectrum",
    interaction_cutoff=3.5,
    max_radial=6,
    max_angular=6,
    gaussian_sigma_constant=0.3,
    gaussian_sigma_type="Constant",
    cutoff_smooth_width=0.5,
    radial_basis="GTO",
    normalize=True,
    compute_gradients=True,
)

kernel = Kernel(
    soap, name="GAP", zeta=1, target_type="Structure", kernel_type="Sparse"
)

managers = soap.transform(frames)
print("feature mem: ", format_mem(managers.get_features(soap).nbytes))

compressor = CURFilter(soap, pseudo_points, act_on="sample per species")
X_pseudo = compressor.select_and_filter(managers)

K_MM = kernel(X_pseudo)

K_E = kernel(managers, X_pseudo, grad=(False, False))
K_E /= regularization.energies[:, np.newaxis]

K_F = kernel(managers, X_pseudo, grad=(True, False))
K_F /= regularization.forces

K_NM = np.vstack((K_E, K_F))

print("K_MM mem: ", format_mem(K_MM.nbytes))
print("K_NM mem: ", format_mem(K_NM.nbytes))

del K_E
del K_F

print("memory used: ", format_mem(psutil.Process(os.getpid()).memory_info().rss))

return K_MM, K_NM
```
memory before:  66.38 MiB

feature mem:  8.344 MiB
K_MM mem:  0.01221 MiB
K_NM mem:  0.4004 MiB

memory used:  3.499 GiB

The process uses around 3.5 GiB of RAM, while the features only occupy ~8 MiB. Even accounting for gradients (say 20 neighbors per atom x 3 spatial dimensions, which gives around 500 MiB of additional memory), I don't understand how the code reaches 3.5 GiB.

This issue makes it harder for me to use librascal for a large number of structures: I now have to go to compute facilities even for (what I perceive to be) small systems, since running such code locally quickly overwhelms my RAM and starts to aggressively swap, making everything very slow.


Am I missing something here? Is there a reason the code uses so much memory, or is this something we should try to improve?

felixmusil commented 3 years ago

How many atomic species are present in this dataset?

Luthaf commented 3 years ago

Only 4: CHNO.

EDIT: here are the structures used in the test above in ASE XYZ: structures.xyz

Luthaf commented 3 years ago

So switching the code above from computing the representation for all frames at once:

managers = soap.transform(frames)
K_E = kernel(managers, X_pseudo, grad=(False, False))
K_E /= regularization.energies[:, np.newaxis]

K_F = kernel(managers, X_pseudo, grad=(True, False))
K_F /= regularization.forces

K_NM = np.vstack([K_E, K_F])

to computing one frame at a time:

K_E = []
K_F = []

for i, frame in enumerate(frames):
    managers = soap.transform([frame])
    k = kernel(managers, X_pseudo, grad=(False, False))
    K_E.append(k / regularization.energies[i, np.newaxis])

    k = kernel(managers, X_pseudo, grad=(True, False))
    K_F.append(k / regularization.forces)

K_NM = np.vstack((*K_E, *K_F))

brings the memory usage down to 500 MiB and is faster to execute overall.
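The one-frame-at-a-time loop above is the extreme case of processing frames in batches. As a rough sketch, a hypothetical helper `iter_batches` (not part of librascal, introduced here only for illustration) lets the batch size be tuned so that peak memory stays bounded while keeping per-call overhead low. Integers stand in for the structures.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Stand-in for the 10 structures; real code would pass the list of ASE frames.
frames = list(range(10))
batches = list(iter_batches(frames, 4))
print(batches)
```

Each batch could then be passed to `soap.transform(batch)` in place of `[frame]`, with the kernel blocks appended exactly as in the loop above.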

Luthaf commented 3 years ago

Here is a simpler standalone example (without kernels), reading from the file in https://github.com/cosmo-epfl/librascal/issues/324#issuecomment-802711381

import os
import psutil
from rascal.representations import SphericalInvariants

import ase
from ase import io

def format_mem(nbytes):
    mem_mb = nbytes / 1024 / 1024
    if mem_mb > 1024:
        return f"{mem_mb / 1024:.4} GiB"
    else:
        return f"{mem_mb:.4} MiB"

frames = ase.io.read("structures.xyz", ":")

print("memory before: ", format_mem(psutil.Process(os.getpid()).memory_info().rss))

soap = SphericalInvariants(
    soap_type="PowerSpectrum",
    interaction_cutoff=3.5,
    max_radial=6,
    max_angular=6,
    gaussian_sigma_constant=0.3,
    gaussian_sigma_type="Constant",
    cutoff_smooth_width=0.5,
    radial_basis="GTO",
    normalize=True,
    compute_gradients=True,
)

managers = soap.transform(frames)

print("memory used:", format_mem(psutil.Process(os.getpid()).memory_info().rss))

print("   including features:", format_mem(managers.get_features(soap).nbytes))

output:

memory before:  51.01 MiB
memory used: 3.452 GiB
   including features: 8.344 MiB

That's with 10 frames, containing 434 atoms in total.

ceriottm commented 3 years ago

Wow, that's impressive. Leak somewhere?


Luthaf commented 3 years ago

That, or we keep stuff around that is no longer needed (I'm thinking about the cell list, I've seen issues in other software where building the neighbor list blows up memory). I'll try running this example with massif to see if I can get a profile & more information about the origin of allocations.

Luthaf commented 3 years ago

Here is a massif profile: massif.out.4090.txt

And the (edited) output of ms_print --threshold=10 massif.out.4090

99.41% (3,658,807,544B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->96.84% (3,564,014,336B) 0x3415D972: Eigen::DenseStorage<double, -1, -1, 1, 0>::resize(long, long, long) [clone .isra.508] (in /local/scratch/fraux/local/lib/python3.6/site-packages/rascal/lib/_rascal.cpython-36m-x86_64-linux-gnu.so)
| ->86.18% (3,171,934,400B) 0x341B4D52: void rascal::CalculatorSphericalInvariants::initialize_per_center_powerspectrum_soap_vectors<rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > >, rascal::BlockSparseProperty<double, 1ul, rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > >, std::vector<int, std::allocator<int> > >, rascal::BlockSparseProperty<double, 2ul, rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > >, std::vector<int, std::allocator<int> > >, rascal::BlockSparseProperty<double, 1ul, rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > >, std::vector<int, std::allocator<int> > > >(rascal::BlockSparseProperty<double, 1ul, rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > >, std::vector<int, std::allocator<int> > >&, rascal::BlockSparseProperty<double, 2ul, rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > >, std::vector<int, std::allocator<int> > >&, rascal::BlockSparseProperty<double, 1ul, rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > >, std::vector<int, std::allocator<int> > >&, std::shared_ptr<rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > > >) (in /local/scratch/fraux/local/lib/python3.6/site-packages/rascal/lib/_rascal.cpython-36m-x86_64-linux-gnu.so)
| | ->86.18% (3,171,934,400B) 0x341B580F: void rascal::CalculatorSphericalInvariants::compute_impl<(rascal::internal::SphericalInvariantsType)1, 0, rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > > >(std::shared_ptr<rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > > >) (in /local/scratch/fraux/local/lib/python3.6/site-packages/rascal/lib/_rascal.cpython-36m-x86_64-linux-gnu.so)
| |   ->86.18% (3,171,934,400B) 0x341B7690: void rascal::CalculatorSphericalInvariants::compute<rascal::ManagerCollection<rascal::StructureManagerCenters, rascal::AdaptorNeighbourList, rascal::AdaptorCenterContribution, rascal::AdaptorStrict> >(rascal::ManagerCollection<rascal::StructureManagerCenters, rascal::AdaptorNeighbourList, rascal::AdaptorCenterContribution, rascal::AdaptorStrict>&) (in /local/scratch/fraux/local/lib/python3.6/site-packages/rascal/lib/_rascal.cpython-36m-x86_64-linux-gnu.so)
| |     ->86.18% (3,171,934,400B) 0x3416CEBD: void pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<void, rascal::CalculatorSphericalInvariants, rascal::ManagerCollection<rascal::StructureManagerCenters, rascal::AdaptorNeighbourList, rascal::AdaptorCenterContribution, rascal::AdaptorStrict>&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release> >(void (rascal::CalculatorSphericalInvariants::*)(rascal::ManagerCollection<rascal::StructureManagerCenters, rascal::AdaptorNeighbourList, rascal::AdaptorCenterContribution, rascal::AdaptorStrict>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(rascal::CalculatorSphericalInvariants*, rascal::ManagerCollection<rascal::StructureManagerCenters, rascal::AdaptorNeighbourList, rascal::AdaptorCenterContribution, rascal::AdaptorStrict>&)
| |       ->86.18% (3,171,934,400B) 0x340B22C6: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (in /local/scratch/fraux/local/lib/python3.6/site-packages/rascal/lib/_rascal.cpython-36m-x86_64-linux-gnu.so)
| |
| ->10.30% (379,246,208B) 0x341953ED: void rascal::BlockSparseProperty<double, 2ul, rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > >, std::vector<int, std::allocator<int> > >::resize<std::vector, std::allocator<std::set<std::vector<int, std::allocator<int> >, std::less<std::vector<int, std::allocator<int> > >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::vector<int, std::allocator<int> >, std::less<std::vector<int, std::allocator<int> > >, std::allocator<std::vector<int, std::allocator<int> > > >(std::vector<std::set<std::vector<int, std::allocator<int> >, std::less<std::vector<int, std::allocator<int> > >, std::allocator<std::vector<int, std::allocator<int> > > >, std::allocator<std::set<std::vector<int, std::allocator<int> >, std::less<std::vector<int, std::allocator<int> > >, std::allocator<std::vector<int, std::allocator<int> > > > > > const&) (in /local/scratch/fraux/local/lib/python3.6/site-packages/rascal/lib/_rascal.cpython-36m-x86_64-linux-gnu.so)
| | ->10.30% (379,246,208B) 0x341962A3: void rascal::CalculatorSphericalExpansion::initialize_expansion_environment_wise<rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > > >(std::shared_ptr<rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > > >&, rascal::BlockSparseProperty<double, 1ul, rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > >, std::vector<int, std::allocator<int> > >&, rascal::BlockSparseProperty<double, 2ul, rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > >, std::vector<int, std::allocator<int> > >&) (in /local/scratch/fraux/local/lib/python3.6/site-packages/rascal/lib/_rascal.cpython-36m-x86_64-linux-gnu.so)
| |   ->10.30% (379,246,208B) 0x341A9886: void rascal::CalculatorSphericalExpansion::compute_impl<(rascal::internal::CutoffFunctionType)0, (rascal::internal::RadialBasisType)0, (rascal::internal::AtomicSmearingType)0, (rascal::internal::OptimizationType)0, rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > > >(std::shared_ptr<rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > > >) (in /local/scratch/fraux/local/lib/python3.6/site-packages/rascal/lib/_rascal.cpython-36m-x86_64-linux-gnu.so)
| |     ->10.30% (379,246,208B) 0x341AC484: void rascal::CalculatorSphericalExpansion::compute_by_radial_contribution<(rascal::internal::CutoffFunctionType)0, std::shared_ptr<rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > > > >(std::shared_ptr<rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > > >&) (in /local/scratch/fraux/local/lib/python3.6/site-packages/rascal/lib/_rascal.cpython-36m-x86_64-linux-gnu.so)
| |       ->10.30% (379,246,208B) 0x341AFAD3: void rascal::CalculatorSphericalExpansion::compute<std::shared_ptr<rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > > > >(std::shared_ptr<rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > > >&) (in /local/scratch/fraux/local/lib/python3.6/site-packages/rascal/lib/_rascal.cpython-36m-x86_64-linux-gnu.so)
| |         ->10.30% (379,246,208B) 0x341B554D: void rascal::CalculatorSphericalInvariants::compute_impl<(rascal::internal::SphericalInvariantsType)1, 0, rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > > >(std::shared_ptr<rascal::AdaptorStrict<rascal::AdaptorCenterContribution<rascal::AdaptorNeighbourList<rascal::StructureManagerCenters> > > >) (in /local/scratch/fraux/local/lib/python3.6/site-packages/rascal/lib/_rascal.cpython-36m-x86_64-linux-gnu.so)
| |           ->10.30% (379,246,208B) 0x341B7690: void rascal::CalculatorSphericalInvariants::compute<rascal::ManagerCollection<rascal::StructureManagerCenters, rascal::AdaptorNeighbourList, rascal::AdaptorCenterContribution, rascal::AdaptorStrict> >(rascal::ManagerCollection<rascal::StructureManagerCenters, rascal::AdaptorNeighbourList, rascal::AdaptorCenterContribution, rascal::AdaptorStrict>&) (in /local/scratch/fraux/local/lib/python3.6/site-packages/rascal/lib/_rascal.cpython-36m-x86_64-linux-gnu.so)
| |             ->10.30% (379,246,208B) 0x3416CEBD: void pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<void, rascal::CalculatorSphericalInvariants, rascal::ManagerCollection<rascal::StructureManagerCenters, rascal::AdaptorNeighbourList, rascal::AdaptorCenterContribution, rascal::AdaptorStrict>&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release> >(void (rascal::CalculatorSphericalInvariants::*)(rascal::ManagerCollection<rascal::StructureManagerCenters, rascal::AdaptorNeighbourList, rascal::AdaptorCenterContribution, rascal::AdaptorStrict>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(rascal::CalculatorSphericalInvariants*, rascal::ManagerCollection<rascal::StructureManagerCenters, rascal::AdaptorNeighbourList, rascal::AdaptorCenterContribution, rascal::AdaptorStrict>&)

Most memory is allocated by rascal::CalculatorSphericalInvariants::initialize_per_center_powerspectrum_soap_vectors (3171934400 B, or about 2.95 GiB), with the next contributor being CalculatorSphericalExpansion::initialize_expansion_environment_wise (379246208 B, or about 362 MiB).
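For reference, the byte counts reported by massif can be converted to binary units with a few lines of Python. This only redoes the arithmetic on the two numbers from the profile above; the function name is chosen here for illustration.

```python
def to_binary_units(nbytes):
    """Return (MiB, GiB) for a byte count, using 1024-based units."""
    return nbytes / 1024**2, nbytes / 1024**3


# Byte counts copied from the ms_print output above.
soap_vectors_bytes = 3_171_934_400   # initialize_per_center_powerspectrum_soap_vectors
expansion_bytes = 379_246_208        # initialize_expansion_environment_wise

soap_mib, soap_gib = to_binary_units(soap_vectors_bytes)
expansion_mib, expansion_gib = to_binary_units(expansion_bytes)

print(f"SOAP vectors: {soap_gib:.2f} GiB")
print(f"expansion:    {expansion_mib:.0f} MiB")
```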

max-veit commented 3 years ago

Interesting, that's what actually allocates the memory for the SOAP power spectrum (and gradients IIRC). Are you sure you're correctly accounting for the memory required by the features?

max-veit commented 3 years ago

If you're sure, then this could point to a bug in the allocation routines (allocating too much memory...?)

Luthaf commented 3 years ago

> Are you sure you're correctly accounting for the memory required by the features?

Features are 8.3MiB for 434 atoms. Considering 20 neighbors per atom, the gradients need 3 x 20 x 8.3MiB for storage, which is 498 MiB.

Running code like this

n_atoms = 434
neighbors = [set() for _ in range(n_atoms)]
for atom, neighbor in managers.get_gradients_info()[:, 1:3]:
    neighbors[atom].add(neighbor)

print(sum(len(n) for n in neighbors) / n_atoms, "neighbors in average")

This prints 43.77880184331797 neighbors in average, which should amount to about 1 GiB of memory.
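The estimate used in this comment (3 spatial components x neighbors per atom x feature memory) can be written out explicitly. This is only a sketch of that back-of-the-envelope reasoning, assuming every gradient entry is as large as one feature row; the function name is made up for illustration.

```python
def estimate_gradient_mib(feature_mib, neighbors_per_atom, n_dims=3):
    """Rough gradient storage: one feature-sized block per neighbor and direction."""
    return n_dims * neighbors_per_atom * feature_mib


feature_mib = 8.344  # "feature mem" reported by the script

# Initial guess of 20 neighbors per atom.
first_guess_mib = estimate_gradient_mib(feature_mib, 20)

# Average neighbor count obtained from get_gradients_info().
refined_mib = estimate_gradient_mib(feature_mib, 43.77880184331797)

print(f"20 neighbors:    {first_guess_mib:.0f} MiB")
print(f"43.78 neighbors: {refined_mib / 1024:.2f} GiB")
```

Both values stay well below the 3.5 GiB observed, which is why the estimate alone does not explain the memory usage.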


I'd appreciate a second look at this, I might be overlooking something!

max-veit commented 3 years ago

Hmm, do you have a way of getting info on the size of the gradients entries themselves? There might be some complications with species cross terms that could have you ending up with more gradients entries than just the `3 * n_neigh * n_atoms * n_max**2 * (l_max + 1) * n_species * (n_species + 1) / 2` terms that your calculation above suggests.
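As a sanity check on the term count in this comment, the per-atom part of it can be evaluated for the hyperparameters used in this thread (max_radial=6, max_angular=6, 4 species, 434 atoms). The sketch below assumes features are stored as 8-byte doubles and that n_max and l_max correspond to max_radial and max_angular; under those assumptions it reproduces the 8.344 MiB reported for the features.

```python
n_max = 6        # max_radial
l_max = 6        # max_angular
n_species = 4    # C, H, N, O
n_atoms = 434

# Unordered species pairs, including same-species pairs.
n_species_pairs = n_species * (n_species + 1) // 2

# Power spectrum components per atom, following the formula in the comment.
n_features = n_max**2 * (l_max + 1) * n_species_pairs

# Assumption: one float64 (8 bytes) per component.
feature_mib = n_atoms * n_features * 8 / 1024**2

print(n_species_pairs, n_features, f"{feature_mib:.4} MiB")
```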

max-veit commented 3 years ago

The easy way to check this would be to try a single-species system, where you have no species cross terms, and see if your estimate is more accurate.

Luthaf commented 3 years ago

Continuing debugging, it looks like there are duplicated entries in the gradients:

import ase
from rascal.representations import SphericalInvariants

frames = [
    ase.Atoms(
        "CC",
        cell=[4.0, 4.0, 4.0],
        pbc=[True, True, True],
        positions=[
            [0.68081000, 3.08633000, 0.58394200],
            [0.07090640, 2.64372000, 0.14372900],
        ],
    )
]

for frame in frames:
    frame.wrap(eps=1e-10)

soap = SphericalInvariants(
    soap_type="PowerSpectrum",
    interaction_cutoff=3.5,
    max_radial=6,
    max_angular=6,
    gaussian_sigma_constant=0.3,
    gaussian_sigma_type="Constant",
    cutoff_smooth_width=0.5,
    radial_basis="GTO",
    normalize=False,
    compute_gradients=True,
)

managers = soap.transform(frames)

print(managers.get_gradients_info())

Outputs

# columns are: structure, atom, neighbor, species_atom, species_neighbor
[[0 0 0 6 6]
 [0 0 1 6 6]
 [0 0 1 6 6]
 [0 1 1 6 6]
 [0 1 0 6 6]
 [0 1 0 6 6]]

Notice how atom 0 appears twice as a neighbor of atom 1, and atom 1 appears twice as a neighbor of atom 0. Since we are using reduction (i.e. a sum over neighbors) most of the time, I can see how this could be working fine when computing kernels but use more memory than needed.
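A quick way to quantify such repeated rows is `numpy.unique` with `axis=0`. The sketch below hard-codes the array printed above so that it runs without librascal; on real data the array would come from `managers.get_gradients_info()`. Note that these columns alone cannot tell whether repeated rows are true duplicates or distinct periodic images of the same atom.

```python
import numpy as np

# Output of managers.get_gradients_info() for the two-atom example above.
# Columns: structure, atom, neighbor, species_atom, species_neighbor
gradients_info = np.array([
    [0, 0, 0, 6, 6],
    [0, 0, 1, 6, 6],
    [0, 0, 1, 6, 6],
    [0, 1, 1, 6, 6],
    [0, 1, 0, 6, 6],
    [0, 1, 0, 6, 6],
])

# Collapse identical rows and count how often each one occurs.
unique_rows, counts = np.unique(gradients_info, axis=0, return_counts=True)
n_repeated = len(gradients_info) - len(unique_rows)

print(unique_rows)
print("repeated rows:", n_repeated)
```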

Luthaf commented 3 years ago

It looks like a lot of memory usage can be attributed to SOAP vectors normalization. Running the script from https://github.com/cosmo-epfl/librascal/issues/324#issuecomment-821246856 with normalize = True gives

memory before: 56.79 MiB
memory used: 3.46 GiB

but running it with normalize = False gives

memory before: 56.75 MiB
memory used: 1.669 GiB

So around 1.8 GiB of additional memory is used when normalizing SOAP vectors. That's for a feature matrix with 437 rows/atoms and a gradient matrix with 157338 rows/neighbours.
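The size of the gradient matrix can be compared with the massif profile posted earlier. The sketch below assumes that each of the 157338 gradient rows stores the full set of power spectrum components as 8-byte doubles, with the component count taken from the formula suggested earlier in the thread (n_max = 6, l_max = 6, 4 species). Under that assumption the result lands within a few hundred bytes of the allocation attributed to initialize_per_center_powerspectrum_soap_vectors.

```python
n_gradient_rows = 157_338            # rows of the gradient matrix, from this comment
n_features = 6**2 * (6 + 1) * 10     # n_max**2 * (l_max + 1) * species pairs = 2520

# Assumption: dense float64 storage for every row.
gradient_bytes = n_gradient_rows * n_features * 8

# Allocation reported by massif for the SOAP vector initialization.
massif_bytes = 3_171_934_400

print(f"dense gradients: {gradient_bytes / 1024**3:.3f} GiB")
print(f"massif:          {massif_bytes / 1024**3:.3f} GiB")
print("difference in bytes:", massif_bytes - gradient_bytes)
```

If the assumption holds, most of the memory is simply the gradient matrix stored densely over all species pairs rather than a leak.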

max-veit commented 3 years ago

Ok, this is starting to make me suspicious of this function: https://github.com/cosmo-epfl/librascal/blob/41896982bd0a64945f0609ada6fa6e12ef79baf0/src/rascal/representations/calculator_spherical_invariants.hh#L588-L593

Does the extra memory usage (when including normalization) only happen when computing gradients? And it's only for SOAP (SphericalInvariants, not SphericalExpansion) that you're seeing this extra memory usage, right? If so, then this update_gradients_for_normalization thing would be the next place to look.

Luthaf commented 3 years ago

> Does the extra memory usage (when including normalization) only happen when computing gradients?

Yes. Without gradients, the used memory only increases by a couple dozen kilobytes when normalizing, as I would expect.

> And it's only for SOAP (SphericalInvariants, not SphericalExpansion) that you're seeing this extra memory usage, right?

For the SphericalInvariants, I only tested the power spectrum.

The spherical expansion also has some strange behavior. On the same dataset and hyperparameters, librascal uses 30 MB when only computing the values of the spherical expansion, and 400 MB when computing both values and gradients. Rascaline uses the same 30 MB when computing the values, but only 200 MB with gradients (of these, 4 MB are used to store the values and 136 MB the gradients, with sparse species storage).

So librascal uses twice as much memory when doing the gradients. There might be something fishy here (either rascaline doing something wrong or librascal overallocating), but it is much less of an issue overall.

agoscinski commented 3 years ago

Hello,

I was able to pinpoint it to this line: https://github.com/cosmo-epfl/librascal/blob/db2e2445d34c196c94731249061740123f9fbc28/src/rascal/representations/calculator_spherical_invariants.hh#L1373

Later, when the gradients are resized using key_list_grad, the memory starts to differ between normalize=True and normalize=False: https://github.com/cosmo-epfl/librascal/blob/db2e2445d34c196c94731249061740123f9fbc28/src/rascal/representations/calculator_spherical_invariants.hh#L1393. When normalizing, it uses the less sparse key list pair_list instead of pair_list_grad (less sparse for multiple species). My guess is that this makes the operations in update_gradients_for_normalization (the function Max posted) with soap_vector_N more convenient, but I don't understand this part of the code well.

The remaining memory difference between normalize=True and normalize=False (which is minimal compared to the effect above) is at least partially due to the storage of the normalization coefficients: https://github.com/cosmo-epfl/librascal/blob/db2e2445d34c196c94731249061740123f9fbc28/src/rascal/representations/calculator_spherical_invariants.hh#L743-L744. I don't think we free that memory, but this is just a guess, and again this is not really significant.