Closed UnixJunkie closed 2 years ago
Alternatively, if there is an automatic way to generate the kind of json files you have in reference_data/inputs, that might be similarly useful.
Hello, the easiest way to do this is probably with ASE: read the file in and write it out under a different filename. ASE uses the extended XYZ format by default when writing files with the .xyz extension. As for the lattice, I think it's automatically padded to the cell extent + 10 Å, but you might want to verify this.
The json files in the reference data are only for the pure C++ implementation (when you need to run without Python), so I don't think you need those.
An 'ase.io.read' followed by an 'ase.io.write' does not create the missing unit cell. It does, however, add a comment line like this one for each molecule:
Properties=species:S:1:pos:R:3 CHEMBL405398_1=T pbc="F F F"
You can also add the cell/lattice manually when loading your data:
import ase.io

frames = ase.io.read("your-file.xyz", ":")  # ":" reads every frame in the file
for frame in frames:
    frame.cell = [100, 100, 100]  # set this to something big enough
    frame.positions[:] += 50  # shift the atoms to the middle of the cell (assumes coordinates near the origin)
    frame.pbc = [False, False, False]  # disable periodic boundary conditions
From here, you can use ase.io.write() to write an extended XYZ file with this cell information, or pass the frames directly to librascal.
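For anyone who wants to see what ends up in the file, here is a minimal sketch in plain Python (no ASE) that turns a "classic" XYZ string into extended XYZ with a non-periodic orthorhombic box. The function name to_extended_xyz and the margin parameter are made up for illustration, the 10 Å default margin per side is an arbitrary choice, and the original comment line of each frame is discarded. It is not how ASE or librascal do it internally.

```python
def to_extended_xyz(xyz_text, margin=10.0):
    """Add a Lattice/pbc comment line to every frame of a classic XYZ string.

    The box is the bounding box of the atoms plus `margin` on every side
    (hypothetical helper, not part of ASE or librascal).
    """
    lines = xyz_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():  # skip blank lines between frames
            i += 1
            continue
        n_atoms = int(lines[i])
        atoms = []
        # line i+1 is the free-form comment; atom rows start at i+2
        for row in lines[i + 2 : i + 2 + n_atoms]:
            symbol, x, y, z = row.split()[:4]
            atoms.append((symbol, float(x), float(y), float(z)))
        lows = [min(a[k] for a in atoms) for k in (1, 2, 3)]
        highs = [max(a[k] for a in atoms) for k in (1, 2, 3)]
        lengths = [hi - lo + 2 * margin for lo, hi in zip(lows, highs)]
        cell = (lengths[0], 0.0, 0.0, 0.0, lengths[1], 0.0, 0.0, 0.0, lengths[2])
        lattice = " ".join(f"{v:.6f}" for v in cell)
        out.append(str(n_atoms))
        out.append(f'Lattice="{lattice}" Properties=species:S:1:pos:R:3 pbc="F F F"')
        for symbol, x, y, z in atoms:
            # shift so the molecule sits `margin` away from the box origin
            sx, sy, sz = (x - lows[0] + margin, y - lows[1] + margin, z - lows[2] + margin)
            out.append(f"{symbol} {sx:.6f} {sy:.6f} {sz:.6f}")
        i += 2 + n_atoms
    return "\n".join(out) + "\n"


water = "3\nwater\nO 0.0 0.0 0.0\nH 0.0 0.0 1.0\nH 1.0 0.0 0.0\n"
print(to_extended_xyz(water, margin=5.0))
```

Sizing the box from the bounding box, rather than a fixed 100 Å cube, keeps it "big enough" for every molecule without having to guess a size in advance.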
Thanks a lot, I'll try this and let you know how it goes. Is there a users' mailing list for librascal? Maybe the bugtracker is not the best place for my beginner's questions.
One problem if I pass each frame directly to librascal is that the number of SOAP features will vary, whereas I would like all my molecules to have the same number of SOAP features (though I don't know in advance what the dimensionality should be). I'll try working on a bigger computer, so that all molecules and their SOAP features can fit in memory.
You can specify the pool of chemical elements manually. I can't remember the syntax off the top of my head, but I believe there are examples. Something like global_species: [1,3, ....]
Even if I pass the global_species parameter to SphericalInvariants, the number of SOAP features still varies:
# those are (num_atoms, num_SOAP_features) of the molecules I am reading in
Data matrix: (29, 2520)
Data matrix: (48, 3528)
Data matrix: (36, 2520)
Data matrix: (27, 2520)
Data matrix: (79, 2520)
Data matrix: (80, 2520)
Data matrix: (152, 3780)
Data matrix: (144, 1512)
N.B.: not all molecules have the same chemical composition. I am not working with frames from an MD simulation, just distinct isolated molecules.
Can you share your full input & hyper-parameters?
Is there a users' mailing list for librascal? Maybe the bugtracker is not the best place for my beginner's questions.
That's fine for now, we don't have a lot of traffic. Otherwise, the discussion page on this repository would also be a good place for questions: https://github.com/lab-cosmo/librascal/discussions/categories/q-a
I'll share my test code as a PR.
You should be able to run it and get the following output:
./soap_test.py > test.out
Data matrix: (17, 2520)
Data matrix: (20, 1512)
Data matrix: (12, 2520)
Data matrix: (14, 2520)
Data matrix: (20, 2520)
Data matrix: (13, 1512)
Data matrix: (18, 2268)
Data matrix: (20, 2520)
Data matrix: (15, 2520)
Data matrix: (25, 1512)
Data matrix: (19, 1512)
Data matrix: (19, 1512)
Data matrix: (21, 1512)
Data matrix: (18, 2520)
Data matrix: (18, 2520)
Data matrix: (15, 1512)
Data matrix: (21, 1512)
Data matrix: (20, 1512)
Data matrix: (13, 2520)
Data matrix: (20, 2520)
While I understand the number of atoms is varying, I don't understand why this is the case for the number of SOAP features.
I don't understand how I could compare two atoms using a kernel if the atoms are not encoded with vectors of the same length.
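A quick look at the numbers themselves may help narrow this down. The sketch below only does arithmetic on the feature counts printed in this thread: it finds their greatest common divisor and how many such blocks each width contains. Reading those blocks as one per species combination present in a molecule is a guess on my part, not something confirmed in this thread.

```python
from functools import reduce
from math import gcd

# distinct feature counts seen in the "Data matrix" outputs in this thread
widths = [1512, 2268, 2520, 3528, 3780]

# largest block size that divides every observed width
block = reduce(gcd, widths)

# number of such blocks in each width
blocks_per_width = {w: w // block for w in widths}

print(block)             # 252
print(blocks_per_width)  # {1512: 6, 2268: 9, 2520: 10, 3528: 14, 3780: 15}
```

Every width is a whole multiple of 252, and 6, 10 and 15 happen to be the numbers of unordered pairs for 3, 4 and 5 species. That would fit a layout where the feature vector grows with the chemical composition of each molecule, though 9 and 14 do not match that simple count.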
Solved, thanks to feedback from the experts.
Hello, it seems you are using an extended XYZ file format. The second line for each molecule is a comment with a Lattice specification and other information. Is there a tool or function to automatically create such files from a "classic" xyz file? I am interested in a non-repeating lattice (this is not a crystal, just an isolated molecule), big enough to hold the whole molecule, plus some margin on all axes, I guess. Thanks a lot, F.