gruenewald-lab / CGsmiles

Coarse-Grained Smiles (CGsmiles) for representing arbitrarily complex molecules using a compact line notation

Bigsmiles #21

Open fgrunewald opened 2 months ago

fgrunewald commented 2 months ago

@pckroon small additional feature: read a bigsmiles string and make a cgsmiles string out of it. This small utility does not cover all of the BigSmiles syntax, but a sufficiently large part of it. This is useful for ML operations as well as for setting up itp files from BigSmiles.

The idea is simple: each stochastic object becomes a node in a hyper resolution graph with the fragments being the all-atom blocks. It does not support nesting, because I don't care enough to make this fully recursive. But this parser is good enough for the 5000 bigsmiles strings in the block copolymer database.
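To illustrate the "one node per stochastic object, no nesting" idea, here is a minimal sketch in plain Python. It is not the parser from this PR: the function name `split_stochastic_objects` and the `"fragment"`/`"stochastic"` labels are made up for illustration, and the sketch only looks at top-level braces without interpreting bonding descriptors, repeat units, or end groups.

```python
def split_stochastic_objects(bigsmiles):
    """Split a non-nested BigSMILES-like string into top-level segments.

    Returns a list of (kind, text) tuples. kind is "fragment" for text
    outside braces and "stochastic" for the content of one {...} object.
    Hypothetical helper, not part of the cgsmiles package.
    """
    segments = []
    buffer = []
    inside = False
    for char in bigsmiles:
        if char == "{":
            if inside:
                # Mirrors the stated limitation: no recursion.
                raise ValueError("nested stochastic objects are not supported")
            if buffer:
                segments.append(("fragment", "".join(buffer)))
                buffer = []
            inside = True
        elif char == "}":
            if not inside:
                raise ValueError("unbalanced '}'")
            segments.append(("stochastic", "".join(buffer)))
            buffer = []
            inside = False
        else:
            buffer.append(char)
    if inside:
        raise ValueError("unbalanced '{'")
    if buffer:
        segments.append(("fragment", "".join(buffer)))
    return segments


segments = split_stochastic_objects("CC{[$]CC[$],[$]CC(C)[$]}O")
for kind, text in segments:
    print(kind, text)
```

Each `"stochastic"` entry would become one node in the coarse graph, with its text kept as the all-atom block; a nested brace raises an error instead of being handled recursively.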

I'm wondering whether it is worth doing the conversion the other way around as well, but I don't think every cgsmiles string can be converted to bigsmiles.

fgrunewald commented 2 months ago

@pckroon Agreed that cgsmiles to bigsmiles seems strange, and I'm not sure what the use case would be anyway.

That was my original idea, but then it means having to interpret the bigsmiles string and turn it into a graph in order to make a cgsmiles string. I think that would introduce a lot of code, or require an adapted cgsmiles parser that also handles bigsmiles. Or am I missing your point?

By the way, I'm still thinking about adding one more level of resolution though:

Consider a bigsmiles like this, where A, B, C, and D are four different residues that I do not write out for the sake of brevity.

T1{[#A][#B]}{[#C][#D]}T2

This bigsmiles describes a random block of AB followed by a random block of CD. At the moment the meta graph is just A, B, C, D which is fine. But you could also argue to make each stochastic object a node and blocks within that object another level node so that you get a tree with two resolutions:

[#T1][#st_obj1][#st_obj2][#T2].{#st_obj1=[#A][#B],#st_obj2=[#C][#D],#T1=[#T1],#T2=[#T2]}.{#A=[...],#B=[...],#C=[...],#D=[...]}

For turning this string into an actual polymer it does not matter, because you'd just take the last resolution anyway and then define the appropriate probabilities to connect. But for ML applications it could be interesting when one does message passing on meta nodes. Not sure it is worth the effort. What do you think?
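A rough sketch of the two-resolution idea for the `T1{[#A][#B]}{[#C][#D]}T2` example, using only the standard library. The function names, the `st_objN` naming scheme, and the dict-based tree are assumptions for illustration, not the cgsmiles data model. The sketch also assumes balanced, non-nested braces and silently skips anything that does not match.

```python
import re

# Either the content of one {...} object or a run of text outside braces.
SEGMENT = re.compile(r"\{([^{}]*)\}|([^{}]+)")
# Residue names written as [#NAME].
RESIDUE = re.compile(r"\[#([^\]]+)\]")


def two_level_tree(notation):
    """Return (top_level, children) for a non-nested string.

    top_level lists the coarse nodes in order; children maps each coarse
    node to the residues it contains. Hypothetical helper.
    """
    top_level = []
    children = {}
    n_objects = 0
    for match in SEGMENT.finditer(notation):
        inner, outer = match.groups()
        if inner is not None:
            # One meta node per stochastic object.
            n_objects += 1
            name = f"st_obj{n_objects}"
            children[name] = RESIDUE.findall(inner)
        else:
            # Terminal groups map onto themselves at the next resolution.
            name = outer
            children[name] = [outer]
        top_level.append(name)
    return top_level, children


def flatten(top_level, children):
    """Collapse the tree to the last resolution, which is all that
    matters for actually building the polymer."""
    return [residue for node in top_level for residue in children[node]]


top, kids = two_level_tree("T1{[#A][#B]}{[#C][#D]}T2")
print(top)                 # ['T1', 'st_obj1', 'st_obj2', 'T2']
print(kids["st_obj1"])     # ['A', 'B']
print(flatten(top, kids))  # ['T1', 'A', 'B', 'C', 'D', 'T2']
```

Flattening recovers the current A, B, C, D meta graph, so the extra level only adds information; message passing on meta nodes would operate on `top` with `kids` as the pooling map.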

pckroon commented 2 months ago

Parsing bigsmiles to an internal CGsmiles object (networkx graph) does indeed require a separate parser, but does offer advantages in that you (well, someone) can then use the cgsmiles code. In addition, you can then use the cgsmiles writer, which makes testing easier.

As for the added resolution layer, the more complete the parser the happier I get (since I'm a completionist); whether it's worth the effort I don't know, you'd need to ask the ML people.