marrink-lab / vermouth-martinize

Describe and apply transformation on molecular structures and topologies
Apache License 2.0

Reading in header information from itp file #486

Open csbrasnett opened 2 years ago

csbrasnett commented 2 years ago

I realised I buried this within #483, and it might be quite ambitious, but it would be useful to read header information (e.g. the secondary structure determined by DSSP) back in from an itp file generated by martinize2. It would also be useful for maintaining citation information in a file-editing pipeline.

A second improvement along these lines would be to maintain the grouping of information within a directive, e.g. whether bonds are backbone-backbone or angles are BB-SC-SC, so that these groups could be edited selectively or preserved when the file is subsequently written out.

csbrasnett commented 2 years ago

I have a hacky solution for this now, but maybe it'd be good to include it at some point.

Reading in an itp file first requires a list of all the lines in the itp file, which includes the header lines. Something like:


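def header_parser(file_lines):
    """Return the header lines that precede the first [ moleculetype ] directive."""

    # 1) Collect every line before the first moleculetype directive.
    header_lines = []
    for line in file_lines:
        # Only a directive line ends the header; a comment that merely
        # mentions 'moleculetype' does not.
        if line.strip().startswith('[') and 'moleculetype' in line:
            break
        header_lines.append(line)

    # 2) Remove the ';' (plus one following space, if any) from the start of
    #    each line and the '\n' from the end; they'll get written back in
    #    when writing out.
    lines_out = []
    for line in header_lines:
        line = line.rstrip('\n')
        if line.startswith(';'):
            line = line[2:] if line.startswith('; ') else line[1:]
        lines_out.append(line)

    return lines_out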

will do the job; lines_out can then be passed in some form to write_molecule_itp later on. Since these lines also contain the secondary structure information, they can be used to recover that as well.

pckroon commented 2 years ago

In the current code structure for the parser this will be very hard to include, since 1) the parser is a mess, and 2) you intend to parse comments; these get stripped out at a very early stage of parsing.

This is something we could address when we (finally) redo the parser(s) (again). An option to preserve comments would be valuable. Even better would be to write our itp header with specifically formatted "comments" describing this kind of metadata. For example, in comments like ;METADATA SS=...., which would facilitate parsing.
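To make the ;METADATA idea a bit more concrete, here is a minimal sketch of how such specially formatted comments could be read and written. Everything in it is an assumption for illustration only: the function names, the `KEY=VALUE` layout, and the restriction that values contain no whitespace are hypothetical and not part of vermouth.

```python
def parse_metadata_comments(lines, tag="METADATA"):
    """Collect KEY=VALUE pairs from comment lines of the form ';METADATA KEY=VALUE ...'.

    Hypothetical format: fields are whitespace separated and values contain
    no whitespace. Ordinary comments and non-comment lines are ignored.
    """
    metadata = {}
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith(';'):
            continue
        body = stripped[1:].strip()
        if not body.startswith(tag):
            continue
        for item in body[len(tag):].split():
            key, sep, value = item.partition('=')
            if sep:  # skip anything that is not KEY=VALUE
                metadata[key] = value
    return metadata


def format_metadata_comment(metadata, tag="METADATA"):
    """Write the pairs back out as a single tagged comment line."""
    fields = ' '.join(f'{key}={value}' for key, value in metadata.items())
    return f';{tag} {fields}'
```

With a fixed tag like this, free-form comments can stay free-form, and only the tagged lines need a stable format.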

fgrunewald commented 2 years ago

@pckroon I already made a PR for a comment-parsing utility, in case you want to see how it could be done: #460. I'm already using it in some production packages as a subclass of the current parser, and it works pretty well.

The header with citations is trickier, but it should be doable by simply dumping it into the meta of the molecule. I would argue that anything beyond that, such as digesting the header contents, should be done by a separate function rather than the parser (i.e. as with interactions, we only parse, not interpret).
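A rough sketch of that split between parsing and interpreting could look like the following. It is illustrative only: `MoleculeStandIn` is a hypothetical stand-in for a molecule object with a `meta` dict, and the `'header'` key and both function names are assumptions, not existing vermouth API.

```python
from dataclasses import dataclass, field


@dataclass
class MoleculeStandIn:
    """Hypothetical stand-in for a molecule object that carries a `meta` dict."""
    meta: dict = field(default_factory=dict)


def parse_header(lines, molecule):
    """Parsing step: store the leading comment lines verbatim, uninterpreted."""
    header = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('['):
            break  # first directive reached, the header is over
        if stripped.startswith(';'):
            header.append(stripped[1:].strip())
    molecule.meta['header'] = header  # 'header' is an assumed key name


def find_header_lines(molecule, keyword):
    """Interpretation step, kept outside the parser: select lines mentioning `keyword`."""
    return [line for line in molecule.meta.get('header', [])
            if keyword.lower() in line.lower()]
```

The parser only dumps what it read; any function that wants the citations or the secondary structure can then look through `meta` on its own terms.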