Open csbrasnett opened 2 years ago
I have a hacky solution for this now, but maybe it'd be good to include it at some point.
Reading in an itp file first requires a list of all the lines in the itp file, which includes the header lines. Something like:
def header_parser(file_lines):
    """Return the header comments that precede the first moleculetype directive."""
    # 1) Add the header lines to a list, stopping at the first moleculetype line.
    header_lines = []
    for line in file_lines:
        if 'moleculetype' in line:
            break
        header_lines.append(line)
    # 2) Remove the '; ' from the start of each line and the '\n' from the end;
    #    they'll get written back in when writing out.
    lines_out = []
    for line in header_lines:
        text = line.rstrip('\n')
        if text.startswith(';'):
            # Drop the ';' and the single space that follows it, if any.
            text = text[1:]
            if text.startswith(' '):
                text = text[1:]
        lines_out.append(text)
    return lines_out
This will do the job, so that lines_out can be passed in some form to write_molecule_itp later on. As these lines also contain the secondary structure information, they can be used for that as well.
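For the writing side, the inverse step might look roughly like the sketch below. Note that format_header is a hypothetical helper and not part of vermouth; nothing here assumes how write_molecule_itp actually formats its header.

```python
def format_header(header_lines):
    """Turn stripped header text back into itp comment lines (hypothetical helper)."""
    formatted = []
    for text in header_lines:
        text = text.strip()
        # An empty entry becomes a bare ';' so no trailing space is written.
        formatted.append(f"; {text}\n" if text else ";\n")
    return formatted


if __name__ == "__main__":
    for line in format_header(["Please cite the following papers:", ""]):
        print(line, end="")
```

Stripping each entry first means blank header entries round-trip to a bare comment line regardless of whether they were stored as an empty string or as a lone newline.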
In the current code structure for the parser this will be very hard to include, since 1) the parser is a mess, and 2) you intend to parse comments; these get stripped out at a very early stage of parsing.
This is something we could address when we (finally) redo the parser(s) (again). An option to preserve comments would be valuable.
Even better would be to write our itp header with specifically formatted "comments" describing this kind of metadata, for example comments like ;METADATA SS=...., which would facilitate parsing.
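To make the idea concrete, here is a minimal sketch of how such tagged comments could be read back. The KEY=VALUE layout after the tag, the function name parse_metadata_comments, and the example SS string are all assumptions for illustration; no such format exists in the itp files today.

```python
def parse_metadata_comments(lines, tag="METADATA"):
    """Collect KEY=VALUE pairs from comment lines that start with the given tag."""
    metadata = {}
    for line in lines:
        text = line.strip()
        if not text.startswith(";"):
            continue  # not a comment line
        body = text[1:].strip()
        if not body.startswith(tag):
            continue  # an ordinary comment, not tagged metadata
        for token in body[len(tag):].split():
            key, sep, value = token.partition("=")
            if sep:  # ignore tokens without '='
                metadata[key] = value
    return metadata


if __name__ == "__main__":
    example = [
        "; an ordinary header comment\n",
        ";METADATA SS=HHHCC\n",  # made-up secondary structure string
        "[ moleculetype ]\n",
    ]
    print(parse_metadata_comments(example))
```

Ordinary comments are ignored, so a tagged header like this would stay compatible with a parser that strips all comments.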
@pckroon I already made a PR for a comment parsing utility (#460), in case you want to see how it could be done. I'm already using it in some production packages as a subclass of the current parser. It works pretty well.
The header with citations is trickier, but should be doable by simply dumping it into the meta of the molecule. Anything beyond that, like the digestion part, I would argue should be done by another function and not by the parser (i.e. as with interactions, we only parse, not interpret).
I realised I buried this within #483, and it might be quite ambitious, but it would be useful to read back in header information (e.g. the secondary structure determined by DSSP) from an itp file generated by martinize2. It would also be useful for maintaining citation information in a file editing pipeline.
The second advance in this regard would be maintaining the grouping of different intra-directive information, e.g. whether bonds are backbone-backbone, or angles BB-SC-SC, etc., so that these could be edited selectively, or again, maintained when subsequently writing out.
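As a rough illustration of the grouping idea, the sketch below assumes each group inside a directive is introduced by a full-line comment (for example "; Backbone bonds") and keys the interaction lines on the most recent such comment. The function name group_directive_lines and the example values are made up; this is not how the current parser works.

```python
def group_directive_lines(lines):
    """Group the lines of one directive under the most recent full-line comment."""
    groups = {}
    current = None  # label for lines that appear before any comment
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text.startswith(";"):
            current = text.lstrip(";").strip()
            continue
        # Discard any trailing inline comment before splitting into fields.
        fields = text.partition(";")[0].split()
        groups.setdefault(current, []).append(fields)
    return groups


if __name__ == "__main__":
    bonds = [
        "; Backbone bonds\n",
        "1 3 1 0.350 4000\n",
        "; Side chain bonds\n",
        "3 4 1 0.300 5000 ; made-up parameters\n",
    ]
    for label, rows in group_directive_lines(bonds).items():
        print(label, rows)
```

With the labels kept as dictionary keys, one group could be edited selectively and the labels written back out as comments afterwards.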