samirelanduk / atomium

Python macromolecular parsing (with .pdb/.cif/.mmtf parsing and production)
https://atomium.bio
MIT License

Parsing the Rfree Value #15

Closed: jonathanrd closed this issue 6 years ago

jonathanrd commented 6 years ago

It would also be useful to parse the Rfree and number of reflections used to generate the Rfree from the PDB remarks.

FREE R VALUE
FREE R VALUE TEST SET COUNT

Something like:

import re

def extract_rfree(pdb_dict, lines):
    """Takes a ``dict`` and adds the Rfree value to it by parsing
    REMARK 3.
    :param dict pdb_dict: the ``dict`` to update.
    :param list lines: the file lines to read from."""

    remark_lines = get_lines("REMARK", lines)
    # The colon must follow the label after whitespace only, so
    # "FREE R VALUE TEST SET COUNT" lines do not match this pattern.
    pattern = r"FREE R VALUE\s+:(.+)"
    for remark in remark_lines:
        if int(remark[7:10]) == 3 and remark[10:].strip():
            matches = re.findall(pattern, remark)
            if matches:
                try:
                    pdb_dict["rfree"] = float(matches[0].strip())
                    break
                except ValueError: pass  # non-numeric value, e.g. NULL
    else:
        # Reached only if the loop never hit ``break``: no usable value found.
        pdb_dict["rfree"] = None

def extract_rfree_count(pdb_dict, lines):
    """Takes a ``dict`` and adds the Rfree test set count to it by parsing
    REMARK 3.
    :param dict pdb_dict: the ``dict`` to update.
    :param list lines: the file lines to read from."""

    remark_lines = get_lines("REMARK", lines)
    pattern = r"FREE R VALUE TEST SET COUNT\s+:(.+)"
    for remark in remark_lines:
        if int(remark[7:10]) == 3 and remark[10:].strip():
            matches = re.findall(pattern, remark)
            if matches:
                try:
                    # The count is a number of reflections, so store an int.
                    pdb_dict["freecount"] = int(matches[0].strip())
                    break
                except ValueError: pass  # non-numeric value, e.g. NULL
    else:
        # Reached only if the loop never hit ``break``: no usable value found.
        pdb_dict["freecount"] = None
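
Since the two functions differ only in the label and the number type, they could share one helper. The following is a minimal, self-contained sketch of that idea rather than atomium's actual code: `extract_remark3_value` is a hypothetical name, the `get_lines` defined here is a stand-in for the library helper used above (whose real behaviour may differ), and the sample REMARK lines are made-up values in the usual REMARK 3 layout. It also anchors the label to the start of the remark text, so that a line such as "ESTIMATED ERROR OF FREE R VALUE" is not mistaken for the Rfree line.

```python
import re


def get_lines(record, lines):
    """Hypothetical stand-in for the get_lines helper used above:
    returns the lines whose record name (columns 1-6) matches."""
    return [line for line in lines if line[:6].strip() == record]


def extract_remark3_value(lines, label, cast):
    """Returns the first parsable REMARK 3 value for an exact label,
    converted with ``cast``, or None if there is none."""
    # Anchor the label at the start of the remark text.
    pattern = re.compile(r"^" + re.escape(label) + r"\s*:\s*(\S+)")
    for remark in get_lines("REMARK", lines):
        if remark[6:10].strip() != "3":
            continue  # not a REMARK 3 line
        match = pattern.match(remark[10:].strip())
        if match:
            try:
                return cast(match.group(1))
            except ValueError:
                continue  # non-numeric value such as NULL
    return None


# Made-up sample lines for illustration only.
sample = [
    "HEADER    EXAMPLE",
    "REMARK   2 RESOLUTION.    1.90 ANGSTROMS.",
    "REMARK   3   R VALUE            (WORKING SET) : 0.193",
    "REMARK   3   FREE R VALUE                     : 0.229",
    "REMARK   3   FREE R VALUE TEST SET SIZE   (%) : 4.900",
    "REMARK   3   FREE R VALUE TEST SET COUNT      : 1583",
    "REMARK   3   ESTIMATED ERROR OF FREE R VALUE  : 0.006",
]

pdb_dict = {}
pdb_dict["rfree"] = extract_remark3_value(sample, "FREE R VALUE", float)
pdb_dict["freecount"] = extract_remark3_value(
    sample, "FREE R VALUE TEST SET COUNT", int
)
print(pdb_dict)  # {'rfree': 0.229, 'freecount': 1583}
```

Returning the value instead of writing into the dict keeps the helper easy to test; the caller decides which key to store it under.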

samirelanduk commented 6 years ago

Agreed! Have implemented this in 0.10 now - will merge into master when 0.10 is released - probably in a week's time or so.

Cheers!