pyiron / FAQs

General question board for pyiron users

layers in parsers #11

Open samwaseda opened 9 months ago

samwaseda commented 9 months ago

In today's pyiron Q&A session we realized that a parser needs multiple layers:

  1. File to content
  2. Content to functions
  3. Functions to data
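A minimal sketch of how these three layers could be chained. It assumes a made-up `key = value` output format and uses hypothetical function names; it is not pyiron code. Layer 2 only returns callables, so nothing is parsed until layer 3 calls them.

```python
# Layer 1: file -> content. Only this layer touches the file system.
def file_to_content(path):
    with open(path) as f:
        return f.read()


# Layer 2: content -> functions. Returns callables; nothing is parsed yet.
def content_to_functions(content):
    def lookup(key):
        # toy format (assumption): one "key = value" pair per line
        for line in content.splitlines():
            name, sep, value = line.partition("=")
            if sep and name.strip() == key:
                return float(value)
        raise KeyError(key)

    # key=key binds the current key to each lambda
    return {key: (lambda key=key: lookup(key)) for key in ("energy", "volume")}


# Layer 3: functions -> data. Parsing happens only here.
def functions_to_data(data_functions):
    return {tag: func() for tag, func in data_functions.items()}


# Full pipeline for a file on disk
def parse(path):
    return functions_to_data(content_to_functions(file_to_content(path)))
```

Because only layer 1 reads the file, supporting compressed files means changing `file_to_content` alone; layers 2 and 3 stay untouched.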

In today's discussion we became aware of the need for the first step, mainly because we would like to be able to load the content of a file after it has been compressed. Currently, each parser loads its files itself with no standard method (np.loadtxt, with open, etc.), which may or may not work for compressed files. Instead, we could create something like a FileObject, which accepts both plain and compressed files and converts them into plain text.
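One way such a FileObject could look, as a sketch only: the class name comes from the idea above, but the `read_text` method and the suffix-based dispatch are assumptions. It handles single files compressed with bz2 or gzip; a tar archive such as `.tar.bz2` would additionally need the `tarfile` module.

```python
import bz2
import gzip
from pathlib import Path


class FileObject:
    """Hypothetical wrapper: read a plain, .gz or .bz2 file as text."""

    # file suffix -> function that can open such a file in text mode
    _openers = {".bz2": bz2.open, ".gz": gzip.open}

    def __init__(self, path):
        self.path = Path(path)

    def read_text(self, encoding="utf-8"):
        # fall back to the built-in open() for uncompressed files
        opener = self._openers.get(self.path.suffix.lower(), open)
        with opener(self.path, "rt", encoding=encoding) as f:
            return f.read()
```

The parsers then receive a string and no longer need to know whether the file on disk was compressed.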

The last layer came up in a discussion I had with @jan-janssen. It follows the idea already implemented in interactive jobs: making it possible for users to choose the input and output data they would like to use.
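A sketch of how the last layer could let users pick their outputs. The `keys` argument and the toy getters are hypothetical and only loosely mirror the interactive-jobs idea; the point is that quantities nobody asked for are never parsed.

```python
def functions_to_data(data_functions, keys=None):
    """Evaluate only the requested entries (all of them if keys is None)."""
    if keys is None:
        keys = data_functions.keys()
    missing = set(keys) - set(data_functions)
    if missing:
        raise KeyError(f"unknown quantities: {sorted(missing)}")
    return {key: data_functions[key]() for key in keys}


# Toy data functions that record when they are called (illustration only)
calls = []


def make_getter(tag, value):
    def getter():
        calls.append(tag)
        return value
    return getter


data_functions = {
    "energy": make_getter("energy", -1.5),
    "forces": make_getter("forces", [[0.0, 0.0, 0.0]]),
}

selected = functions_to_data(data_functions, keys=["energy"])
print(selected)  # {'energy': -1.5}
print(calls)     # ['energy']: the forces getter was never called
```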

samwaseda commented 9 months ago

I made a complete example:

from pathlib import Path
import tarfile

# Read one file (member) from a .tar.bz2 archive and return its content as text.
# Assumes the compressed file is a tar archive: a bare .bz2 file holds a single
# compressed stream without file names, so there is no "file inside" to look up.
def read_file_inside_bz2(compressed_file_path, file_inside_bz2):
    with tarfile.open(compressed_file_path, "r:bz2") as archive:
        # raises KeyError if the archive has no member with this name
        member = archive.extractfile(file_inside_bz2)
        if member is None:  # e.g. the name refers to a directory
            raise ValueError(f"{file_inside_bz2} is not a regular file")
        return member.read().decode()

class Parser:
    def __init__(self, content):
        self.content = content

    def get_energy(self):
        # placeholder: parse the energy from self.content here
        raise NotImplementedError

    def get_forces(self):
        # placeholder: parse the forces from self.content here
        raise NotImplementedError

# Step No. 1 in my list above
def file_to_content(file_name, file_path):
    if Path(file_path).suffix.lower() == ".bz2":  # suffix includes the leading dot
        return read_file_inside_bz2(file_path, file_name)
    else:
        with open(Path(file_path) / Path(file_name), "r") as f:
            return f.read()

# Step No. 2 in my list above
def content_to_functions(content):
    parser = Parser(content)
    return {
        "energy": parser.get_energy,
        "forces": parser.get_forces
    }

# Step No. 3 in my list above
def functions_to_data(data_functions):
    data = {tag: func() for tag, func in data_functions.items()}