ybracke / transnormer-data

Data preparation for the transnormer project (https://github.com/ybracke/transnormer)

Improve layer handling #23

Closed ybracke closed 3 months ago

ybracke commented 6 months ago

Improve source and target layer handling in Modifier objects. Current behavior:

    def __init__(self, layer: str = "norm", mapping_files: List[str] = []) -> None:
        """
        Example implementation of a type replacement modifier

        This modifier replaces types on the tokenized version of the target layer
        (here "norm_tok") and propagates the changes to the raw version ("norm")

        """

        # Keys in the sample dictionary
        valid_layers = {"norm", "orig"}
        if layer not in valid_layers:
            raise ValueError(f"ReplaceToken1to1Modifier: layer must be one of {valid_layers}")
        self.raw = f"{layer}"
        self.tok = f"{layer}_tok"
        self.ws = f"{layer}_ws"
        self.spans = f"{layer}_spans"
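
To make the current behavior easier to see, the key derivation can be reproduced in isolation. This is only a sketch: `layer_keys` and `VALID_LAYERS` are hypothetical names that are not part of transnormer-data. It mirrors the `__init__` quoted above and does not propose a fix.

```python
from typing import Dict

# Assumption: same set of allowed layers as in the quoted __init__
VALID_LAYERS = {"norm", "orig"}


def layer_keys(layer: str = "norm") -> Dict[str, str]:
    """Return the sample-dictionary keys derived from a single layer name."""
    if layer not in VALID_LAYERS:
        raise ValueError(f"layer must be one of {sorted(VALID_LAYERS)}")
    return {
        "raw": layer,                # raw string version of the layer
        "tok": f"{layer}_tok",       # tokenized version
        "ws": f"{layer}_ws",         # whitespace information
        "spans": f"{layer}_spans",   # span annotations
    }


if __name__ == "__main__":
    print(layer_keys("orig"))
```

A modifier built this way only knows one layer: all four keys are derived from the same name, so source and target cannot be set separately.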

Possible improvements: