jvalue / jayvee

Jayvee is a domain-specific language and runtime for automated processing of data pipelines
https://jvalue.github.io/jayvee/
150 stars 15 forks source link

[BUG] Cannot import from "above" the executed file #590

Closed TungstnBallon closed 4 months ago

TungstnBallon commented 5 months ago

Steps to reproduce

  1. Create this structure:
    .
    ├──  src
    │  ├──  data.csv
    │  └──  mainModel.jv
    └──  cannot-be-imported.jv

cannot-be-imported.jv

publish transform noop {
    from someText oftype text;
    to sameText oftype text;

    sameText: someText;
}

src/mainModel.jv

use {
    noop
} from "../cannot-be-imported.jv";

pipeline pip {

    XTractor
        -> TextFile
        -> CSV
        -> Table
        -> Loader;

    block XTractor oftype LocalFileExtractor {
        filePath: "data.csv";
    }

    block TextFile oftype TextFileInterpreter { }
    block CSV oftype CSVInterpreter { }

    block Table oftype TableInterpreter {
        header: false;
        columns: [
            "c" oftype text,
        ];
    }

    block Loader oftype SQLiteLoader {
        table: "table";
        file: "out.sqlite";
    }
}

src/data.csv

"cell1"
  1. cd src/
  2. jv mainModel.jv

Description

Could not extract the AST node of the given model.


This is also true for a structure like this:

. ├── a │ └── mainModel.jv └── b └── cannot-be-imported.jv

georg-schwarz commented 5 months ago

How is the behavior if you execute the file from one level above? E.g. jv src/mainModel.jv?

TungstnBallon commented 5 months ago

Sadly, the issue persists.

georg-schwarz commented 5 months ago

Traced it back to this line of code: https://github.com/jvalue/jayvee/blob/d4c8559db5388de61e778d3b3eafa6ce9ac9ea43/libs/interpreter-lib/src/parsing-util.ts#L44-L49

I think we'd have to pass the working directory as an argument instead of assuming that the referenced file is at the root level of the working dir.

Still, a design decision we have to make @rhazn @joluj: Do we want to prevent imports from anywhere above (i) the main file or (ii) the dir where the cli was called for security reasons?

rhazn commented 5 months ago

I wouldn't care and just allow dir traversal freely, I doubt there is much of a security issue there.

joluj commented 5 months ago

I'd vote for (ii). I expect that the following monorepo-like folder structure will be used which is not possible with (i):

├──  libs
│  └──  a.jv
│  └──  b.jv
└──  apps
   └── cool-app-1.jv
   └── cool-app-2.jv
rhazn commented 5 months ago

I think ii is not very nice because it will make programs behave differently depending on the working dir of the interpreter. That is confusing in Python and I'd prefer not to introduce it in JV.

joluj commented 5 months ago

Ah true, then I'd vote for your initial idea of allowing dir traversal freely :+1:

georg-schwarz commented 5 months ago

From a technical point of view, it is difficult not to restrict directory traversal - at least with our current implementation.

The JayveeWorkspaceManager needs to be initialized with all the workspaces we want the language-server to be aware of. Passing the machine's root is probably not a good idea performance-wise.

That means we'd have to evaluate how to dynamically add (and remove) documents to the WorkspaceManager, depending on where imports point to. This is a larger refactoring, IMO. I'm not sure how well-supported that would be with Langium, but I assume there would be a way.

In conclusion, we should create a feature ticket, IMO, as it exceeds the scope of a bug fix by far and introduces new behavior (having a predefined workspace has been the intended behavior so far). Besides, we should implement a fix for this ticket that allows (ii) traversing up until the dir is reached where jv was executed from.

rhazn commented 5 months ago

Yeah, totally fine I think. I wasn't trying to argue for how a fix might look, more for a long term direction.

georg-schwarz commented 5 months ago

Perfect! Created #593 for a better long-term solution and will soon push a quick-fix for the bug.