Automate identification of output file paths

jataware / domain-model-examiner

The goal of this process is to perform machine reading over the model codebase to automatically extract key metadata.

MIT License


Automate identification of output file paths #3

Closed: brandomr closed this issue 3 years ago

brandomr commented 3 years ago

We need a way to identify where model output files are written to disk. Ideally we would identify the relative path to each output, so that we can see where these files would be stored within a model Docker container (once one is made in Clouseau).
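For Python sources, one possible starting point is to parse the code with the standard-library `ast` module and look for `open()` calls in a write mode whose path is a string literal. This is a minimal sketch, not the examiner's actual implementation: the function name `find_open_writes` is hypothetical, and it deliberately ignores paths passed as variables, `file=` keywords, and writes made through other libraries.

```python
import ast

# Modes that create or modify a file: write, append, exclusive create.
WRITE_MODES = ("w", "a", "x")

def find_open_writes(source: str):
    """Hypothetical helper: return sorted (line, path) pairs for open() calls
    that write to a string-literal path."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "open"
                and node.args):
            continue
        # The mode may be the second positional argument or the `mode` keyword.
        mode_node = node.args[1] if len(node.args) > 1 else None
        for kw in node.keywords:
            if kw.arg == "mode":
                mode_node = kw.value
        # A missing or non-literal mode is treated as read-only ("r").
        mode = mode_node.value if isinstance(mode_node, ast.Constant) else "r"
        if not isinstance(mode, str) or not mode.startswith(WRITE_MODES):
            continue
        path_node = node.args[0]
        if isinstance(path_node, ast.Constant) and isinstance(path_node.value, str):
            results.append((node.lineno, path_node.value))
    # ast.walk is breadth-first, so sort to report writes in source order.
    return sorted(results)
```

Called on a snippet containing `with open("outputs/results.csv", "w") as f:` on line 2, it reports `(2, "outputs/results.csv")`, while a plain `open("inputs/config.txt")` read is skipped.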

GoogleSheets commented 3 years ago

Included in the latest push, but more work may be needed to ensure that all outputs are extracted in Python and R.

brandomr commented 3 years ago

For Pythia this goes a bit haywire. As we discussed, showing where we think an output is being written is helpful even if we can't recover the actual path. Finding a cleaner way to represent this in the YAML would be good.
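One way to still report a write location when the path can't be resolved is to fall back to the source text of the path expression. The sketch below assumes Python 3.9+ (for `ast.unparse`) and only handles `open()` with a literal positional mode; `describe_open_writes` is a hypothetical name, not something from the repository.

```python
import ast

def describe_open_writes(source: str):
    """Return sorted (line, path_text, is_literal) tuples for write-mode open() calls.

    Literal paths are returned as-is. Any other expression (a variable,
    os.path.join(...), an f-string) is unparsed back to source text, so the
    write location is still reported even though the path is unresolved.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "open" and len(node.args) >= 2):
            continue
        mode = node.args[1]
        # Only a literal positional mode such as "w", "wb" or "a" is handled.
        if not (isinstance(mode, ast.Constant) and isinstance(mode.value, str)
                and mode.value.startswith(("w", "a", "x"))):
            continue
        target = node.args[0]
        is_literal = isinstance(target, ast.Constant) and isinstance(target.value, str)
        path_text = target.value if is_literal else ast.unparse(target)
        found.append((node.lineno, path_text, is_literal))
    return sorted(found)
```

The `is_literal` flag lets the YAML output distinguish a confirmed path from a best guess, which keeps the unresolved cases visible instead of silently dropping them.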

GoogleSheets commented 3 years ago

Reorganized the YAML output as follows:

output_files:
  filename:
    line:
    path:
    write:
  filename:
    line:
    path:
    write:
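A stdlib-only sketch of emitting that nested layout from a list of extracted records. The record keys (`filename`, `line`, `path`, `write`) mirror the schema, but what `write` holds is an assumption here (the source text of the writing statement), and `to_output_files_yaml` is a hypothetical helper. Scalars go through `json.dumps`, which produces quoted strings, bare numbers and `null`, all valid YAML.

```python
import json

def to_output_files_yaml(records):
    """Render records in the nested output_files layout.

    Each record is a dict with the hypothetical keys filename, line, path
    and write. A path of None is emitted as null (unresolved path).
    """
    seen = set()
    lines = ["output_files:"]
    for rec in records:
        if rec["filename"] in seen:
            # A repeated key would produce a YAML mapping with duplicate keys.
            raise ValueError(f"duplicate filename key: {rec['filename']}")
        seen.add(rec["filename"])
        lines.append(f"  {json.dumps(rec['filename'])}:")
        for key in ("line", "path", "write"):
            lines.append(f"    {key}: {json.dumps(rec[key])}")
    return "\n".join(lines)
```

Because each filename is a mapping key, the sketch rejects duplicates rather than emitting YAML that most parsers would either reject or silently overwrite.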

Parsing the code that writes files in Python, Julia and R is still a mess, because each language offers many different ways to write to a file. This results in somewhat spaghetti code, and plenty of edge cases are still not covered. Whether to refine this further is a question of time versus reward.
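To illustrate why coverage is hard across languages, here is a line-oriented regex scan with a few common write idioms per language. The pattern table is an illustrative assumption, not the examiner's actual rule set. It covers only a handful of real calls (`to_csv`, `np.save`, `savefig`, R's `write.csv`/`saveRDS`/`ggsave`/`writeLines`, Julia's `CSV.write`/`writedlm`) and will both miss writes and flag false positives.

```python
import re

# Illustrative, non-exhaustive write patterns per language.
WRITE_PATTERNS = {
    "python": [r"\bopen\s*\([^)]*['\"][wax][bt+]*['\"]", r"\.to_csv\s*\(",
               r"\bnp\.save\s*\(", r"\.savefig\s*\("],
    "r": [r"\bwrite\.(csv|table)\s*\(", r"\bsaveRDS\s*\(",
          r"\bggsave\s*\(", r"\bwriteLines\s*\("],
    "julia": [r"\bCSV\.write\s*\(", r"\bwritedlm\s*\(",
              r"\bopen\s*\([^)]*\"[wa]\""],
}

def scan_writes(source: str, language: str):
    """Return (line_number, stripped_line) for lines matching any write pattern."""
    compiled = [re.compile(p) for p in WRITE_PATTERNS[language]]
    hits = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        if any(p.search(text) for p in compiled):
            hits.append((lineno, text.strip()))
    return hits
```

Each new idiom means another pattern, and multi-line calls or paths built elsewhere slip through entirely, which is the time versus reward trade-off described above.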