juaml / junifer

Forschungszentrum Jülich Neuroimaging Feature Extractor
https://juaml.github.io/junifer
GNU Affero General Public License v3.0
14 stars 13 forks source link

[ENH]: Handling of relative paths for user-defined masks/parcellations #212

Open fraimondo opened 1 year ago

fraimondo commented 1 year ago

Are you requiring a new dataset or marker?

Which feature do you want to include?

Following the discussion in #127, we need a way of allowing the user to create an extension that registers a mask/parcel using a relative path to the mask (from the location of the .py extensions file).

Asking for absolute paths directly will not work in cases that extensions are stored in a repository, like juni-farm.

The issues comes with the queue command, as the .py file will be copied, but the data/parcellation/etc will not.

@LeSasse @synchon

How do you imagine this integrated in junifer?

I don't have a clear idea on how to do it, but I can imagine something like this.

  1. the .py file with the extension has a "list of resources" that need to be copied (maybe in a variable, or comment)
  2. the queue command: 2.1 parses the file 2.2 finds the list of resources 2.3 computes the absolute path 2.4 copies them to a data directory in the junifer_jobs directory.

One problem that this issue has is that the .py file needs to somehow be modified so the path of the resources, after the queue command, are changed to the new data directory in the junifer_jobs directory.

Maybe some function junifer_path('../relative/path/to/data.nii.gz', __FILE__) can take care of solving the relative path. This function will first check in the ./data directory before using Path(__FILE__).parent.

Other more strict but easier to implement option is that each extension has a data directory next to it that can be copied in the junifer jobs. The problem with this approach is that users might end up repeating data in repositories for each extension they create.

Do you have a sample code that implements this outside of junifer?

No response

Anything else to say?

No response

LeSasse commented 1 year ago

Actually, maybe it makes sense to provide a YAML interface for the register_* functions, such that one puts in the YAML sth like:

register_parcellations:
  name: CustomParcellation
  path: path/to/parcellation.nii.gz
  parcels_labels: path/to/labels.txt

Then one can treat this as a relative path from YAML similar to other things, but not sure if this will then end up cluttering the YAML and will also not solve the problem for other extensions that still may or may not rely on relative paths to some degree. Perhaps something like allowing user made script-based extensions to use command line arguments. For example one could make a python script to register the parcellation like:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("parc_path")

# just pseudocodeish
register_parcellation(parc_path)

The arguments could then be parametrised in the YAML, so that paths can be treated as relative from there:

with:
  - my_parc_extension.py: [path/to/parc, potential_other_cmd_arg]

Or one could impose some constraint, like, the argument could be a path from yaml to a directory that contains all of the extension data ressources. Just a thought, but not sure how good this idea is, both from a technical point of view, and from a user-friendliness point of view.

I think the other option, that is also a good option, just stick with the implementation and just document it really well and explictly, to prevent confusion early on.

fraimondo commented 1 year ago

These seems nice concepts that I have not considered. I will just think while I type.

The main goals of YAML are: 1) To provide code-less configurations. 2) Easy to use 3) Reproduciblity (the yaml is embedded in the metadata of the feature as a json).

Option 1) from your suggestion (adding a register_parcellations section), for me, goes against 3. Reproduciblity is limited as long as you use core junifer features. The moment that there's a with statement with a non-junifer module (e.g.: external .py or package), we can't guarantee what's in there. This also applies with masks/parcellation. The mask/parcellation is not known to junifer, so we can't reproduce this result: if you send me this yaml file, I can't run it.

Additionally, users that want their specific masks/parcellations should be able to create them. At this point, we consider them power users and assume that they have enough coding knowledge to write the .py file that registers the mask.

Option 2 is a bit more complicated. First, it allows the user to "code" within the yaml. Second, the items in the with statement are imported, not run.

Overall, we need to consider what the user is doing, and not only the feature we want to support.

My take is:

1) The parcellation is a common parcellation -> create an issue in github and we will add it. Adding common/published parcellations is not complicated. 2) The parcellation is made by a user -> the user is a power user and can write the extension.

1 and 2 also applies for masks.

Let's also keep in mind that we have pending a nice feature regarding masks: https://www.templateflow.org/

from templateflow import api as tflow
tflow.get('MNI152NLin6Asym', desc=None, resolution=1,
          suffix='T1w', extension='nii.gz')
PosixPath('/templateflow_home/tpl-MNI152NLin6Asym/tpl-MNI152NLin6Asym_res-01_T1w.nii.gz')
LeSasse commented 1 year ago

I agree with all the points above. I think if users that make extensions are considered power users they can be expected to understand the subtleties of the relative paths issue, so my leaning would be towards keeping the implementation as is (paths in the YAML relative from the YAML, paths in extensions relative from the junifer working directory), and simply summarising in the docs quite well how the paths are intended to be used in each situation.

synchon commented 1 year ago

From another POV:

fraimondo commented 1 year ago

I don't completely understand your approach @synchon

What we want to address here is relative paths in .py files that are included in a with statement. So lets forget about the rest.

Imagine this structure of a github repo in which juni-farm is added as a submodule in the external directory.

project_dir/
  junifer_yamls/
    my_yaml1.yaml  // adds ../external/masks/custom_mask_extension.py in the "with" section
    my_yaml2.yaml
  external/
    juni-farm/
      data/
        masks/
          this_custom_mask.nii.gz
      masks/
        custom_mask_extension.py  // registers ../data/masks/this_custom_mask.nii.gz

My take:

1) The user needs to user relative paths by calling a junifer function: junifer_path('../relative/path/to/data.nii.gz', __FILE__) in custom_mask_extension.py 2) The queue command then: 2.1) parses the .py file 2.2) look for lines with junifer_path 2.3) computes the absolute path for each resource 2.4) copies each resource to a data directory inside the corresponding jobdir 3) the junifer_path function needs to first check inside the data dir relative to the yaml location for the file. Otherwise, looks in Path(__FILE__).parent / '../relative/path/to/data.nii.gz'

synchon commented 1 year ago

I don't completely understand your approach @synchon

What we want to address here is relative paths in .py files that are included in a with statement. So lets forget about the rest.

Imagine this structure of a github repo in which juni-farm is added as a submodule in the external directory.

project_dir/
  junifer_yamls/
    my_yaml1.yaml  // adds ../external/masks/custom_mask_extension.py in the "with" section
    my_yaml2.yaml
  external/
    juni-farm/
      data/
        masks/
          this_custom_mask.nii.gz
      masks/
        custom_mask_extension.py  // registers ../data/masks/this_custom_mask.nii.gz

My take:

  1. The user needs to user relative paths by calling a junifer function: junifer_path('../relative/path/to/data.nii.gz', __FILE__) in custom_mask_extension.py
  2. The queue command then: 2.1) parses the .py file 2.2) look for lines with junifer_path 2.3) computes the absolute path for each resource 2.4) copies each resource to a data directory inside the corresponding jobdir
  3. the junifer_path function needs to first check inside the data dir relative to the yaml location for the file. Otherwise, looks in Path(__FILE__).parent / '../relative/path/to/data.nii.gz'

I got it correct the previous time and I see what you mean. My proposal was a bit holistic in a way:

  1. Basic single YAML file:
project_dir/
  my_yaml1.yaml  // adds ./custom_mask_extension.py in the "with" 
  custom_mask_extension.py  // registers ./this_custom_mask.nii.gz
  this_custom_mask.nii.gz
  README.md
  .juniferignore

.juniferignore:

README.md
  1. Basic multi YAML file:
project_dir/
  my_yaml1.yaml  // adds ./custom_mask_extension.py in the "with"
  my_yaml2.yaml
  custom_mask_extension.py  // registers ./this_custom_mask.nii.gz
  this_custom_mask.nii.gz
  README.md
  .juniferignore

.juniferignore:

README.md
  1. Advanced single YAML file:
project_dir/
  README.md
  junifer_yamls/
    my_yaml1.yaml  // adds ../external/juni-farm/masks/custom_mask_extension.py in the "with" section
  external/
    juni-farm/
      data/
        masks/
          this_custom_mask.nii.gz
      masks/
        custom_mask_extension.py  // registers ../data/masks/this_custom_mask.nii.gz

.juniferignore:

README.md
  1. Advanced multi YAML file:
project_dir/
  README.md
  junifer_yamls/
    my_yaml1.yaml  // adds ../external/juni-farm/masks/custom_mask_extension.py in the "with" section
    my_yaml2.yaml
  external/
    juni-farm/
      data/
        masks/
          this_custom_mask.nii.gz
      masks/
        custom_mask_extension.py  // registers ../data/masks/this_custom_mask.nii.gz

.juniferignore:

README.md
fraimondo commented 1 year ago

But with your approach a junifer config goes from a yaml file to a directory structure. This will mean that the queue command will end up copying the ENTIRE juni-farm. What if the relative path is even outside project_dir? Like from another project?

For me, it makes things too complicated. We need to consider the logic from our side, modifying the user's behaviour as less as possible.

synchon commented 1 year ago

But with your approach a junifer config goes from a yaml file to a directory structure. This will mean that the queue command will end up copying the ENTIRE juni-farm. What if the relative path is even outside project_dir? Like from another project?

For me, it makes things too complicated. We need to consider the logic from our side, modifying the user's behaviour as less as possible.

I see your point and have no arguments against it. For me, a documented directory structure brings order to chaos and keeps the user and devs on same page, else I have usually found it to not go down well. Those were my two cents.