Move file existence checks into field validators

nipype / pydra

Pydra Dataflow Engine

https://nipype.github.io/pydra/

Other

119 stars 57 forks source link

Move file existence checks into field validators #649

Open effigies opened 1 year ago

effigies commented 1 year ago

Right now we have a weird special case for files and directories:

https://github.com/nipype/pydra/blob/426564eca3efa7d840fdca33bdbef6a1c88e2ffc/pydra/engine/specs.py#L211-L235

These would make more sense as validators on the fields themselves, rather than special-casing inside Pydra, as right now we are only capable of handling File/Directory and list of File/Directory.

To avoid losing functionality, we could improve the @pydra.mark.task decorator to convert Paths into fields with an existence check. And @tclose's cmd_arg suggestion could probably help out by making these implicit, as well.

tclose commented 1 year ago

If the file and directory classes inside fileformats are adopted by convention then this functionality is covered (+ magic number checking) when the objects are instantiated.

To avoid losing functionality, we could improve the @pydra.mark.task decorator to convert Paths into fields with an existence check.

I didn't quite follow this, what is a "field" in this case. As I think we discussed with @djarecka and @ghisvail on another thread, to work nicely with fileformats, paths need to be somehow "cast" to the expected fileformat class. Not sure if this is related to what you are suggesting

djarecka commented 1 year ago

@effigies - I don't fully follow, what do you mean be "convert Paths into fields"?

As for the validation, I think at the beginning we assumed that we want to distinguish between output file that would be a string in the input/output_spec and an existing input file that has a type of File (or Directory). I think nipype there is exist=True for this?

effigies commented 1 year ago

Yeah, I'm not positive what words everyone uses. I'm pretty sure in_file: Path = attr.field() is a thing we can do in an input spec, and when we make functions, we would just do def f(in_file: Path) and let @task do the work of converting the function signature to an input spec. If we stop special-casing File/Directory and doing existence checks in BaseSpec, then when we do the conversions, the attr.field() object that we create would need to be given a validator.

Not sure if this is clear...

tclose commented 1 year ago

Sorry, still not 100% clear to me (but I'm struggling after an interrupted night's sleep).

I like the idea of pydra.mark.task creating an attrs class with attrs.fields for the input spec if it doesn't already (and that is what you are suggesting).

However, if you use a fileformats class then existence+format validation will occur before the object is passed to a the input field (on initialisation of the File/Directory object) and validation in the input spec wouldn't be necessary. The only issue would be output/downstream-input validation, but this could be handled by casting the outputs generated by output_file_template and similar to the explicitly defined output types, which would trigger validation at that point.

ghisvail commented 1 year ago

The only issue would be output/downstream-input validation, but this could be handled by casting the outputs generated by output_file_template and similar to the explicitly defined output types, which would trigger validation at that point.

On that note, it still bothers me that a field with output_file_template must be of type string. It is not any string, it's a pathlike string representing a future file.

Under click or typer, this would be declared as a click.Path(exists=False, file_okay=True, dir_okay=False, ...) or a click.File(lazy=True).