Open effigies opened 1 year ago
If the file and directory classes inside fileformats are adopted by convention then this functionality is covered (+ magic number checking) when the objects are instantiated.
To avoid losing functionality, we could improve the @pydra.mark.task decorator to convert Paths into fields with an existence check.
I didn't quite follow this, what is a "field" in this case. As I think we discussed with @djarecka and @ghisvail on another thread, to work nicely with fileformats, paths need to be somehow "cast" to the expected fileformat class. Not sure if this is related to what you are suggesting
@effigies - I don't fully follow, what do you mean be "convert Paths into fields"?
As for the validation, I think at the beginning we assumed that we want to distinguish between output file that would be a string in the input/output_spec
and an existing input file that has a type of File
(or Directory
). I think nipype there is exist=True
for this?
Yeah, I'm not positive what words everyone uses. I'm pretty sure in_file: Path = attr.field()
is a thing we can do in an input spec, and when we make functions, we would just do def f(in_file: Path)
and let @task
do the work of converting the function signature to an input spec. If we stop special-casing File
/Directory
and doing existence checks in BaseSpec
, then when we do the conversions, the attr.field()
object that we create would need to be given a validator.
Not sure if this is clear...
Sorry, still not 100% clear to me (but I'm struggling after an interrupted night's sleep).
I like the idea of pydra.mark.task creating an attrs
class with attrs.field
s for the input spec if it doesn't already (and that is what you are suggesting).
However, if you use a fileformats
class then existence+format validation will occur before the object is passed to a the input field (on initialisation of the File/Directory object) and validation in the input spec wouldn't be necessary. The only issue would be output/downstream-input validation, but this could be handled by casting the outputs generated by output_file_template
and similar to the explicitly defined output types, which would trigger validation at that point.
The only issue would be output/downstream-input validation, but this could be handled by casting the outputs generated by output_file_template and similar to the explicitly defined output types, which would trigger validation at that point.
On that note, it still bothers me that a field with output_file_template
must be of type string. It is not any string, it's a pathlike string representing a future file.
Under click
or typer
, this would be declared as a click.Path(exists=False, file_okay=True, dir_okay=False, ...)
or a click.File(lazy=True)
.
Right now we have a weird special case for files and directories:
https://github.com/nipype/pydra/blob/426564eca3efa7d840fdca33bdbef6a1c88e2ffc/pydra/engine/specs.py#L211-L235
These would make more sense as validators on the fields themselves, rather than special-casing inside Pydra, as right now we are only capable of handling File/Directory and list of File/Directory.
To avoid losing functionality, we could improve the
@pydra.mark.task
decorator to convertPath
s intofields
with an existence check. And @tclose'scmd_arg
suggestion could probably help out by making these implicit, as well.