nipype / pydra

Pydra Dataflow Engine
https://nipype.github.io/pydra/
Other
119 stars 57 forks source link

splits on a dynamic inputs/output #716

Closed satra closed 8 months ago

satra commented 8 months ago

@tclose - i'm trying to update pydra_ml to work with the new master. how do i use dynamic inputs/outputs in splits?

the first one takes splits from inputs: https://github.com/nipype/pydra-ml/blob/8fd4536de3a5431219f924897c09aa905c3bc89a/pydra_ml/classifier.py#L74

the second one takes splits from a generated output: https://github.com/nipype/pydra-ml/blob/8fd4536de3a5431219f924897c09aa905c3bc89a/pydra_ml/classifier.py#L105

effigies commented 8 months ago

The inputs need to be set in the split:

Case 1:

 def gen_workflow(inputs, cache_dir=None, cache_locations=None):
+    clf_info = inputs.pop('clf_info')
+    permute = inputs.pop('permute')
     wf = pydra.Workflow(
         name="ml_wf",
         input_spec=list(inputs.keys()),
         **inputs,
         cache_dir=cache_dir,
         cache_locations=cache_locations,
         audit_flags=AuditFlag.ALL,
         messengers=FileMessenger(),
         messenger_args={"message_dir": os.path.join(os.getcwd(), "messages")},
     )
-    wf.split(["clf_info", "permute"])
+    wf.split(clf_info=clf_info, permute=permute)

Case 2:

     wf.add(
         train_test_kernel_pdt(
             name="fit_clf",
             X=wf.readcsv.lzout.X,
             y=wf.readcsv.lzout.Y,
             train_test_split=wf.gensplit.lzout.splits,
-            split_index=wf.gensplit.lzout.split_indices,
             clf_info=wf.lzin.clf_info,
             permute=wf.lzin.permute,
         )
     )
-    wf.fit_clf.split("split_index")
+    wf.fit_clf.split(split_index=wf.gensplit.lzout.split_indices)
satra commented 8 months ago

thanks @effigies - i did do that but was using a list of splits still as the first argument, which was incorrect.

tclose commented 8 months ago

thanks @effigies - i did do that but was using a list of splits still as the first argument, which was incorrect.

Hi @satra, if I understand what you mean, it should still work with the names of the fields to split in the first argument, it just isn't necessary

satra commented 8 months ago

i'll check again sometime, i think it may have been pydra version mismatch when i last tried. the current version is working there but breaking down during hashing (see #717).