Open AliHaiderAhmad001 opened 1 month ago
@AliHaiderAhmad001 The len call seems to be inside your own user-defined function. Could you take a look here to see what it is?
File "/tmp/ipykernel_542780/2278135304.py", line 29, in train_loop_per_worker
What happened + What you expected to happen
Description:
I encountered a TypeError when running TorchTrainer in a Ray Tune experiment. The error occurs because len() is called on a StreamSplitDataIterator, which does not define a __len__ method. This causes the trial to fail and the training process to stop.
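To illustrate the failure mode described above, here is a minimal sketch that does not use Ray at all. `StreamingShard` and `run_epoch` are hypothetical stand-ins written for this example, not Ray classes or the code from this issue; the only assumption is that the real iterator, like this one, can be iterated in batches but has no `__len__`. In a Ray Train loop the shard would typically come from `ray.train.get_dataset_shard(...)` instead.

```python
from typing import Dict, Iterator, List


class StreamingShard:
    """Hypothetical stand-in for a streaming data iterator.

    It can be iterated in batches but deliberately defines no __len__,
    which is the property that triggers the TypeError in this issue.
    """

    def __init__(self, rows: List[int]) -> None:
        self._rows = rows

    def iter_batches(self, batch_size: int) -> Iterator[Dict[str, List[int]]]:
        # Yield consecutive slices of the rows as dict-shaped batches.
        for start in range(0, len(self._rows), batch_size):
            yield {"x": self._rows[start:start + batch_size]}


def run_epoch(shard: StreamingShard, batch_size: int) -> int:
    """Consume one epoch and count batches without calling len(shard)."""
    num_batches = 0
    for _batch in shard.iter_batches(batch_size):
        # A real training loop would run its train step on _batch here.
        num_batches += 1
    return num_batches


shard = StreamingShard(list(range(10)))

# Calling len() on an object without __len__ raises TypeError.
try:
    len(shard)
    len_error = None
except TypeError as exc:
    len_error = str(exc)

# Counting batches while iterating avoids the call entirely.
batches_seen = run_epoch(shard, batch_size=4)
```

If the training loop needs a step count up front (for a progress bar or a learning-rate schedule, say), one option under these assumptions is to compute it from a known dataset size and batch size before training starts, rather than from the iterator itself.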
Error Traceback:
Expected Behavior:
I expected the training loop to handle the StreamSplitDataIterator properly without raising the TypeError.
Actual Behavior:
The TypeError prevents the training process from completing, as it attempts to calculate the length of an object that lacks a __len__ method.
Detailed Traceback
Versions / Dependencies
Environment:
Ray version: 2.37.0
Python version: 3.10
OS: Ubuntu
Hardware: CPU-based training on local laptop
Reproduction script
Related code
----------- Trainer -----------------
------------------------ train_step -------------------------
----------------- train_loop_per_worker -------------------------
--------- load_data ----------
Issue Severity
High: It blocks me from completing my task.