huggingface / lerobot

🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Apache License 2.0

Log validation loss for real datasets #250

Closed: tlpss closed this issue 1 month ago

tlpss commented 3 months ago

If I am not mistaken, there is no validation functionality provided for real datasets (i.e. datasets for which no gym environment is available during training).

I think it would be useful to log the validation loss, which would also require creating/specifying a validation split.

Besides providing additional insight into training, the validation loss could also be used to manage checkpoints (e.g. keeping the best-performing one).

Cf. the ACT repo.

alexander-soare commented 3 months ago

We have a mechanism for splitting the data https://github.com/huggingface/lerobot/pull/158 but we haven't yet worked it into the training script. It hasn't been a priority for us for a couple of reasons, and it might still be on the backlog. If you'd like to contribute something, that would be nice and we'd be very grateful. One ask would be to design it so it is minimally disruptive to the current setup (it defaults to off and is enabled with a single flag / config param).

tlpss commented 3 months ago

@alexander-soare thanks for the reply.

I am willing to implement this, as it would help in my current project and seems like a nice contribution.

I agree that it should be as simple as possible. I would propose adding an additional flag to the config files.

At the moment there is the eval_freq param that determines when to evaluate on a gym environment. We could add an additional parameter (name TBD) that determines when to compute the loss(es) on a separate validation dataset. This would then be a third if-block in the 'evaluate_and_log_ckpt' logic, as sketched below.
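
For concreteness, a structural sketch (not actual LeRobot code) of how that third branch could fit into the training loop. eval_freq is the existing parameter mentioned above; val_freq, val_dataloader, compute_validation_loss and the other names are placeholders for the purpose of this discussion:

```python
def run_training(policy, optimizer, train_dataloader, val_dataloader, cfg, logger,
                 update_policy, run_env_evaluation, compute_validation_loss):
    """Structural sketch of the proposed third branch in the training loop.
    All helper/config names besides eval_freq are placeholders, not actual
    LeRobot identifiers."""
    for step, batch in enumerate(train_dataloader):
        update_policy(policy, batch, optimizer)  # existing offline training step

        if cfg.eval_freq > 0 and step % cfg.eval_freq == 0:
            # existing branch: rollout in a gym environment (sim setups only)
            run_env_evaluation(policy, cfg)

        if cfg.val_freq > 0 and step % cfg.val_freq == 0:
            # proposed branch: loss on a held-out validation split of the real dataset
            val_loss = compute_validation_loss(policy, val_dataloader)
            logger.log({"val_loss": val_loss}, step=step)
```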

The second question is how to specify the validation dataset. Should we split at runtime and have the user pass a split ratio, or should the user be able to specify a separate validation dataset?
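
For the runtime-split option, a naive sketch using plain PyTorch utilities (the ratio, seed, and function name are illustrative; an episode-aware split, e.g. via the mechanism from #158, would avoid near-identical frames of one episode ending up in both splits):

```python
import torch
from torch.utils.data import random_split

def split_train_val(dataset, val_ratio=0.1, seed=42):
    # Naive frame-level split by ratio. For episodic robot data, splitting by
    # episode index is preferable so that frames of the same episode do not
    # leak into both the train and validation sets.
    n_val = int(len(dataset) * val_ratio)
    n_train = len(dataset) - n_val
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val], generator=generator)
```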

alexander-soare commented 3 months ago

@tlpss yes I agree we need another flag and another if-block. We should probably look to other repos for terminology: I'm thinking eval vs validation but not totally sure.

I think both behaviours (split at runtime vs. provide a dataset) are interesting. I'd say go for the one you actually need for your use case; that way we'll get some immediate road testing before we even merge the PR.

tlpss commented 3 months ago

@alexander-soare

Even though the validation loss isn't as 'meaningful' for imitation learning (due to the multimodality of the action distribution, I presume) as it is for regular supervised learning, I'll go ahead and submit a small PR for this.

I'll use the 'validation' terminology for now and do a runtime split.
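
A minimal sketch of the validation-loss computation itself, assuming the policy's forward pass returns a dict with a "loss" entry in the same way the training step obtains its loss (the helper name and device handling are placeholders):

```python
import torch

def compute_validation_loss(policy, val_dataloader, device="cuda"):
    # Average the training objective over the validation split without
    # updating the policy. Assumes policy.forward(batch) returns a dict
    # containing a "loss" tensor, mirroring the training step.
    policy.eval()
    losses = []
    with torch.no_grad():
        for batch in val_dataloader:
            batch = {k: v.to(device, non_blocking=True) if isinstance(v, torch.Tensor) else v
                     for k, v in batch.items()}
            losses.append(policy.forward(batch)["loss"].item())
    policy.train()
    return sum(losses) / max(len(losses), 1)
```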