mthorrell / gboost_module

Gradient Boosting Modules for pytorch
MIT License

Dataloaders need to be set correctly or it will fail. #9

Closed: bbeechler closed this issue 3 months ago

bbeechler commented 3 months ago

If the DataLoader is not initialized with `drop_last=True`, it will pass fewer samples than the batch size on the final batch and crash the module. Maybe just add this to the docs and have this line throw an error.
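To illustrate the failure mode, here is a pure-Python sketch of the batching semantics behind `torch.utils.data.DataLoader`'s `drop_last` flag (a stand-in, not the actual DataLoader code): when the dataset size is not a multiple of the batch size, the final batch is short unless it is dropped.

```python
# Stand-in for DataLoader batching to show drop_last semantics.
def batches(data, batch_size, drop_last=False):
    out = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    if drop_last and out and len(out[-1]) < batch_size:
        out.pop()  # discard the short final batch
    return out

data = list(range(10))
print([len(b) for b in batches(data, 4)])                  # [4, 4, 2] -- short last batch
print([len(b) for b in batches(data, 4, drop_last=True)])  # [4, 4]
```

A module that assumes every incoming batch has exactly `batch_size` rows breaks on that final `[4, 4, 2]` case.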

mthorrell commented 3 months ago

@bbeechler, thanks for the note. Right now, proper batch training isn't supported (i.e., feed in one batch, then the next, then the next). You need to feed in all the data to start. This was removed mostly to speed things up with cached predictions: if each new batch truly is net new, then you can't cache predictions, and prediction is O(number of trees).
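The caching trade-off can be sketched as follows. This is a hypothetical toy, not the library's actual internals: with a fixed dataset, each boosting round only needs to add the newest tree's output to a running cache, while a net-new input must be pushed through every tree.

```python
# Hypothetical sketch of cached boosted predictions; "trees" are plain callables.
class CachedBooster:
    def __init__(self, n):
        self.trees = []          # all fitted trees so far
        self.cache = [0.0] * n   # running prediction per fixed sample

    def add_tree(self, tree, data):
        self.trees.append(tree)
        # O(1) trees per round: only the new tree touches the cache
        for i, x in enumerate(data):
            self.cache[i] += tree(x)

    def predict_new(self, x):
        # Net-new input: must walk every tree, O(number of trees)
        return sum(tree(x) for tree in self.trees)

data = [1.0, 2.0, 3.0]
booster = CachedBooster(len(data))
booster.add_tree(lambda x: 0.5 * x, data)
booster.add_tree(lambda x: x + 1.0, data)
print(booster.cache)             # [2.5, 4.0, 5.5]
print(booster.predict_new(2.0))  # 4.0
```

If every batch is genuinely new data, `predict_new` is the only option, which is why feeding the full dataset up front is faster here.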

I'll add an assertion so it can at least fail more gracefully, and I'll make a new ticket for proper batch training.
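One minimal shape such a guard might take (names and structure are illustrative, not the module's actual API): record the row count on the first forward pass and assert every later call matches it.

```python
# Hypothetical guard; not the real gboost_module code.
class GBModule:
    def __init__(self):
        self._n_cached = None  # row count seen on the first forward pass

    def forward(self, batch):
        if self._n_cached is None:
            self._n_cached = len(batch)
        assert len(batch) == self._n_cached, (
            f"Expected the full dataset of {self._n_cached} rows on every call; "
            f"got {len(batch)}. Batch training is not supported yet."
        )
        return batch  # placeholder for the real computation

m = GBModule()
m.forward(list(range(8)))  # first call fixes the expected size
m.forward(list(range(8)))  # same size passes; a short batch would raise
```

A short final batch from a DataLoader without `drop_last=True` then fails with a clear message instead of a crash deeper in the module.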

mthorrell commented 3 months ago

I added the assertion, and I opened a feature request for proper batch training in #12. I'll close this issue.