Open abfleishman opened 6 years ago
Hi @abfleishman ! Could you please clarify the scenario a bit?
As I understand it now, when a new model is trained it predicts on all of the images in the blob storage that it is pointed at. I have been starting with maybe 1000 images, and training and prediction go very quickly. Then I add more images, let's say another 1000, so there are 2000 images in blob storage. If we are only using the workflow to generate new training data, we do not need to predict on the first 1000 already-labeled images, since we (hopefully) do not need to review them again, and we can save time by predicting only on the new 1000 images. This gets more pronounced when the numbers are larger, of course: let's say 10,000 images that have been labeled and 2000 new unlabeled images. Does that make sense?
When there are a lot of images in the blob storage, training is fast but prediction is slow. It would be nice to have an option to predict only on unlabeled images, to improve speed and allow faster iteration, and an option to compute metrics only on the test set, i.e. not predict on the rest of the labeled images.
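A minimal sketch of the requested filtering, assuming image names are available as plain strings. The function name `select_images_to_predict` and its parameters are hypothetical and not part of the project's code; listing the blob storage and looking up which images are labeled are left out.

```python
from typing import Iterable, List


def select_images_to_predict(
    all_images: Iterable[str],
    labeled_images: Iterable[str],
    test_images: Iterable[str] = (),
) -> List[str]:
    """Return the images that still need a prediction pass.

    Hypothetical helper: keeps every unlabeled image, plus any labeled
    image that belongs to the test set (so metrics can still be computed).
    All other labeled images are skipped.
    """
    labeled = set(labeled_images)
    test = set(test_images)
    return [
        name
        for name in all_images
        if name not in labeled or name in test
    ]


# Example: 2 labeled images (1 of them in the test set) and 2 new ones.
all_images = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
labeled_images = ["a.jpg", "b.jpg"]
test_images = ["b.jpg"]

to_predict = select_images_to_predict(all_images, labeled_images, test_images)
print(to_predict)  # ['b.jpg', 'c.jpg', 'd.jpg']
```

With 10,000 labeled and 2000 new images, a filter like this would cut the prediction pass down to the 2000 new images plus whatever part of the labeled set is held out for testing.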