Open ageron opened 2 years ago
Hi ageron,
Thanks for the feedback.
Yes, batch prediction does not guarantee output order. Currently, we expect customers to join the outputs to the inputs themselves. We totally agree that indexed inputs are a better solution. The feature to support that will be opened to a limited set of customers next week, and we plan to release it publicly in the coming weeks. Please let us know if you are willing to try it out.
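For anyone doing the join by hand in the meantime, here is a minimal sketch. It assumes each output JSONL line is an object with an `instance` field (an echo of the input) and a `prediction` field; the helper names `canonical` and `reorder_predictions` are made up for illustration and are not part of the SDK. Check the field names against your own job's output before relying on it.

```python
import json
from collections import defaultdict, deque


def canonical(obj):
    # Serialize with sorted keys so that key order does not affect matching.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def reorder_predictions(input_lines, output_lines):
    """Return predictions in input order; None where no output was found.

    Assumes every output line looks like {"instance": ..., "prediction": ...}.
    """
    by_instance = defaultdict(deque)
    for line in output_lines:
        if not line.strip():
            continue
        record = json.loads(line)
        by_instance[canonical(record["instance"])].append(record.get("prediction"))

    ordered = []
    for line in input_lines:
        if not line.strip():
            continue
        queue = by_instance.get(canonical(json.loads(line)))
        # Duplicate instances are matched in the order their outputs appear.
        ordered.append(queue.popleft() if queue else None)
    return ordered
```

Instances that failed or are missing from the output come back as `None`, so the result always has one entry per non-blank input line.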
Thanks @weichungw , yes I'd love to try it out.
This is still an issue, right? I don't think this is a feature request. It feels like a bug to me, and it makes the product unusable.
Any updates on passing in an index/key field?
Thanks
I still experience this issue with Gemini.
I make requests like this:
from google.cloud import aiplatform

aiplatform.BatchPredictionJob.create(
    job_display_name=f"call_analysis_batch_{timestamp}",
    model_name=f"projects/{voice_project_id}/locations/{voice_region}/publishers/google/models/{voice_model_name}",
    instances_format="jsonl",
    predictions_format="jsonl",
    gcs_source=input_uri,
    gcs_destination_prefix=output_uri,
    sync=True,  # block until the job finishes
)
The lines in the output JSONL don't match the order of the lines in the input JSONL.
Environment details
google-cloud-aiplatform version: 1.12.0
Steps to reproduce
A BatchPredictionJob that outputs multiple prediction-results-xxxxx-to-xxxxx files.
Code example
This happens with official code examples such as sdk-custom-image-classification-batch.ipynb. The relevant part of the code is this:
Firstly, os.walk() does not guarantee the order. In practice, it seems to respect the order, but it's brittle to count on this.
Secondly, and more importantly, I've run into cases where the files were not in the same order as the inputs. I would get 7% accuracy on MNIST, then by just reversing the order of the prediction files, I would get 100%.
Thirdly, I haven't tested it but I suspect that the order would also be wrong if there's any error on any instance.
Lastly, the inputs may sometimes be large, and it's not efficient to include them in the predictions. I would much rather have an input identifier, such as its source file and its line index.
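To illustrate the kind of identifier meant here, below is a rough client-side sketch. It tags every input line with a hypothetical `key` field of the form `source_file:line_index`, then indexes the outputs by that key instead of by file or line order. The `key` field name and both helper names are assumptions for illustration only. The sketch also assumes that the model tolerates the extra field and that each output line echoes the instance under `instance` next to a `prediction` field, so it does not remove the cost of echoing large inputs; it only makes the result independent of output order.

```python
import json


def tag_instances(source_name, lines, key_field="key"):
    """Add a "source_name:line_index" identifier to each JSON object line.

    `key_field` is a hypothetical field name, not something the service defines.
    """
    tagged = []
    for index, line in enumerate(lines):
        instance = json.loads(line)
        instance[key_field] = f"{source_name}:{index}"
        tagged.append(json.dumps(instance))
    return tagged


def index_by_key(output_lines, key_field="key"):
    """Map each identifier to its prediction, whatever order the lines arrive in.

    Assumes output lines look like {"instance": {..., key_field: ...}, "prediction": ...}.
    """
    predictions = {}
    for line in output_lines:
        if not line.strip():
            continue
        record = json.loads(line)
        predictions[record["instance"][key_field]] = record.get("prediction")
    return predictions
```

With this, looking up `predictions["batch_0.jsonl:1"]` returns the prediction for the second line of that source file, and a missing key signals an instance that produced no output.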