google / ml-metadata

For recording and retrieving metadata associated with ML developer and data scientist workflows.
https://www.tensorflow.org/tfx/guide/mlmd
Apache License 2.0

GetArtifacts returns all artifacts without pagination #201

Status: Open. Sidebook opened this issue 2 months ago.

Sidebook commented 2 months ago

I'm using MLMD with a MySQL database and the official gRPC server.

When I don't set options.max_result_size in the request, GetArtifacts returns all artifacts without pagination.

To reproduce it with grpcurl:

grpcurl -plaintext -proto ml_metadata/proto/metadata_store_service.proto localhost:13316 ml_metadata.MetadataStoreService/GetArtifacts

This is critical. When there are many artifacts (we're using Kubeflow), the server tries to read every row, which can bring down the database instance. Is this expected behavior? Shouldn't it return 100 artifacts along with a nextPageToken?
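As a client-side workaround until the server applies a default, a caller can always send an explicit max_result_size and keep following the page token until it comes back empty. This is a minimal sketch, not MLMD's client API: `get_page(max_result_size, page_token)` is a hypothetical callable standing in for the GetArtifacts RPC and returns `(items, next_page_token)`, an empty token is assumed to mean "last page", and the in-memory fake server is also hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a GetArtifacts call:
# (max_result_size, page_token) -> (items, next_page_token); "" means no more pages.
PageFn = Callable[[int, str], Tuple[List[int], str]]


def fetch_all(get_page: PageFn, page_size: int = 100) -> List[int]:
    """Collect every item by always sending an explicit page size and following the token."""
    items: List[int] = []
    token = ""
    while True:
        page, token = get_page(page_size, token)
        items.extend(page)
        if not token:  # empty token: the server reports no further pages
            return items


def make_fake_server(ids: List[int]) -> PageFn:
    """Hypothetical in-memory server; its page token is simply the next offset as a string."""
    def get_page(max_result_size: int, page_token: str) -> Tuple[List[int], str]:
        start = int(page_token) if page_token else 0
        end = start + max_result_size
        page = ids[start:end]
        next_token = str(end) if end < len(ids) else ""
        return page, next_token
    return get_page


if __name__ == "__main__":
    server = make_fake_server(list(range(250)))
    print(len(fetch_all(server)))  # 250, fetched as pages of 100, 100 and 50
```

The loop never relies on the server's default, so it stays bounded per request even if the options field would otherwise be treated as "no limit".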

Another issue: when I set max_result_size larger than 100, nextPageToken disappears and the response always contains 101 artifacts. I know max_result_size should not exceed 100, but shouldn't the server return an error instead?
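One common way to build page tokens (assumed here for illustration, not taken from MLMD's source) is to query one row more than the page size: if the extra "probe" row comes back, another page exists. The page size has to be clamped once, and that same clamped value used both for the query limit and for the "is there more?" comparison. If a query used the clamped limit but the comparison used the unclamped request, a request for 200 would fetch 101 rows, see that 101 is not greater than 200, and return all 101 rows with no token, which would produce the symptom above. The sketch uses a list of ids in place of a table; `list_page` and keyset pagination on id are hypothetical choices, and `MAX_PAGE_SIZE = 100` is the limit mentioned in this issue.

```python
from typing import List, Optional, Tuple

MAX_PAGE_SIZE = 100  # upper bound discussed in this issue


def list_page(ids: List[int], max_result_size: int,
              after_id: Optional[int] = None) -> Tuple[List[int], Optional[int]]:
    """Return one ascending page of ids and the id to resume after (None on the last page)."""
    if max_result_size < 1:
        raise ValueError("max_result_size must be positive")
    limit = min(max_result_size, MAX_PAGE_SIZE)  # clamp once; use only `limit` below
    # Stand-in for: SELECT id FROM ... WHERE id > :after_id ORDER BY id LIMIT :limit + 1
    remaining = sorted(i for i in ids if after_id is None or i > after_id)
    rows = remaining[:limit + 1]
    # Compare against the clamped `limit`: comparing against an unclamped
    # max_result_size > MAX_PAGE_SIZE would return limit + 1 rows and no token.
    if len(rows) > limit:  # the probe row proves at least one more page exists
        page = rows[:limit]
        return page, page[-1]
    return rows, None
```

Passing the returned id back as `after_id` fetches the next page; resuming after the last seen id, rather than using an OFFSET, avoids rescanning rows that were already returned.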

https://github.com/google/ml-metadata/blob/master/ml_metadata/metadata_store/rdbms_metadata_access_object.cc#L2539

The if condition there only checks the lower bound.
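A check that enforces both bounds, as suggested above, would reject out-of-range values instead of silently accepting them. This is a minimal sketch in Python rather than the C++ of the linked file. The function name, the `InvalidArgumentError` stand-in, the assumption that the existing lower bound rejects values below 1, and the choice to reject (rather than clamp) values above 100 are all assumptions for illustration.

```python
MIN_PAGE_SIZE = 1    # assumed lower bound: zero or negative sizes are invalid
MAX_PAGE_SIZE = 100  # upper bound discussed in this issue


class InvalidArgumentError(ValueError):
    """Hypothetical stand-in for an INVALID_ARGUMENT status returned to the client."""


def validate_max_result_size(max_result_size: int) -> int:
    """Reject page sizes outside [MIN_PAGE_SIZE, MAX_PAGE_SIZE] instead of accepting them."""
    if max_result_size < MIN_PAGE_SIZE:
        raise InvalidArgumentError(
            f"max_result_size must be >= {MIN_PAGE_SIZE}, got {max_result_size}")
    if max_result_size > MAX_PAGE_SIZE:  # the upper-bound check the report says is missing
        raise InvalidArgumentError(
            f"max_result_size must be <= {MAX_PAGE_SIZE}, got {max_result_size}")
    return max_result_size
```

Silently clamping to 100 is the other reasonable design; an explicit error has the advantage of surfacing a misconfigured client immediately instead of returning a page it did not ask for.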