google / ml-metadata

For recording and retrieving metadata associated with ML developer and data scientist workflows.
https://www.tensorflow.org/tfx/guide/mlmd
Apache License 2.0

Data too long for column 'string_value' at row 1 #165

Closed: mayankanand007 closed this issue 2 years ago

mayankanand007 commented 2 years ago

Hey team, thanks for working on such a useful library. I was wondering if I could get some help with an error I'm facing.

details = "mysql_query failed: errno: 1406, error: Data too long for column 'string_value' at row 1"
        debug_error_string = "{"created":"@1662583283.098814427","description":"Error received from peer ipv4:10.208.67.78:443","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"mysql_query failed: errno: 1406, error: Data too long for column 'string_value' at row 1","grpc_status":13}

I'm trying to log my Teradata SQL query to MLMD, and I have defined the query property with type STRING (stored in string_value), as described in the documentation. Is it possible to define it as LONGTEXT, or some other type that can accommodate larger strings? Also, what is the maximum length that string_value can currently hold?
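Because the server reports the overflow only as an opaque `grpc_status: 13` error, it can help to check the size on the client side before writing the property. The following is a minimal sketch, not part of the MLMD API: `check_string_property` and `MAX_STRING_VALUE_BYTES` are hypothetical names, and the default limit assumes the MEDIUMTEXT column (16,777,215 bytes) mentioned later in this thread. MySQL measures these limits in bytes, so the string is encoded as UTF-8 before it is measured.

```python
# Hypothetical client-side guard; this is NOT an MLMD API.
# Assumption: the MySQL backend stores string_value as MEDIUMTEXT.
MAX_STRING_VALUE_BYTES = 16_777_215  # MEDIUMTEXT maximum, in bytes


def check_string_property(name: str, value: str,
                          max_bytes: int = MAX_STRING_VALUE_BYTES) -> int:
    """Return the UTF-8 size of `value`, or raise if it exceeds `max_bytes`."""
    size = len(value.encode("utf-8"))  # MySQL limits count bytes, not characters
    if size > max_bytes:
        raise ValueError(
            f"property {name!r} is {size} bytes; "
            f"the column holds at most {max_bytes} bytes"
        )
    return size


if __name__ == "__main__":
    query = "SELECT * FROM sales WHERE region = 'EMEA'"
    print(check_string_property("query", query))
```

Failing early with a readable message makes it obvious which property is too large, instead of surfacing `errno: 1406` from the server.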

Thanks!

mayankanand007 commented 2 years ago

It looks like MLMD does some truncation internally (see here), and the character limit for a string_value seems to be 65536. This makes me think there's probably no way to define a LONGTEXT column. Can the team help me confirm that?

BrianSong commented 2 years ago

Hi @mayankanand007, the MySQL backend currently uses MEDIUMTEXT to store the string_value column in the Property table. [1]

In MLMD v7, we changed string_value from TEXT to MEDIUMTEXT so that property values of up to 16 MB (16,777,215 bytes) can be persisted; that is the current limit. The 64 KB figure (65,535 bytes) was the previous TEXT limit. See [2] for more details.
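To make these limits concrete, the sketch below encodes a value as UTF-8 and reports the smallest MySQL text type whose maximum can hold it (TINYTEXT 255, TEXT 65,535, MEDIUMTEXT 16,777,215, LONGTEXT 4,294,967,295 bytes, as listed in [2]). `smallest_text_type` is a hypothetical helper for illustration only; as far as this thread indicates, the column type is fixed by MLMD's MySQL schema rather than chosen per property. Because the limits count bytes, non-ASCII characters (two to four bytes each in UTF-8) reduce how many characters fit.

```python
# Illustrative helper, not part of MLMD. Limits are MySQL's maximums in bytes.
MYSQL_TEXT_LIMITS = [
    ("TINYTEXT", 255),
    ("TEXT", 65_535),
    ("MEDIUMTEXT", 16_777_215),
    ("LONGTEXT", 4_294_967_295),
]


def smallest_text_type(value: str) -> str:
    """Return the smallest MySQL text type that can store `value` as UTF-8."""
    size = len(value.encode("utf-8"))
    for type_name, max_bytes in MYSQL_TEXT_LIMITS:
        if size <= max_bytes:
            return type_name
    raise ValueError(f"{size} bytes exceeds LONGTEXT")


if __name__ == "__main__":
    print(smallest_text_type("x" * 65_535))  # TEXT
    print(smallest_text_type("x" * 65_536))  # MEDIUMTEXT
    print(smallest_text_type("é" * 40_000))  # MEDIUMTEXT (80,000 bytes)
```

The last case shows why a character count alone is misleading: 40,000 characters would fit in TEXT if they were ASCII, but as two-byte characters they need MEDIUMTEXT.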

[1] https://github.com/google/ml-metadata/blob/master/ml_metadata/util/metadata_source_query_config.cc#L2163
[2] https://stackoverflow.com/questions/13932750/tinytext-text-mediumtext-and-longtext-maximum-storage-sizes

mayankanand007 commented 2 years ago

Thanks, @BrianSong! Would you be able to point me to the right version to upgrade to? Currently, when I run pip show ml-metadata, I get the following output:

Name: ml-metadata
Version: 0.22.1
Summary: A library for maintaining metadata for artifacts.
Home-page: https://github.com/google/ml-metadata
Author: Google LLC
Author-email: tensorflow-extended-dev@googlegroups.com
License: Apache 2.0
Location: /data/user/mayank/miniconda3/envs/dev/lib/python3.7/site-packages
Requires: absl-py, protobuf, six, tensorflow
Required-by: mylib

EDIT: I think I found the right version: https://github.com/google/ml-metadata/releases/tag/v1.1.0. Thanks a lot for your help! I will close this issue.
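To confirm that an environment has moved past the 0.22.1 release shown in the pip output, a numeric version comparison is enough. This is a minimal sketch: `parse_version` and `is_at_least` are hypothetical helpers that handle only plain dotted numeric versions (no pre-release tags such as `rc1`), and the 1.1.0 threshold is simply the release linked in the EDIT, not a confirmed minimum for MEDIUMTEXT support. The installed version can be read from `pip show ml-metadata` or, on Python 3.8+, from `importlib.metadata.version("ml-metadata")`.

```python
# Hypothetical helpers; only plain dotted numeric versions are supported.
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.22.1' into (0, 22, 1)."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(installed: str, required: str) -> bool:
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # 0.22.1 comes from the pip output; 1.1.0 is the release linked in the EDIT.
    print(is_at_least("0.22.1", "1.1.0"))  # False: an upgrade is needed
```

Comparing the raw strings would be wrong in general: as text, "0.9.0" sorts after "0.22.1", even though 0.9.0 is the older release.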