Closed Tomcli closed 3 years ago
Here is the error message:
Traceback (most recent call last):
File "/usr/src/app/swagger_server/util.py", line 259, in invoke_controller_impl
results = impl_func(**parameters)
File "/usr/src/app/swagger_server/controllers_impl/pipeline_service_controller_impl.py", line 194, in list_pipelines
api_pipelines: [ApiPipeline] = load_data(ApiPipelineExtended, filter_dict=filter_dict, sort_by=sort_by,
File "/usr/src/app/swagger_server/data_access/mysql_client.py", line 678, in load_data
_verify_or_create_table(table_name, swagger_class)
File "/usr/src/app/swagger_server/data_access/mysql_client.py", line 359, in _verify_or_create_table
_validate_schema(table_name, swagger_class)
File "/usr/src/app/swagger_server/data_access/mysql_client.py", line 440, in _validate_schema
raise ApiError(err_msg)
swagger_server.util.ApiError: The MySQL table 'mlpipeline.pipelines_extended' does not match Swagger class 'ApiPipelineExtended'.
Found table with columns:
- 'UUID' varchar(255)
- 'CreatedAtInSec' bigint(20)
- 'Name' varchar(255)
- 'Description' varchar(255)
- 'Parameters' longtext
- 'Status' varchar(255)
- 'DefaultVersionId' varchar(255)
- 'Namespace' varchar(255)
- 'Annotations' longtext
- 'Featured' tinyint(1)
- 'PublishApproved' tinyint(1).
Expected table with columns:
- 'UUID' varchar(255)
- 'CreatedAtInSec' bigint(20)
- 'Name' varchar(255)
- 'Description' longtext
- 'Parameters' longtext
- 'Status' varchar(255)
- 'DefaultVersionId' varchar(255)
- 'Namespace' varchar(63)
- 'Annotations' longtext
- 'Featured' tinyint(1)
- 'PublishApproved' tinyint(1).
Delete and recreate the table by calling the API endpoint 'DELETE /pipelines_extended/*' (500)
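The failing check boils down to a column-by-column comparison of the live table against the expected schema. The helper below is an illustrative sketch (not the actual `_validate_schema` code in `mysql_client.py`); the column names and types are taken from the error message above:

```python
# Hypothetical sketch of the kind of comparison _validate_schema performs:
# diff the column types found in MySQL against the expected definitions.

def find_schema_mismatches(found: dict, expected: dict) -> list:
    """Return (column, found_type, expected_type) for every mismatch."""
    mismatches = []
    for column, expected_type in expected.items():
        found_type = found.get(column)
        if found_type != expected_type:
            mismatches.append((column, found_type, expected_type))
    return mismatches

# Column types reported in the error message above.
found = {
    "UUID": "varchar(255)", "CreatedAtInSec": "bigint(20)",
    "Name": "varchar(255)", "Description": "varchar(255)",
    "Parameters": "longtext", "Status": "varchar(255)",
    "DefaultVersionId": "varchar(255)", "Namespace": "varchar(255)",
    "Annotations": "longtext", "Featured": "tinyint(1)",
    "PublishApproved": "tinyint(1)",
}
expected = dict(found, Description="longtext", Namespace="varchar(63)")

# The two columns that differ are Description and Namespace.
print(find_schema_mismatches(found, expected))
```

Note that only `Description` and `Namespace` differ, which points at a table created with the wrong column definitions rather than a missing table.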
After importing the quickstart catalog, the pipelines URL is good; I can see all pipeline cards. The stress test sends requests repeatedly to get 2 of the pipeline cards. After I ran the test for a while, the /apis/v1alpha1/pipelines API started sending back 500: Internal Server Error, and I saw the error message above in the mlx-api pod. I always start with 1 pod for mlx-api; after importing the quickstart catalog, I scale up to 3 or more pods. Not sure if this is related to the issue.
Could there have been some pods that crashed? There is a code path in the MLX API that creates the pipelines table if it does not exist. That code path was never used, since we always found the pipelines table already created by KFP or by the init_db.sh script I wrote for the quickstart with Docker Compose.
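The create-if-missing code path described above can be sketched as follows. This is an illustrative outline (not the actual `_verify_or_create_table` implementation); the danger is that the "create" branch is only safe when exactly one process ever takes it:

```python
# Illustrative sketch of a create-if-missing table check. With several
# replicas, one pod can take the "create" branch with a wrong schema,
# and every other pod then fails validation forever after.

def verify_or_create_table(table_exists, create_table, validate_schema):
    if not table_exists():
        create_table()      # only safe if exactly one writer runs this
    else:
        validate_schema()   # raises ApiError on any column mismatch

# Demo with stand-in callables:
calls = []
verify_or_create_table(lambda: False,
                       lambda: calls.append("create"),
                       lambda: calls.append("validate"))
```

Once a table with the wrong `namespace` length exists, every replica that finds it will take the `validate_schema` branch and raise the `ApiError` shown in the traceback.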
@ckadner when I rerun the init_db.sh job, the tables are recreated and everything works fine. But once we ran the stress test again, then the above error will pop up.
That seems to indicate that the MLX API pod does not find the pipelines table and creates it with the wrong column length for the namespace column. This should not happen unless there is a new MySQL instance which does not get initialized in time before the first call to the MLX API's GET /apis/v1alpha1/pipelines.
This may be an instance of inopportune timing due to the stress test scenario. If we need to support that, I can make changes to the MLX API. (In the Docker Compose setup I made the catalog upload service dependent on the MySQL service having finished its initialization.)
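One way to guard against that timing window is to poll until the database is ready instead of falling through to CREATE TABLE on the first failed lookup. A minimal sketch, assuming a placeholder `table_ready` probe (not an MLX function):

```python
# Illustrative startup guard: wait for MySQL initialization to finish
# before deciding whether the pipelines table needs to be created.
# table_ready is a caller-supplied probe (e.g. a SHOW TABLES query).

import time

def wait_for_db(table_ready, timeout=60.0, interval=2.0):
    """Poll until the expected table is visible, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if table_ready():
            return True
        time.sleep(interval)
    return False
```

This mirrors what the Docker Compose dependency achieves: the API only proceeds once the MySQL initialization has completed.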
@ckadner I guess the problem is caused by the second or third pod when we scale up the mlx-api. Like I mentioned, we always do the quickstart import when the replicas=1, the 1st pod. Then I scale up the mlx-api to replicas=2 or 3. And this error will show up in 2nd and 3rd pod.
The 2nd or 3rd replica of MLX-API connects to the same (already initialized) MySQL database, so init_db.sql was not being run again. In Docker Compose the mysql service gets initialized via a "magic" volume:
volumes:
  - ./init_db.sql:/docker-entrypoint-initdb.d/init_db.sql
MySQL uses this volume to find its initialization scripts; anything under /docker-entrypoint-initdb.d/ will be executed at startup of MySQL (PR #126).
- The first mlx-api pod will not find the pipelines table, will CREATE TABLE with the incorrect namespace column, and internally remember that it created it.
- The other mlx-api pods will check and find that the pipelines table exists, but then they go on to verify the table schema and complain about the incorrect namespace column length.
The MLX API is not designed to be running with multiple replicas:
- GET request caching assumes a single-instance API deployment (PR #140)
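To illustrate why a per-process GET cache assumes a single instance, here is a minimal sketch (not the MLX implementation): each replica holds its own copy, so a write handled by one replica leaves stale entries in the others.

```python
# Minimal illustration of the multi-replica caching problem: each
# replica caches in its own process memory, and invalidation only
# reaches the replica that served the write.

class InProcessCache:
    def __init__(self):
        self._store = {}

    def get(self, key, loader):
        if key not in self._store:
            self._store[key] = loader()
        return self._store[key]

    def invalidate(self, key):
        self._store.pop(key, None)  # only clears THIS replica's copy

db = {"pipelines": ["p1"]}
replica_a, replica_b = InProcessCache(), InProcessCache()

replica_a.get("pipelines", lambda: list(db["pipelines"]))
replica_b.get("pipelines", lambda: list(db["pipelines"]))

db["pipelines"].append("p2")       # write handled by replica A
replica_a.invalidate("pipelines")  # A refreshes on next GET; B does not
```

After the write, replica A serves the fresh list while replica B keeps returning the stale one, which is exactly the inconsistency a multi-replica deployment would see.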
Describe the bug
@yhwang can you describe the errors that you found?