vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.48k stars 2.09k forks source link

Track complete parsing support in Vitess #8604

Open GuptaManan100 opened 3 years ago

GuptaManan100 commented 3 years ago

Description

Vitess has its own in-built SQL-parser which it uses to understand the query and represent as structs for further processing. As of now, a lot of MySQL structs are not parsed and result in syntax errors. This issue is to track the progress on adding parsing for such constructs

Work Done:

ritwizsinha commented 3 years ago

Hello @GuptaManan100 I have been trying out and understanding various projects in CNCF ecosystem and I am particularly interested in those related to databases and networking so I would like to get involved with this project and work on this issue. I would like to contribute to this project and increase my knowledge. I would be familiarising myself with terminology and get to know better about Vitess in the coming days. Any help appreciated.

GuptaManan100 commented 3 years ago

@ritwizsinha good to hear! This project is part of the LFX program. More information at https://github.com/cncf/mentoring/tree/main/lfx-mentorship. We would love to see you apply in it. All the best!

ritwizsinha commented 3 years ago

Thanks @GuptaManan100 I would like to apply to it, what all do I need to provide to improve my applications, do I need to have previous knowledge of Vitess, or previous contributions in this repository to get selected? Do I have to write out an implementation proposal for this feature(I am asking because I heard that every org did it differently)? What all would you like to see in the application ?

GuptaManan100 commented 3 years ago

@ritwizsinha No, we do not expect any previous knowledge of Vitess from the applicants. There is no explicit requirement for a implementation proposal either, but it would be great if you would be able to do a gap analysis of what all Vitess parser is missing from MySQL. I have written some of the constructs that I know that Vitess does not parse correctly

derekperkins commented 3 years ago

There are still a significant number of 8.0 functions without support: https://github.com/vitessio/vitess/issues/4099

ritwizsinha commented 3 years ago

Okay thanks @GuptaManan10 and @derekperkins I will look into it

aribalam commented 3 years ago

Hi @GuptaManan100 , a student from IIT Kharagpur here. I would like apply for this project under the LFX mentorship program. I have read the previous comments and the requirements you mentioned. Looking forward to it. :)

ritwizsinha commented 3 years ago

@GuptaManan100 I have read bit of the documentation of Vitess and set it up locally using Docker, now to identify which constructs are missing, I am thinking of trying to execute the methods you mentioned above, for example for partitioning. then for collation etc. Is their any document which specifies what all the Vitess parser recognizes and parses or we have to check it all manually?

GuptaManan100 commented 3 years ago

@ritwizsinha Have a look at https://github.com/vitessio/vitess/blob/main/go/vt/sqlparser/sql.y. This file is our yacc parser configuration. So this is the authoritative source on what we parse and what we don't. Within the same package we have a parse_test.go file which has the parsing tests.

ritwizsinha commented 3 years ago

@GuptaManan100 I wrote a quick script to match all the functions in here https://dev.mysql.com/doc/refman/8.0/en/built-in-function-reference.html which aren't present in the https://github.com/vitessio/vitess/blob/main/go/vt/sqlparser/sql.y. using grep and it turns out that 340/482 of those functions were not present in the yacc parser config. I may be wrong because I am not an expert in bash scripts but I searched some and they didn't exist

``` >> >= <> << <= <=> -> ->> ABS ACOS ADDDATE ADDTIME AES_DECRYPT AES_ENCRYPT && ANY_VALUE ASCII ASIN ATAN ATAN2 ATAN BENCHMARK BIN_TO_UUID BIT_AND BIT_COUNT BIT_LENGTH BIT_OR BIT_XOR CAN_ACCESS_COLUMN CAN_ACCESS_DATABASE CAN_ACCESS_TABLE CAN_ACCESS_USER CAN_ACCESS_VIEW CEIL CEILING CHAR_LENGTH CHARACTER_LENGTH COERCIBILITY CONCAT_WS CONNECTION_ID CONVERT_TZ COT COUNT COUNTDISTINCT CRC32 CURDATE CURRENT_ROLE CURTIME DATE_ADD DATE_FORMAT DATE_SUB DATEDIFF DAY DAYNAME DAYOFMONTH DAYOFWEEK DAYOFYEAR DEGREES ELT EXPORT_SET ExtractValue FIND_IN_SET FLOOR FORMAT_BYTES FORMAT_PICO_TIME FROM_BASE64 FROM_DAYS FROM_UNIXTIME GeomCollection GeometryCollection GET_DD_COLUMN_PRIVILEGES GET_DD_CREATE_OPTIONS GET_DD_INDEX_SUB_PART_LENGTH GET_FORMAT GET_LOCK GREATEST GTID_SUBSET GTID_SUBTRACT HOUR ICU_VERSION IFNULL INET_ATON INET_NTOA INET6_ATON INET6_NTOA INSTR INTERNAL_AUTO_INCREMENT INTERNAL_AVG_ROW_LENGTH INTERNAL_CHECK_TIME INTERNAL_CHECKSUM INTERNAL_DATA_FREE INTERNAL_DATA_LENGTH INTERNAL_DD_CHAR_LENGTH INTERNAL_GET_COMMENT_OR_ERROR INTERNAL_GET_ENABLED_ROLE_JSON INTERNAL_GET_HOSTNAME INTERNAL_GET_USERNAME INTERNAL_GET_VIEW_WARNING_OR_ERROR INTERNAL_INDEX_COLUMN_CARDINALITY INTERNAL_INDEX_LENGTH INTERNAL_IS_ENABLED_ROLE INTERNAL_IS_MANDATORY_ROLE INTERNAL_KEYS_DISABLED INTERNAL_MAX_DATA_LENGTH INTERNAL_TABLE_ROWS INTERNAL_UPDATE_TIME IS_FREE_LOCK IS_IPV4 IS_IPV4_COMPAT IS_IPV4_MAPPED IS_IPV6 IS_USED_LOCK IS_UUID ISNULL JSON_ARRAY JSON_ARRAY_APPEND JSON_ARRAY_INSERT JSON_ARRAYAGG JSON_CONTAINS JSON_CONTAINS_PATH JSON_DEPTH JSON_INSERT JSON_KEYS JSON_LENGTH JSON_MERGE JSON_MERGE_PATCH JSON_MERGE_PRESERVE JSON_OBJECT JSON_OBJECTAGG JSON_OVERLAPS JSON_PRETTY JSON_QUOTE JSON_REMOVE JSON_REPLACE JSON_SCHEMA_VALID JSON_SCHEMA_VALIDATION_REPORT JSON_SEARCH JSON_SET JSON_STORAGE_FREE JSON_STORAGE_SIZE JSON_TYPE JSON_VALID JSON_VALUE LAST_DAY LCASE LEAST LineString LOAD_FILE LOCATE LOG10 LOG2 LPAD LTRIM MAKE_SET MAKEDATE MAKETIME MASTER_POS_WAIT MBRContains MBRCoveredBy MBRCovers MBRDisjoint MBREquals MBRIntersects MBROverlaps MBRTouches MBRWithin MD5 MICROSECOND MID MINUTE MONTH MONTHNAME MultiLineString MultiPoint MultiPolygon NAME_CONST NULLIF OCT OCTET_LENGTH || PERIOD_ADD PERIOD_DIFF Point Polygon POSITION POW POWER PS_CURRENT_THREAD_ID PS_THREAD_ID QUARTER RADIANS RANDOM_BYTES REGEXP_INSTR REGEXP_LIKE REGEXP_REPLACE REGEXP_SUBSTR RELEASE_ALL_LOCKS RELEASE_LOCK REVERSE RLIKE ROLES_GRAPHML ROUND ROW_COUNT RPAD RTRIM SEC_TO_TIME SESSION_USER SHA1 SHA2 SLEEP SOUNDEX SOUNDS SOURCE_POS_WAIT SQRT ST_Area ST_AsBinary ST_AsWKB ST_AsGeoJSON ST_AsText ST_AsWKT ST_Buffer ST_Buffer_Strategy ST_Centroid ST_Collect ST_Contains ST_ConvexHull ST_Crosses ST_Difference ST_Dimension ST_Disjoint ST_Distance ST_Distance_Sphere ST_EndPoint ST_Envelope ST_Equals ST_ExteriorRing ST_FrechetDistance ST_GeoHash ST_GeomCollFromText ST_GeometryCollectionFromText ST_GeomCollFromTxt ST_GeomCollFromWKB ST_GeometryCollectionFromWKB ST_GeometryN ST_GeometryType ST_GeomFromGeoJSON ST_GeomFromText ST_GeometryFromText ST_GeomFromWKB ST_GeometryFromWKB ST_HausdorffDistance ST_InteriorRingN ST_Intersection ST_Intersects ST_IsClosed ST_IsEmpty ST_IsSimple ST_IsValid ST_LatFromGeoHash ST_Latitude ST_Length ST_LineFromText ST_LineStringFromText ST_LineFromWKB ST_LineStringFromWKB ST_LineInterpolatePoint ST_LineInterpolatePoints ST_LongFromGeoHash ST_Longitude ST_MakeEnvelope ST_MLineFromText ST_MultiLineStringFromText ST_MLineFromWKB ST_MultiLineStringFromWKB ST_MPointFromText ST_MultiPointFromText ST_MPointFromWKB ST_MultiPointFromWKB ST_MPolyFromText ST_MultiPolygonFromText ST_MPolyFromWKB ST_MultiPolygonFromWKB ST_NumGeometries ST_NumInteriorRing ST_NumInteriorRings ST_NumPoints ST_Overlaps ST_PointAtDistance ST_PointFromGeoHash ST_PointFromText ST_PointFromWKB ST_PointN ST_PolyFromText ST_PolygonFromText ST_PolyFromWKB ST_PolygonFromWKB ST_Simplify ST_SRID ST_StartPoint ST_SwapXY ST_SymDifference ST_Touches ST_Transform ST_Union ST_Validate ST_Within ST_X ST_Y STATEMENT_DIGEST STATEMENT_DIGEST_TEXT STDDEV STDDEV_POP STDDEV_SAMP STR_TO_DATE STRCMP SUBDATE SUBSTRING_INDEX SUBTIME SYSDATE SYSTEM_USER TAN TIME_FORMAT TIME_TO_SEC TIMEDIFF TO_BASE64 TO_DAYS TO_SECONDS TRIM UCASE UNCOMPRESS UNCOMPRESSED_LENGTH UNHEX UNIX_TIMESTAMP UpdateXML UPPER UUID_SHORT UUID_TO_BIN VALIDATE_PASSWORD_STRENGTH VAR_POP VAR_SAMP VARIANCE VERSION WAIT_FOR_EXECUTED_GTID_SET WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS WAIT_FOR_EXECUTED_GTID_SET WEEK WEEKDAY WEEKOFYEAR WEIGHT_STRING YEARWEEK ```

EDIT: There might be even more because I saw a section in the yacc parser config which said MySQL reserved words that are unused by this grammar will map to this token. As these also aren't used by the grammar

GuptaManan100 commented 3 years ago

@ritwizsinha Yes that is a great first step in gap analysis!

ritwizsinha commented 3 years ago

@GuptaManan100 are you available on slack we can discuss there

GuptaManan100 commented 3 years ago

Yes I am available on VItess slack

aribalam commented 3 years ago

@GuptaManan100 Looking at the sql.y file, there are already support for parsing the COLLATE and CHARACTER SET constructs. The same is also present in the parse_test.go. Could you please confirm?

GuptaManan100 commented 3 years ago

@aribalam please take a look at this thread in Vitess slack https://vitess.slack.com/archives/C0PQY0PTK/p1628839955078000

Thirumalai-Shaktivel commented 3 years ago

I would recommend keeping track of all the Built-In Functions and Operators in MySQL 5.7 which has already been implemented.

Marked one is already parsed by the sqlparser, Rest has to be implemented

GuptaManan100 commented 3 years ago

That is good work documenting the functions that we do not handle explicitly in MySQL @Thirumalai-Shaktivel.

tharun208 commented 2 years ago

@GuptaManan100 I like to work on this as of the LFX program. I am interested in databases and I would be familiarising myself with Vitess and getting to know better about Vitess. Any help is appreciated.

K-Kumar-01 commented 2 years ago

@GuptaManan100 I would like to work apply for this project under LFX. I am not familiar with Vitess as of now so currently I am going through the codebase and familiarizing myself with the concepts. I do have some work flow which I have thought of to follow and I would like to discuss it with you, if possible.

GuptaManan100 commented 2 years ago

All discussions about the project will happen on Slack. Here is the link to the general channel - https://vitess.slack.com/archives/C0PQY0PTK

GuptaManan100 commented 2 years ago

Added #9682 to the description for tracking progress

Weijun-H commented 2 years ago

Hi @GuptaManan100 , I am really interested in this project under LFX. Could you give me some suggestions how to start it? Should I figure out which functions is still missing?

GuptaManan100 commented 2 years ago

@Weijun-H, the issue has the list of things that the parser is missing. Anything that is unticked, can be worked on. You can look at any of the linked PRs to see where the tests reside and how to make parser changes. You can also refer to https://vitess.io/docs/15.0/contributing/contributing-to-ast-parser/ and https://vitess.io/docs/15.0/contributing/sample-first-issue/ for guidance.

monalisha31 commented 2 years ago

Hi @GuptaManan100, I would like apply for this project under the LFX mentorship program. This project seems interesting. Looking forward to contributing to it.

skant7 commented 2 years ago

Hi @GuptaManan100 ,I'm interested in working on this as part of the LFX Mentorship program. Can I start working on the String functions just to get a better understanding of the project or should I focus on the spatial functions to start with ?

GuptaManan100 commented 2 years ago

@skant7 Sure, go right ahead, pick up whichever one you feel most comfortable with.

SAEb-ai commented 2 years ago

Hi @GuptaManan100 !! I am interested to contribute to this project under the LFX mentorship program and hence I applied to it now. Looking forward to contribute. Thanks!!

GuptaManan100 commented 2 years ago

Everyone interested in the LFX project, please join the Vitess slack, #lfx-winter-2022 channel

ktwillcode commented 1 year ago

hi @GuptaManan100 Kartikeya this side, I applied for the mentorship and submitted a Cover letter and Resume. Is there anything more to complete the process?

GuptaManan100 commented 1 year ago

@ktwillcode There is nothing else required to complete the process. You can try your hand at implementing one of the functions that Vitess doesn’t already have parsing for. It will give you an idea on how the project work will be and it will give us confidence in your ability to do the project.

ktwillcode commented 1 year ago

@ktwillcode There is nothing else required to complete the process. You can try your hand at implementing one of the functions that Vitess doesn’t already have parsing for. It will give you an idea on how the project work will be and it will give us confidence in your ability to do the project.

@GuptaManan100 Okay. Looking forward to work on it

Ayman161803 commented 1 year ago

The following type of functions are yet to be implemented