Open GuptaManan100 opened 3 years ago
Hello @GuptaManan100, I have been trying out and learning about various projects in the CNCF ecosystem, and I am particularly interested in those related to databases and networking. I would like to get involved with this project and work on this issue, both to contribute and to increase my knowledge. I will be familiarising myself with the terminology and getting to know Vitess better in the coming days. Any help is appreciated.
@ritwizsinha good to hear! This project is part of the LFX program. More information is at https://github.com/cncf/mentoring/tree/main/lfx-mentorship. We would love to see you apply to it. All the best!
Thanks @GuptaManan100, I would like to apply. What do I need to provide to improve my application? Do I need previous knowledge of Vitess, or previous contributions to this repository, to get selected? Do I have to write an implementation proposal for this feature? (I am asking because I heard that every org does it differently.) What would you like to see in the application?
@ritwizsinha No, we do not expect any previous knowledge of Vitess from the applicants. There is no explicit requirement for an implementation proposal either, but it would be great if you could do a gap analysis of what the Vitess parser is missing from MySQL. I have written down some of the constructs that I know Vitess does not parse correctly.
There are still a significant number of 8.0 functions without support: https://github.com/vitessio/vitess/issues/4099
Okay, thanks @GuptaManan100 and @derekperkins, I will look into it.
Hi @GuptaManan100, a student from IIT Kharagpur here. I would like to apply for this project under the LFX mentorship program. I have read the previous comments and the requirements you mentioned. Looking forward to it. :)
@GuptaManan100 I have read a bit of the Vitess documentation and set it up locally using Docker. To identify which constructs are missing, I am thinking of trying to execute the ones you mentioned above, for example partitioning, then collation, etc. Is there any document that specifies everything the Vitess parser recognizes and parses, or do we have to check it all manually?
@ritwizsinha Have a look at https://github.com/vitessio/vitess/blob/main/go/vt/sqlparser/sql.y. This file is our yacc parser configuration, so it is the authoritative source on what we parse and what we don't. Within the same package we have a parse_test.go file, which has the parsing tests.
@GuptaManan100 I wrote a quick script using grep to find all the functions listed at https://dev.mysql.com/doc/refman/8.0/en/built-in-function-reference.html that aren't present in https://github.com/vitessio/vitess/blob/main/go/vt/sqlparser/sql.y, and it turns out that 340 of those 482 functions are not present in the yacc parser config. I may be wrong because I am not an expert in bash scripting, but I searched for some of them by hand and they didn't exist.
EDIT: There might be even more, because I saw a section in the yacc parser config which said
"MySQL reserved words that are unused by this grammar will map to this token."
These words aren't used by the grammar either.
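A minimal Python sketch of this kind of check, for anyone who wants to reproduce it. This is not the grep script mentioned above; the `find_missing` helper and the sample inputs are hypothetical and only illustrate the idea of matching documented function names against whole-word tokens in the grammar text.

```python
import re


def find_missing(function_names, grammar_text):
    """Return the names that never appear as a whole word in the grammar text."""
    # Collect every identifier-like token in the grammar once, case-insensitively.
    tokens = {
        tok.upper()
        for tok in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", grammar_text)
    }
    missing = []
    for raw in function_names:
        name = raw.strip()
        # The MySQL reference lists functions as "NAME()"; drop the parentheses.
        if name.endswith("()"):
            name = name[:-2]
        if name and name.upper() not in tokens:
            missing.append(name)
    return missing


# Hypothetical inputs for illustration. In practice the names would come from
# the MySQL function reference page and the text from go/vt/sqlparser/sql.y.
sample_grammar = """
%token <str> SUBSTR SUBSTRING GROUP_CONCAT
function_call_keyword:
  LEFT openb select_expression_list closeb
"""
sample_names = ["SUBSTR()", "GROUP_CONCAT()", "ST_Area()", "LEFT()", "JSON_PRETTY()"]

missing = find_missing(sample_names, sample_grammar)
print(f"{len(missing)}/{len(sample_names)} not found:", missing)
```

A text match like this only shows which names have no dedicated token or rule. A name that is absent may still parse if the grammar accepts it through a generic function-call rule, so the result is a starting list for manual checking rather than a final count.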
@ritwizsinha Yes, that is a great first step in gap analysis!
@GuptaManan100 are you available on Slack? We can discuss there.
Yes, I am available on the Vitess Slack.
@GuptaManan100 Looking at the sql.y file, there is already support for parsing the COLLATE and CHARACTER SET constructs. The same is also present in parse_test.go. Could you please confirm?
@aribalam please take a look at this thread in Vitess slack https://vitess.slack.com/archives/C0PQY0PTK/p1628839955078000
I would recommend keeping track of all the Built-In Functions and Operators in MySQL 5.7 which have already been implemented.
The marked ones are already parsed by the sqlparser; the rest have to be implemented.
&
>
>>
>=
<
<>, !=
<<
<=
<=>
%, MOD
*
+
-
-
->
->>
/
:=
=
=
^
ABS()
ACOS()
ADDDATE()
ADDTIME()
AES_DECRYPT()
AES_ENCRYPT()
AND, &&
ANY_VALUE()
Area()
AsBinary(), AsWKB()
ASCII()
ASIN()
AsText(), AsWKT()
ATAN()
ATAN2(), ATAN()
AVG()
BENCHMARK()
BETWEEN ... AND ...
BIN()
BINARY
BIT_AND()
BIT_COUNT()
BIT_LENGTH()
BIT_OR()
BIT_XOR()
Buffer()
CASE
CAST()
CEIL()
CEILING()
Centroid()
CHAR()
CHAR_LENGTH()
CHARACTER_LENGTH()
CHARSET()
COALESCE()
COERCIBILITY()
COLLATION()
COMPRESS()
CONCAT()
CONCAT_WS()
CONNECTION_ID()
Contains()
CONV()
CONVERT()
CONVERT_TZ()
ConvexHull()
COS()
COT()
COUNT()
COUNT(DISTINCT)
CRC32()
Crosses()
CURDATE()
CURRENT_DATE(), CURRENT_DATE
CURRENT_TIME(), CURRENT_TIME
CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP
CURRENT_USER(), CURRENT_USER
CURTIME()
DATABASE()
DATE()
DATE_ADD()
DATE_FORMAT()
DATE_SUB()
DATEDIFF()
DAY()
DAYNAME()
DAYOFMONTH()
DAYOFWEEK()
DAYOFYEAR()
DECODE()
DEFAULT()
DEGREES()
DES_DECRYPT()
DES_ENCRYPT()
Dimension()
Disjoint()
Distance()
DIV
ELT()
ENCODE()
ENCRYPT()
EndPoint()
Envelope()
Equals()
EXP()
EXPORT_SET()
ExteriorRing()
EXTRACT()
ExtractValue()
FIELD()
FIND_IN_SET()
FLOOR()
FORMAT()
FOUND_ROWS()
FROM_BASE64()
FROM_DAYS()
FROM_UNIXTIME()
GeomCollFromText(), GeometryCollectionFromText()
GeomCollFromWKB(), GeometryCollectionFromWKB()
GeometryCollection()
GeometryN()
GeometryType()
GeomFromText(), GeometryFromText()
GeomFromWKB(), GeometryFromWKB()
GET_FORMAT()
GET_LOCK()
GLength()
GREATEST()
GROUP_CONCAT()
GTID_SUBSET()
GTID_SUBTRACT()
HEX()
HOUR()
IF()
IFNULL()
IN()
INET_ATON()
INET_NTOA()
INET6_ATON()
INET6_NTOA()
INSERT()
INSTR()
InteriorRingN()
Intersects()
INTERVAL()
IS
IS_FREE_LOCK()
IS_IPV4()
IS_IPV4_COMPAT()
IS_IPV4_MAPPED()
IS_IPV6()
IS NOT
IS NOT NULL
IS NULL
IS_USED_LOCK()
IsClosed()
IsEmpty()
ISNULL()
IsSimple()
JSON_APPEND()
JSON_ARRAY()
JSON_ARRAY_APPEND()
JSON_ARRAY_INSERT()
JSON_ARRAYAGG()
JSON_CONTAINS()
JSON_CONTAINS_PATH()
JSON_DEPTH()
JSON_EXTRACT()
JSON_INSERT()
JSON_KEYS()
JSON_LENGTH()
JSON_MERGE()
JSON_MERGE_PATCH()
JSON_MERGE_PRESERVE()
JSON_OBJECT()
JSON_OBJECTAGG()
JSON_PRETTY()
JSON_QUOTE()
JSON_REMOVE()
JSON_REPLACE()
JSON_SEARCH()
JSON_SET()
JSON_STORAGE_SIZE()
JSON_TYPE()
JSON_UNQUOTE()
JSON_VALID()
LAST_DAY
LAST_INSERT_ID()
LCASE()
LEAST()
LEFT()
LENGTH()
LIKE
LineFromText(), LineStringFromText()
LineFromWKB(), LineStringFromWKB()
LineString()
LN()
LOAD_FILE()
LOCALTIME(), LOCALTIME
LOCALTIMESTAMP, LOCALTIMESTAMP()
LOCATE()
LOG()
LOG10()
LOG2()
LOWER()
LPAD()
LTRIM()
MAKE_SET()
MAKEDATE()
MAKETIME()
MASTER_POS_WAIT()
MATCH
MAX()
MBRContains()
MBRCoveredBy()
MBRCovers()
MBRDisjoint()
MBREqual()
MBREquals()
MBRIntersects()
MBROverlaps()
MBRTouches()
MBRWithin()
MD5()
MICROSECOND()
MID()
MIN()
MINUTE()
MLineFromText(), MultiLineStringFromText()
MLineFromWKB(), MultiLineStringFromWKB()
MOD()
MONTH()
MONTHNAME()
MPointFromText(), MultiPointFromText()
MPointFromWKB(), MultiPointFromWKB()
MPolyFromText(), MultiPolygonFromText()
MPolyFromWKB(), MultiPolygonFromWKB()
MultiLineString()
MultiPoint()
MultiPolygon()
NAME_CONST()
NOT, !
NOT BETWEEN ... AND ...
NOT IN()
NOT LIKE
NOT REGEXP
NOW()
NULLIF()
NumGeometries()
NumInteriorRings()
NumPoints()
OCT()
OCTET_LENGTH()
OR, ||
ORD()
Overlaps()
PASSWORD()
PERIOD_ADD()
PERIOD_DIFF()
PI()
Point()
PointFromText()
PointFromWKB()
PointN()
PolyFromText(), PolygonFromText()
PolyFromWKB(), PolygonFromWKB()
Polygon()
POSITION()
POW()
POWER()
PROCEDURE ANALYSE()
QUARTER()
QUOTE()
RADIANS()
RAND()
RANDOM_BYTES()
REGEXP
RELEASE_ALL_LOCKS()
RELEASE_LOCK()
REPEAT()
REPLACE()
REVERSE()
RIGHT()
RLIKE
ROUND()
ROW_COUNT()
RPAD()
RTRIM()
SCHEMA()
SEC_TO_TIME()
SECOND()
SESSION_USER()
SHA1(), SHA()
SHA2()
SIGN()
SIN()
SLEEP()
SOUNDEX()
SOUNDS LIKE
SPACE()
SQRT()
SRID()
ST_Area()
ST_AsBinary(), ST_AsWKB()
ST_AsGeoJSON()
ST_AsText(), ST_AsWKT()
ST_Buffer()
ST_Buffer_Strategy()
ST_Centroid()
ST_Contains()
ST_ConvexHull()
ST_Crosses()
ST_Difference()
ST_Dimension()
ST_Disjoint()
ST_Distance()
ST_Distance_Sphere()
ST_EndPoint()
ST_Envelope()
ST_Equals()
ST_ExteriorRing()
ST_GeoHash()
ST_GeomCollFromText(), ST_GeometryCollectionFromText(), ST_GeomCollFromTxt()
ST_GeomCollFromWKB(), ST_GeometryCollectionFromWKB()
ST_GeometryN()
ST_GeometryType()
ST_GeomFromGeoJSON()
ST_GeomFromText(), ST_GeometryFromText()
ST_GeomFromWKB(), ST_GeometryFromWKB()
ST_InteriorRingN()
ST_Intersection()
ST_Intersects()
ST_IsClosed()
ST_IsEmpty()
ST_IsSimple()
ST_IsValid()
ST_LatFromGeoHash()
ST_Length()
ST_LineFromText(), ST_LineStringFromText()
ST_LineFromWKB(), ST_LineStringFromWKB()
ST_LongFromGeoHash()
ST_MakeEnvelope()
ST_MLineFromText(), ST_MultiLineStringFromText()
ST_MLineFromWKB(), ST_MultiLineStringFromWKB()
ST_MPointFromText(), ST_MultiPointFromText()
ST_MPointFromWKB(), ST_MultiPointFromWKB()
ST_MPolyFromText(), ST_MultiPolygonFromText()
ST_MPolyFromWKB(), ST_MultiPolygonFromWKB()
ST_NumGeometries()
ST_NumInteriorRing(), ST_NumInteriorRings()
ST_NumPoints()
ST_Overlaps()
ST_PointFromGeoHash()
ST_PointFromText()
ST_PointFromWKB()
ST_PointN()
ST_PolyFromText(), ST_PolygonFromText()
ST_PolyFromWKB(), ST_PolygonFromWKB()
ST_Simplify()
ST_SRID()
ST_StartPoint()
ST_SymDifference()
ST_Touches()
ST_Union()
ST_Validate()
ST_Within()
ST_X()
ST_Y()
StartPoint()
STD()
STDDEV()
STDDEV_POP()
STDDEV_SAMP()
STR_TO_DATE()
STRCMP()
SUBDATE()
SUBSTR()
SUBSTRING()
SUBSTRING_INDEX()
SUBTIME()
SUM()
SYSDATE()
SYSTEM_USER()
TAN()
TIME()
TIME_FORMAT()
TIME_TO_SEC()
TIMEDIFF()
TIMESTAMP()
TIMESTAMPADD()
TIMESTAMPDIFF()
TO_BASE64()
TO_DAYS()
TO_SECONDS()
Touches()
TRIM()
TRUNCATE()
UCASE()
UNCOMPRESS()
UNCOMPRESSED_LENGTH()
UNHEX()
UNIX_TIMESTAMP()
UpdateXML()
UPPER()
USER()
UTC_DATE()
UTC_TIME()
UTC_TIMESTAMP()
UUID()
UUID_SHORT()
VALIDATE_PASSWORD_STRENGTH()
VALUES()
VAR_POP()
VAR_SAMP()
VARIANCE()
VERSION()
WAIT_FOR_EXECUTED_GTID_SET()
WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS()
WEEK()
WEEKDAY()
WEEKOFYEAR()
WEIGHT_STRING()
Within()
X()
XOR
Y()
YEAR()
YEARWEEK()
|
~
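One way to keep a tracking list like this in sync is to generate it. A minimal Python sketch, assuming a mapping from each name to whether the parser already handles it; the `render_checklist` helper and the sample statuses are hypothetical and do not reflect the actual state of the Vitess parser. It emits GitHub task-list syntax, so ticked entries show as parsed.

```python
def render_checklist(status):
    """Render {name: already_parsed} as GitHub task-list lines, sorted by name."""
    lines = []
    for name in sorted(status, key=str.upper):
        mark = "x" if status[name] else " "
        lines.append(f"- [{mark}] {name}")
    return "\n".join(lines)


# Placeholder statuses for illustration only, not real parser coverage.
sample_status = {"ABS()": False, "SUBSTR()": True, "Area()": False}
print(render_checklist(sample_status))
```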
That is good work documenting the MySQL functions that we do not handle explicitly, @Thirumalai-Shaktivel.
@GuptaManan100 I would like to work on this as part of the LFX program. I am interested in databases, and I will be familiarising myself with Vitess and getting to know it better. Any help is appreciated.
@GuptaManan100 I would like to apply for this project under LFX. I am not familiar with Vitess as of now, so I am currently going through the codebase and familiarizing myself with the concepts. I have thought of a workflow to follow and I would like to discuss it with you, if possible.
All discussions about the project will happen on Slack. Here is the link to the general channel - https://vitess.slack.com/archives/C0PQY0PTK
Added #9682 to the description for tracking progress
Hi @GuptaManan100, I am really interested in this project under LFX. Could you give me some suggestions on how to start? Should I figure out which functions are still missing?
@Weijun-H, the issue has the list of things that the parser is missing. Anything that is unticked can be worked on. You can look at any of the linked PRs to see where the tests reside and how to make parser changes. You can also refer to https://vitess.io/docs/15.0/contributing/contributing-to-ast-parser/ and https://vitess.io/docs/15.0/contributing/sample-first-issue/ for guidance.
Hi @GuptaManan100, I would like to apply for this project under the LFX mentorship program. This project seems interesting. Looking forward to contributing to it.
Hi @GuptaManan100, I'm interested in working on this as part of the LFX Mentorship program. Can I start working on the string functions just to get a better understanding of the project, or should I focus on the spatial functions to start with?
@skant7 Sure, go right ahead, pick up whichever one you feel most comfortable with.
Hi @GuptaManan100! I am interested in contributing to this project under the LFX mentorship program and have just applied to it. Looking forward to contributing. Thanks!
Everyone interested in the LFX project, please join the #lfx-winter-2022 channel on the Vitess Slack.
Hi @GuptaManan100, Kartikeya here. I applied for the mentorship and submitted a cover letter and resume. Is there anything more to complete the process?
@ktwillcode There is nothing else required to complete the process. You can try your hand at implementing one of the functions that Vitess doesn’t already have parsing for. It will give you an idea on how the project work will be and it will give us confidence in your ability to do the project.
@GuptaManan100 Okay. Looking forward to working on it.
The following types of functions are yet to be implemented
Description
Vitess has its own built-in SQL parser, which it uses to understand a query and represent it as structs for further processing. As of now, a lot of MySQL constructs are not parsed and result in syntax errors. This issue tracks the progress on adding parsing for such constructs.
Work Done:
#8797
#8821
#8884
#8918
#9006
#9011
#9029
#9075
#8691