Closed mariadb-RoelVandePaar closed 1 year ago
Hi Roel, sorry for the late reply. And thanks for being interested in using Squirrel in MariaDB. Squirrel didn't take parallelism into design since there were not many good SQL fuzzers at that time so we just wanted to create a useful and practical one. It is far from perfect, but it is effective :P.
You are right, there's nothing stopping Squirrel from being parallel. However, we currently don't have plans for supporting parallel fuzzing. Both of the maintainers (me and @zr950624) are closed to graduation year and we need to focus on that. However, afl or aflpp style parallel fuzzing is not special but mostly about syncing their corpus among the fuzzers. So instead of modifying the architecture, maybe we can still spawn a bunch of servers, but create a shared folders for the fuzzers' output, and use afl-like corpus synchronization so that the fuzzing progress can be shared.
Hope this helps!
Hi @Changochen!
" it is effective " - Indeed it is :P - Thank you for making a great tool!
The answer clarifies/helps, thank you. Hope your graduations will be Blessed and go super well.
While I have you, I have two other questions and one FYI.
Question1: In the output window, we see a status like this: bit flips: infty/10.3k, 0/0, 0/0
And it is the only counter which increases.
I assume that this counter (while incorrectly named 'bit flips' in the GUI output) actually represents (and internally uses) your modified (IR-based) algo - correct?
Question 2: How hard would it be to modify s3team/Polyglot for SQL? Could you give me a few starting pointers?
Question 3: How to best vary the language files further and modify/update the files?
In https://github.com/s3team/Squirrel/tree/master/MariaDB/docker/init_lib we find strip_mysql and strip_trigger_collect.
One directory higher, we also find global_data_lib_mysql and safe_generate_type_mysql.
How are these files generated, and how could one generate additional ones? What special function do these serve?
FYI: Full scale automation framework around squirrel as part of mariadb-qa. Ref. https://github.com/mariadb-corporation/mariadb-qa/tree/master/squirrel (modified clone) https://github.com/mariadb-corporation/mariadb-qa/tree/master/fuzzer https://github.com/mariadb-corporation/mariadb-qa/blob/master/fuzzer/PROCEDURE https://github.com/mariadb-corporation/mariadb-qa/blob/master/fuzzer/SETUP (WIP, will be finished in next 1-2 weeks)
You may also be interested in the pquery/mariadb-qa framework (the latest iteration of which is part of mariadb-qa), which is an other excellent tool for testing as well as building/automating instances etc. - ref https://www.percona.com/blog/2015/03/17/free-mysql-qa-and-bash-linux-training-series/ (the later episodes about the framework, which is now in it's second iteration with MariaDB as linked here). And it's results (now includes some Squirrel found bugs!)
Feel free to let anyone interested reach out.
Thank you very much and have a great weekend!
Hi @mariadb-RoelVandePaar, Thanks for being interested in fuzzing with Squirrel. For your questions,
parser
to translate the SQL to AST, then use the method translate
to make it to IR. After you have the IR, strip the identifier with extract_struct
and translate it back to SQL with method to_string
.Also thanks for your information! We are glad to see Squirrel is practical to use in industry projects. Yongheng and I still work around fuzzing projects, so we are happy to look at the resource of the MariaDB-QA framework since it may inspire us for further projects.
Hope my answers help!
Thank you very much @zr950624, and thank you for your input, it definitely helps.
"We are glad to see Squirrel is practical to use in industry projects" - yes, it most definitely is being used well.
We have found a substantial amount of bugs with the tool you developed, and we are only just starting. I also see in the white paper that this was an intention of the project (sharing with the industry) - which is great - thank you to you, @Changochen and the rest of the team!
We use reducer.sh to reduce the testcases, and other pquery/mariadb-qa framework tools to generate full reports for all versions etc. An example of a bug here: https://jira.mariadb.org/browse/MDEV-29017
As for question 3, let me see if I understand this correctly.
So I would take some initial SQL I have, then use the parser
function to translate this input SQL to AST. Where and how would the information be stored please?
Then use the translate
(this is all in C++ I assume), to translate the AST to IR - again where is the info stored?
And finally use extract_struct
to strip the 'identifier' (what is 'identifier' here please?), and translate it back to SQL with to_string
. Could you explain why the SQL-AST-IR-SQL loop is necessary? Or is this the mutation in essence?
We have plenty of input SQL which could be used for mutation. In such a case, are the above steps still necessary, or could we simply insert the SQL as-is into the framework? And if so, how?
I have tried changing the grammar by setting up a completely new directory with new txt SQL files and pointing to the same with
-i
, but Squirrel still seems to be using some of the other files mentioned above, somehow (perhaps it is using global_data_lib_mysql, safe_generate_type_mysql, strip_mysql or strip_trigger_collect or the init_lib
directory?)
My apologies, I am trying to better understand how to introduce a new grammar correctly into Squirrel.
For the last question, allow me show an example:
~/fuzzing/fuzzing_root/minimal$ ls
1.txt 10.txt 2.txt 3.txt 4.txt 5.txt 6.txt 7.txt 8.txt 9.txt
~/fuzzing/fuzzing_root/minimal$ cat *.txt
CREATE TABLE t AS SELECT 1;
INSERT INTO t VALUES (a);
DROP TABLE IF EXISTS t;
CREATE TABLE t (c INT) ENGINE=MyISAM;
CREATE TABLE t (c INT) ENGINE=InnoDB;
CREATE TABLE t (c INT) ENGINE=Aria;
CREATE TABLE t (c INT) ENGINE=MEMORY;
INSERT INTO t VALUES (1);
INSERT INTO t VALUES (0);
INSERT INTO t VALUES (NULL);
I created this example SQL and put it into ./minimal
in fuzzing_root
. Then I attempted to let Squirrel use this SQL by using
-i ./minimal
. However, on checking the fuzzer queue I see for example:
CREATE TABLE v0 AS SELECT DISTINCT 99 , -128 , TRUE , FALSE GROUP BY 'x' , 'x' , NULL , 'x' , 'x' HAVING NULL = 1423377.000000 * 94585408.000000 ;
CREATE TABLE IF NOT EXISTS v0 AS SELECT 5767244.000000 GROUP BY TRUE HAVING 30798334.000000 ;
CREATE TABLE v0 AS SELECT DISTINCT TRUE , ( SELECT CASE WHEN 'x' THEN 49 END ) , 'x' ;
Which are queries clearly far away from the SQL given as input.
This SQL seems to come from init_lib
- maybe init_lib
is harcoded somewhere or similar in Squirrel?
How to change the grammar so only the ./minimal
SQL is used (as an example)?
Thanks @zr950624 for the answer. Let me add a few extra information.
select X(Y)
, X
can be filled with a function name.It is a bit embarrassing but for safe_generate_type_mysql, we need to further look at the code to remember what it is for. But everything under init_lib
will be used for mutation. So the rule is, you cover as much grammar structure as possible in init_lib
(used as the source for mutation), and use the likely-bug triggering ones as -i
(the initial corpus).
For the last question, allow me show an example:
~/fuzzing/fuzzing_root/minimal$ ls 1.txt 10.txt 2.txt 3.txt 4.txt 5.txt 6.txt 7.txt 8.txt 9.txt ~/fuzzing/fuzzing_root/minimal$ cat *.txt CREATE TABLE t AS SELECT 1; INSERT INTO t VALUES (a); DROP TABLE IF EXISTS t; CREATE TABLE t (c INT) ENGINE=MyISAM; CREATE TABLE t (c INT) ENGINE=InnoDB; CREATE TABLE t (c INT) ENGINE=Aria; CREATE TABLE t (c INT) ENGINE=MEMORY; INSERT INTO t VALUES (1); INSERT INTO t VALUES (0); INSERT INTO t VALUES (NULL);
I created this example SQL and put it into
./minimal
infuzzing_root
. Then I attempted to let Squirrel use this SQL by using-i ./minimal
. However, on checking the fuzzer queue I see for example:CREATE TABLE v0 AS SELECT DISTINCT 99 , -128 , TRUE , FALSE GROUP BY 'x' , 'x' , NULL , 'x' , 'x' HAVING NULL = 1423377.000000 * 94585408.000000 ; CREATE TABLE IF NOT EXISTS v0 AS SELECT 5767244.000000 GROUP BY TRUE HAVING 30798334.000000 ; CREATE TABLE v0 AS SELECT DISTINCT TRUE , ( SELECT CASE WHEN 'x' THEN 49 END ) , 'x' ;
Which are queries clearly far away from the SQL given as input.
This SQL seems to come from
init_lib
- maybeinit_lib
is harcoded somewhere or similar in Squirrel?How to change the grammar so only the
./minimal
SQL is used (as an example)?
Therefore, for this example, it is normal that the query from init_lib
is used. If you want to only use the test cases in minimal
, use it in both -i
and as the init_lib
.
FYI: Full scale automation framework around squirrel as part of mariadb-qa. Ref. https://github.com/mariadb-corporation/mariadb-qa/tree/master/squirrel (modified clone) https://github.com/mariadb-corporation/mariadb-qa/tree/master/fuzzer https://github.com/mariadb-corporation/mariadb-qa/blob/master/fuzzer/PROCEDURE https://github.com/mariadb-corporation/mariadb-qa/blob/master/fuzzer/SETUP (WIP, will be finished in next 1-2 weeks)
You may also be interested in the pquery/mariadb-qa framework (the latest iteration of which is part of mariadb-qa), which is an other excellent tool for testing as well as building/automating instances etc. - ref https://www.percona.com/blog/2015/03/17/free-mysql-qa-and-bash-linux-training-series/ (the later episodes about the framework, which is now in it's second iteration with MariaDB as linked here). And it's results (now includes some Squirrel found bugs!)
That's AWESOME!!!! we will definitely check it out!!!
Ok, safe_generate_type_mysql is not used at all (for some feature we finally abandon). So please ignore it LOL
Thank you so much @Changochen for the additional input. It makes a lot more sense now. One final question, you said "Question 3: @zr950624 explained how to generate the files" - which files would that be? The in-program mutation files (which I thus would not need to build to change the grammar if I understood correctly, or something else)?
If you don't mind could we leave this ticket open for a bit to see if additional questions arise? Thank you.
Thank you so much @Changochen for the additional input. It makes a lot more sense now. One final question, you said "Question 3: @zr950624 explained how to generate the files" - which files would that be? The in-program mutation files (which I thus would not need to build to change the grammar if I understood correctly, or something else)?
If you don't mind could we leave this ticket open for a bit to see if additional questions arise? Thank you.
What I explained is how I generated the init_lib
. Specifically, how to convert select id ,name from student
to select x, x from x
, so we can put it into init_lib
. If you modify the grammar, it is better to put the newly added grammar structure intoinit_lib
.
Aha, understood. And now the syntax inside init_lib
makes more sense too! Thank you very much.
@zr950624 is there perhaps an example script which takes all the steps parser
>translate
>extract_struct
>to_string
?
Aha, understood. And now the syntax inside
init_lib
makes more sense too! Thank you very much.@zr950624 is there perhaps an example script which takes all the steps
parser>translate>extract_struct>to_string
?
We do not provide an example script at the present time. I give out the following code snippet to show the workflow. Note, I did not compile it, you may need to do some modifications in order to run it.
//// extract.cpp
/// ./extract filename
int main(int argc, char * argv[]){
string Sql;
ifstream Infile(argv[1]);
unordered_set<string> SqlStructs;
Mutator * Mutr = new Mutator();
while(getline(Infile, Sql)){
if(Sql.empty()) continue;
auto p = parser(Sql);
if(p == NULL) continue;
vector<IR *> IrVec;
IR * Root = NULL;
try{
Root = p->translate(IrVec);
p->deep_delete();
}
catch(...){
continue;
}
string SqlStruct = Mutr->extract_struct(Root);
if(SqlStructs.count(SqlStruct)){
deep_delete(Root);
continue;
}
SqlStructs.insert(SqlStruct);
deep_delete(Root);
}
for(string &OutSql: SqlStructs)
cout << OutSql << endl;
return 0;
}
@zr950624 that is above and beyond. Thank you very much.
@zr950624 Hi again! Hope you are both doing well. I took the code, added headers:
#include <fstream>
#include <string>
#include <cstring>
#include <bits/stdc++.h>
#include "include/mutate.h"
And tried to compile, but get the following errors:
extract.cpp: In function ‘int main(int, char**)’:
extract.cpp:37:48: error: conversion from ‘void’ to non-scalar type ‘std::string’ {aka ‘std::__cxx11::basic_string<char>’} requested
| string SqlStruct = Mutr->extract_struct(Root);
| ~~~~~~~~~~~~~~~~~~~~^~~~~~
extract.cpp:47:25: error: binding reference of type ‘std::string&’ {aka ‘std::__cxx11::basic_string<char>&’} to ‘const std::__cxx11::basic_string<char>’ discards qualifiers
| for(string &OutSql: SqlStructs)
|
Do you know how to fix this? Thank you!
@zr950624 Hi again! Hope you are both doing well. I took the code, added headers:
#include <fstream> #include <string> #include <cstring> #include <bits/stdc++.h> #include "include/mutate.h"
And tried to compile, but get the following errors:
extract.cpp: In function ‘int main(int, char**)’: extract.cpp:37:48: error: conversion from ‘void’ to non-scalar type ‘std::string’ {aka ‘std::__cxx11::basic_string<char>’} requested | string SqlStruct = Mutr->extract_struct(Root); | ~~~~~~~~~~~~~~~~~~~~^~~~~~ extract.cpp:47:25: error: binding reference of type ‘std::string&’ {aka ‘std::__cxx11::basic_string<char>&’} to ‘const std::__cxx11::basic_string<char>’ discards qualifiers | for(string &OutSql: SqlStructs) |
Do you know how to fix this? Thank you!
We do have 2 versions of extract_struct
due to the previous messy implementation.
I think what you want is extract_struct in srcs/internal/sqlite/include/mutator.h
.
@Changochen Hi! I updated the include to mutator.h
from sqlite
. However, the outcome is the same:
extract.cpp: In function ‘int main(int, char**)’:
extract.cpp:37:48: error: conversion from ‘void’ to non-scalar type ‘std::string’ {aka ‘std::__cxx11::basic_string<char>’} requested
37 | string SqlStruct = Mutr->extract_struct(Root);
| ~~~~~~~~~~~~~~~~~~~~^~~~~~
extract.cpp:47:25: error: binding reference of type ‘std::string&’ {aka ‘std::__cxx11::basic_string<char>&’} to ‘const std::__cxx11::basic_string<char>’ discards qualifiers
47 | for(string &OutSql: SqlStructs)
| ^~~~~~~~~~
The compile command is:
g++ -o extract extract.cpp -fpermissive
What am I missing here?
@Changochen Hi :)
@Changochen Hi! I updated the include to
mutator.h
fromsqlite
. However, the outcome is the same:extract.cpp: In function ‘int main(int, char**)’: extract.cpp:37:48: error: conversion from ‘void’ to non-scalar type ‘std::string’ {aka ‘std::__cxx11::basic_string<char>’} requested 37 | string SqlStruct = Mutr->extract_struct(Root); | ~~~~~~~~~~~~~~~~~~~~^~~~~~ extract.cpp:47:25: error: binding reference of type ‘std::string&’ {aka ‘std::__cxx11::basic_string<char>&’} to ‘const std::__cxx11::basic_string<char>’ discards qualifiers 47 | for(string &OutSql: SqlStructs) | ^~~~~~~~~~
The compile command is:
g++ -o extract extract.cpp -fpermissive
What am I missing here?
Hi, sorry for the late reply. We are busy with our course-related work since we reach the end of the semester. Could you please share your full code (with the compile directory path as well) instead of only the error message?
Hi @Changochen ! Sure, no worries and thank you. Full code:
#include <fstream>
#include <string>
#include <cstring>
#include <bits/stdc++.h>
#include "include/mutator.h"
using namespace std;
int main(int argc, char * argv[]){
string Sql;
ifstream Infile(argv[1]);
unordered_set<string> SqlStructs;
Mutator * Mutr = new Mutator();
while(getline(Infile, Sql)){
if(Sql.empty()) continue;
auto p = parser(Sql);
if(p == NULL) continue;
vector<IR *> IrVec;
IR * Root = NULL;
try{
Root = p->translate(IrVec);
p->deep_delete();
}
catch(...){
continue;
}
string SqlStruct = Mutr->extract_struct(Root);
if(SqlStructs.count(SqlStruct)){
deep_delete(Root);
continue;
}
SqlStructs.insert(SqlStruct);
deep_delete(Root);
}
for(string &OutSql: SqlStructs)
cout << OutSql << endl;
return 0;
}
And compile command (version 9.4.0):
g++ -o extract extract.cpp -fpermissive
And I manually copied in the required files to ./include
and ./parser
(i.e. it becomes 'standalone'):
$ ls include/
ast.h define.h mutator.h utils.h
$ ls parser/
bison_parser.h flex_lexer.h parser_typedef.h
Thank you!
Can you check mutator.h
for the return value of extract_struct
method.
Hi @mariadb-RoelVandePaar, sorry for the late reply since I don't check closed issues.
For your question, we have two versions of extract_struct
as mentioned above.
The first one is https://github.com/s3team/Squirrel/blob/b9fc9b7785ac5f3368d5c172ce6f9d1c113c0499/srcs/internal/sqlite/include/mutator.h#L43
The second one is https://github.com/s3team/Squirrel/blob/b9fc9b7785ac5f3368d5c172ce6f9d1c113c0499/srcs/internal/mysql/include/mutator.h#L65
I think you are using the wrong one according to the errors you get.
Would you mind double-checking?
If you want to use the return void
version, then you can call Root->to_string()
afterward to get an std::string
.
I have a working fuzzing instance on MariaDB. When I try and parallel fuzz, it does not work as each slave gets it's own
SHM_ENV_VAR
whereas the started database server offcourse only has one attached__AFL_SHM_ID
. How can one parallel fuzz? is there a way to use the single instrumented server with multiple AFL instancs executing queries via shared memory? Note that for the database server this should be perfectly fine, i.e. Squirrel can be modifiedCREATE DATABASE a1000;
,CREATE DATABASE b1000;
etc - i.e. the server can easily execute multiple queries at the same time against different databases with the same number but differentiated another way, in this example witha
,b
etc. for different parallelafl-fuzz
fuzzer instances. AFLplusplus supports parallel fuzzing, but it seems Squirrel does not?