Open PavelShilin89 opened 1 week ago
I've committed the updated script in https://github.com/manticoresoftware/manticoresearch/pull/2718
How it works now:
```
➜ manticore_github git:(master) ✗ php ./test/clt-tests/scripts/load_us_names_min_infix_len.php --help
Usage: /Users/sn/manticore_github/test/clt-tests/scripts/load_us_names_min_infix_len.php [options]
Options:
--batch-size=<number> Number of records per batch (default: 1000)
--concurrency=<number> Number of concurrent connections (default: 4)
--docs=<number> Total number of documents to insert (default: 1000000)
--min-infix-len=<number> Optional minimum infix length for table (default: none)
--start-id=<number> Starting ID for document insertion (default: 1)
--drop-table Drop and create the table before inserting data (default: true)
--no-drop-table Prevent the table from being dropped and created
--help Show this help message
```
A 1M-document example showing that the same data is loaded when the script is run a second time:
➜ manticore_github git:(master) ✗ php ./test/clt-tests/scripts/load_us_names_min_infix_len.php
preparing...
found in cache
querying...
finished inserting
Total time: 4.6767749786377
213822 docs per sec
➜ manticore_github git:(master) ✗ mysqldump -P9306 -h0 -etc manticore name|grep INSERT|md5sum
-- Warning: version string returned by server is incorrect.
-- Warning: column statistics not supported by the server.
bd1aa58895d1759750e50fe55709949e -
➜ manticore_github git:(master) ✗ php ./test/clt-tests/scripts/load_us_names_min_infix_len.php
preparing...
found in cache
querying...
finished inserting
Total time: 8.7348871231079
114483 docs per sec
➜ manticore_github git:(master) ✗ mysqldump -P9306 -h0 -etc manticore name|grep INSERT|md5sum
-- Warning: version string returned by server is incorrect.
-- Warning: column statistics not supported by the server.
bd1aa58895d1759750e50fe55709949e -
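The repeat-run check above can be wrapped in a small helper so the two checksums are compared automatically. This is a minimal sketch, assuming `grep`, `md5sum` and `cut` are available; `insert_checksum` and `same_data` are hypothetical names and are not part of the repository.

```shell
#!/bin/sh
# Hypothetical helpers for comparing dumps; not part of the repository.

# Read a SQL dump on stdin and print the MD5 of its INSERT lines only,
# so other dump lines (comments, DDL) do not affect the result.
insert_checksum() {
  grep INSERT | md5sum | cut -d' ' -f1
}

# Succeed if the two checksums passed as arguments are equal.
same_data() {
  [ "$1" = "$2" ]
}

# Possible usage against a running server (commands as in the session above):
#   first=$(mysqldump -P9306 -h0 -etc manticore name | insert_checksum)
#   php ./test/clt-tests/scripts/load_us_names_min_infix_len.php
#   second=$(mysqldump -P9306 -h0 -etc manticore name | insert_checksum)
#   same_data "$first" "$second" && echo "same data" || echo "data differs"
```

The usage lines are left as comments so the helpers can be sourced without a server running.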
Another example, demonstrating inserting more data into an existing table:
➜ manticore_github git:(master) ✗ php ./test/clt-tests/scripts/load_us_names_min_infix_len.php --batch-size=100 --concurrency=1 --docs=1000 --min_infix_len=2 --start-id=1
Table 'name' dropped and recreated.
preparing...
100% querying...
finished inserting
Total time: 0.007519006729126
132874 docs per sec
➜ manticore_github git:(master) ✗ mysql -P9306 -h0 -e "flush ramchunk name"
➜ manticore_github git:(master) ✗ php ./test/clt-tests/scripts/load_us_names_min_infix_len.php --batch-size=100 --concurrency=1 --docs=1000 --min_infix_len=2 --start-id=1001 --no-drop-table
preparing...
100% querying...
finished inserting
Total time: 0.0079059600830078
126376 docs per sec
➜ manticore_github git:(master) ✗ mysql -P9306 -h0 -e "flush ramchunk name"
➜ manticore_github git:(master) ✗ mysql -P9306 -h0 -e "optimize table name option sync=1, cutoff=1"
➜ manticore_github git:(master) ✗ mysqldump -P9306 -h0 -etc manticore name|grep INSERT|md5sum
-- Warning: version string returned by server is incorrect.
-- Warning: column statistics not supported by the server.
df0a65236760c48cf1d54e83929a9bf2 -
➜ manticore_github git:(master) ✗ php ./test/clt-tests/scripts/load_us_names_min_infix_len.php --batch-size=100 --concurrency=1 --docs=1000 --min_infix_len=2 --start-id=1
Table 'name' dropped and recreated.
preparing...
100% querying...
finished inserting
Total time: 0.049274921417236
20287 docs per sec
➜ manticore_github git:(master) ✗ mysql -P9306 -h0 -e "flush ramchunk name"
➜ manticore_github git:(master) ✗ php ./test/clt-tests/scripts/load_us_names_min_infix_len.php --batch-size=100 --concurrency=1 --docs=1000 --min_infix_len=2 --start-id=1001 --no-drop-table
preparing...
100% querying...
finished inserting
Total time: 0.0061080455780029
163502 docs per sec
➜ manticore_github git:(master) ✗ mysql -P9306 -h0 -e "flush ramchunk name"
➜ manticore_github git:(master) ✗ mysql -P9306 -h0 -e "optimize table name option sync=1, cutoff=1"
➜ manticore_github git:(master) ✗ mysqldump -P9306 -h0 -etc manticore name|grep INSERT|md5sum
-- Warning: version string returned by server is incorrect.
-- Warning: column statistics not supported by the server.
df0a65236760c48cf1d54e83929a9bf2 -
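The two-step load above generalizes to any number of chunks: the first run recreates the table, and each later run passes `--no-drop-table` with `--start-id` moved past the documents already inserted. The sketch below only prints the commands it would run. It assumes `--docs`, `--start-id` and `--no-drop-table` behave as in the help output and the session above; `plan_chunks` is a hypothetical helper, not part of the repository.

```shell
#!/bin/sh
# Hypothetical dry-run helper; it prints loader commands and runs nothing.
SCRIPT=./test/clt-tests/scripts/load_us_names_min_infix_len.php

# plan_chunks TOTAL CHUNK
# Print one loader command per chunk of at most CHUNK documents,
# covering document IDs 1..TOTAL.
plan_chunks() {
  total=$1
  chunk=$2
  start=1
  while [ "$start" -le "$total" ]; do
    remaining=$((total - start + 1))
    docs=$chunk
    if [ "$remaining" -lt "$chunk" ]; then
      docs=$remaining
    fi
    if [ "$start" -eq 1 ]; then
      # First chunk: the default behaviour drops and recreates the table.
      echo "php $SCRIPT --docs=$docs --start-id=$start"
    else
      # Later chunks append to the existing table.
      echo "php $SCRIPT --docs=$docs --start-id=$start --no-drop-table"
    fi
    start=$((start + docs))
  done
}

plan_chunks 2000 1000
```

In the session above, each load is followed by `flush ramchunk name`, and `optimize table name option sync=1, cutoff=1` is run before the dump is checksummed; a real wrapper would likely need the same steps to get comparable checksums.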
Note: I've removed everything related to `--start-id`; just increase `--docs` if needed.
Please test the updated script and let me know if there's any issue with it.
Proposal:
I need help modifying the load_us_names_min_infix_len.php script, which is used in several tests. The script is located in the wizard at the path `./test/clt-tests/scripts/load_us_names_min_infix_len.php`. The features required to be added to the script are: `--argument-name=`, e.g. `--batch-size=100000 --concurrency=4 --docs=1000000`.
Checklist:
To be completed by the assignee. Check off tasks that have been completed or are not applicable.