Fixing the slowdown in xunsearch/util/Indexer.php --rebuild when importing more than 2 million rows

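@hightman I tested with 6 million rows. The further the run progressed, the slower the queries became, and the rebuild took 5 days to finish. Checking on the database side showed that MySQL queries become very slow when OFFSET is large. Each query consumes a lot of IO, because the database walks the index and discards the first 2,000,000 rows one by one before finally returning 1,000. For example: select id,title,content from article limit 1000 offset 2000000 takes tens of seconds to return.

So the root cause of the import getting slower and slower is the data query. If the SQL above is changed to: select id,title,content from article where id>100000 order by id asc limit 1000 offset 0 (the id bound being the last id of the previous batch), every query runs fast. In my test, 6 million rows were imported in 15 minutes.

How to make the change: https://www.cjblog.org/blog/1521431254596

Modified file: https://raw.githubusercontent.com/wcj343169893/xs-sdk-php/master/util/XSDataSource.class.php

Hope this is of some use.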
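The idea can be sketched roughly as follows. This is a minimal illustration in Python, not the code from XSDataSource.class.php: it uses an in-memory SQLite table as a stand-in for the MySQL article table, and the names iter_batches and make_demo_db, as well as the sample data, are assumptions made up for the example. It also assumes id is a unique, positive integer column, which is exactly the precondition this approach depends on.

```python
import sqlite3


def iter_batches(conn, batch_size=1000):
    """Yield batches of (id, title, content) rows, paging by the last seen id
    instead of an ever-growing OFFSET."""
    last_id = 0  # assumption: ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, title, content FROM article "
            "WHERE id > ? ORDER BY id ASC LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]  # the next batch starts after this id


def make_demo_db():
    """Build a hypothetical article table whose ids have gaps (3, 6, 9, ...)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO article (id, title, content) VALUES (?, ?, ?)",
        [(i, f"title {i}", f"content {i}") for i in range(3, 7503, 3)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    for batch in iter_batches(conn, batch_size=1000):
        print(len(batch), "rows, last id", batch[-1][0])
```

Because each query seeks directly to id > last_id through the primary key, the cost of a batch does not grow with how many rows have already been read, and gaps in the id sequence do not matter.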

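That script is only provided as a reference. Not every database table has an auto-increment primary key, so your statement is not universally applicable either.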

Best Regards

hightman/海鳗

WeChat/Weibo: hightman Github: https://github.com/hightman

In reply to wcj343169893's post of March 19, 2018, 4:31 PM, on https://github.com/hightman/xunsearch/issues/53.
