Closed BirdyUK closed 3 months ago
@BirdyUK you have a pretty good server :) In order to speed things up, the indexer does everything in memory. However, on a very large dataset, sooner or later, you'll run out of memory.
That's why there is an option to turn this off. Take a look here
https://github.com/teamtnt/tntsearch/blob/master/src/Indexer/TNTIndexer.php#L34
The flag is called inMemory and it should be set to false in your case. This should prevent the system from running out of memory. Since the current version of Laravel Scout doesn't expose this option, my suggestion is to modify the source of the TNTSearch package directly and see if this resolves your issue.
If you successfully resolve it, we'll add it to the configuration. The only thing I am currently not sure about is if we should add it to the global configuration or to the model itself as a property.
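For reference, the manual tweak would be a one-line change in the linked file. A sketch (the rest of the class is elided; the property's visibility and default may differ slightly by version):

```php
<?php
// vendor/teamtnt/tntsearch/src/Indexer/TNTIndexer.php

class TNTIndexer
{
    // Trades some indexing speed for a much smaller memory
    // footprint; set to false for very large datasets.
    public $inMemory = false;

    // ...rest of the class unchanged
}
```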
The asYouType property makes more sense on the model. It would look something like:
<?php

namespace App;

use Illuminate\Database\Eloquent\Model;
use Laravel\Scout\Searchable;

class Post extends Model
{
    use Searchable;

    public $asYouType = true;

    protected $inMemory = true; // not sure if this should be set globally or not
}
Let us know what you think.
A dataset of 30 million records is a really good showcase to see how well TNTSearch performs.
Maybe we could give you some pointers if you share your Address table structure with us.
For example, the command php artisan tntsearch:import App\\Addresses will also import the timestamps, which aren't necessary and slow down the indexing.
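One way to keep the timestamps out of the index without touching the table would be Laravel Scout's toSearchableArray() hook, which limits which attributes get indexed. A sketch, assuming an addresses model with id, postcode, and address columns:

```php
<?php

namespace App;

use Illuminate\Database\Eloquent\Model;
use Laravel\Scout\Searchable;

class Addresses extends Model
{
    use Searchable;

    /**
     * Restrict what gets indexed to the columns that are
     * actually searched, leaving created_at / updated_at
     * out of the index entirely.
     */
    public function toSearchableArray()
    {
        return [
            'id'       => $this->id,
            'postcode' => $this->postcode,
            'address'  => $this->address,
        ];
    }
}
```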
Hi @nticaric - Just to inform you, I ran php -d memory_limit=-1 artisan ....
And touch wood, I am now at 20+ million with no issues whatsoever, and fuzzy search is returning results in 40ms :) 👍 -- Very excited to see what the end result will be :)
The database table holds 35 million, I think, so I will let you know more when the import finishes :)
My plan is to offer a free address lookup / postcode lookup / address finder API for UK addresses, as all the paid services charge at least 2 pence per lookup. The help of this package has hopefully made that easier, so sponsorship is certain :)
P.S.: Once this import finishes I will do some further testing and import a second table with some advice from you to see what works out better. I will also modify the source to see what difference it makes, and monitor system stats.
Test table for addresses is: id postcode address city lat lng created_at updated_at
Production table, which will be the next import I do with TNTSearch, will hopefully be: id postcode address
No timestamps are needed. All the data in the table is static and will rarely change; we only add a minimal amount of new data every quarter for new addresses and postcodes, or remove deprecated ones.
Glad that it's working out for you. Let us know the results. I'll try to add the as-you-type functionality as a runtime option.
@nticaric - Do you have a way I can contact you for private communication? It would be great to get some thoughts on the project I'm working on. I will get some funds over to you via the monthly sponsorship on Monday to help with your overheads and enable you to work on TNTSearch :)
@nticaric - Unfortunately it crashed at 29 million. After doing some investigation, it seems that when importing the large database, the CPU instantly shoots to 100% the minute you start the import and sits there.
I'm using an optimized nginx PHP 7 setup, and that's the first time I have ever seen the CPU at 100%. Memory usage in the stats seemed to remain very low. That being said, I did not fancy watching it import another 30 million records to see what the CPU / memory usage is like at the point of crash, though I could write a script to dump the log. I also think it hits 100% the second you perform a query as well, and drops back to normal when the results are returned.
The second concern I now have is that when I run a query on the 29 million records that were imported, it can take a few seconds to return the results, depending on whereabouts in the index the search term is stored.
For example, postcodes are stored alphabetically, so if I search AB10 the query is very fast and returns the result set within 20ms...
If I query ZE10AQ, it takes around 5 seconds. You can see the delay increase as you dig further with queries... An example would be:
AB10 (Returns in 20ms)
DU19 (Returns in 1 second)
S64 (Returns in 2 seconds)
ZE10 (Returns in 3+ seconds)
This is also the same for keywords such as (Main Street) or (London). Is there anything we could do to improve this? It would be great if I could get some support to help achieve the end result; it would be an excellent demo / showcase for TNTSearch to show what is possible :)
After all, doing full-text and fuzzy searching on 30+ million addresses with a response time fast enough to make address autocompletion possible is a great achievement :)
@BirdyUK drop us an email to info@tntstudio.hr and we'll help you to optimize this.
@nticaric - Will do, mate. I will send you an email later tonight and will post a comment here to let you know when I have sent it. Look forward to speaking soon. Cheers :)
Just released a new version that allows you to set the asYouType option at runtime. What you should try is to turn off the fuzziness in the config and enable asYouType:
<?php

namespace App;

use Illuminate\Database\Eloquent\Model;
use Laravel\Scout\Searchable;

class Address extends Model
{
    use Searchable;

    public $asYouType = true;
}
When you make the changes, what is the execution time?
Forgot to ask, what was the indexing time for those 20+ million records?
@nticaric - Sorry for the delay in getting back to you; I have been really busy working on some projects. I will send you an email today to info@tntstudio.hr, and hopefully I can get your help getting the search functionality into a production state and also help you out with some funds to support you with TNTSearch. I will drop you an email, and hopefully we can talk via a chat platform. -- Look forward to speaking with you soon. Thanks.
I am facing the exact same issue. I have about 1.1 million rows that I am trying to index with tntsearch:import:
php artisan tntsearch:import App\\Correspondence
After 300K records I got the following error:
Commited
Processed 291000 rows
Processed 292000 rows
Processed 293000 rows
Processed 294000 rows
Processed 295000 rows
Processed 296000 rows
Processed 297000 rows
Processed 298000 rows
Processed 299000 rows
Processed 300000 rows
Commited
Processed 301000 rows
Processed 302000 rows
Processed 303000 rows
Processed 304000 rows
Processed 305000 rows
Processed 306000 rows
Processed 307000 rows
PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 75497472 bytes) in /var/www/html/project-css/vendor/teamtnt/tntsearch/src/Indexer/TNTIndexer.php on line 393
[Symfony\Component\Debug\Exception\FatalErrorException]
Allowed memory size of 134217728 bytes exhausted (tried to allocate 75497472 bytes)
Following is my project development environment:
For my application I don't want to increase the memory in php.ini as suggested here, or run artisan with the memory option like @BirdyUK did in his case. Can I go ahead and change protected $inMemory = true; in the model yet, or do I have to manipulate the source?
I get the same error with Laravel 5.4 and Scout with TNTSearch, trying to import 3M rows.
I would really like to be able to set $inMemory = false; on my model; that way it would be really modular.
For the moment I'm running php -d memory_limit=-1 artisan tntsearch:import App\\MyModel on my dev environment, a MacBook Air with 8 GB of RAM, and it completes successfully.
Hi, I am having issues importing a large database of 30+ million records.
I am constantly getting this issue when I get to approximately 1+ million..
The server has 128 GB of RAM and dual Xeon E5s, so memory should not be an issue..
Is there anything you can suggest doing? The command i run is:
php artisan tntsearch:import App\\Addresses
Once I get this resolved and have the data / search running, I am certainly going to become a sponsor of this project, as it has saved / will save me a lot of headaches with the project I am trying to get this working on :)
UPDATE: Using the following command ...
php -d memory_limit=4048M artisan tntsearch:import App\\Addresses
To increase the memory limit on the fly helped get the import to 8 million, but the same issue occurred. It always points back to TNTIndexer.php; is there a way we can make this more memory friendly?
Thanks