Ezka77 opened this issue 8 years ago
Hi @Ezka77.
Try running nefertari.index -c docker.ini --models Label --recreate --params=_limit=NUM, where NUM is a number greater than the number of Label items in your db.
Hi @postatum
Hmm, --params= seems a bit undocumented, but now I remember I used it the last time this happened. OK, I see the idea, but the command ends with this error:
nefertari.index: error: argument --recreate: not allowed with argument --models
If I remember correctly, --recreate replaces the --force option, but it seems to do a bit more.
So I ran the command without --recreate. That should be OK, since only ~3k rows are missing, but here is the traceback:
2016-11-02 09:40:01,906 - elasticsearch - GET http://elasticsearch:9200/hathor/Label/_mget?fields=_id [status:200 request:0.588s]
Traceback (most recent call last):
  File "/usr/local/bin/nefertari.index", line 9, in <module>
    load_entry_point('nefertari==0.7.0', 'console_scripts', 'nefertari.index')()
  File "/usr/local/lib/python3.5/site-packages/nefertari/scripts/es.py", line 23, in main
    return command.run()
  File "/usr/local/lib/python3.5/site-packages/nefertari/scripts/es.py", line 123, in run
    self.index_models(model_names)
  File "/usr/local/lib/python3.5/site-packages/nefertari/scripts/es.py", line 103, in index_models
    es.index_missing_documents(documents)
  File "/usr/local/lib/python3.5/site-packages/nefertari/elasticsearch.py", line 358, in index_missing_documents
    self._bulk('index', documents, request)
  File "/usr/local/lib/python3.5/site-packages/nefertari/elasticsearch.py", line 318, in _bulk
    operation=operation)
  File "/usr/local/lib/python3.5/site-packages/nefertari/elasticsearch.py", line 269, in process_chunks
    if count < chunk_size:
TypeError: unorderable types: int() < str()
I guess a string conversion is missing somewhere. Found a real bug this time =).
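For what it's worth, here is a minimal sketch of the pattern behind that TypeError (not nefertari's actual code): command-line params arrive as strings, so chunk_size has to be cast before it is compared to an integer count:

def process_chunks(documents, operation, chunk_size):
    # the missing conversion: --chunk / --params values come in as strings
    chunk_size = int(chunk_size)
    count = len(documents)
    start = 0
    while start < count:
        end = min(start + chunk_size, count)  # int < int now, not int < str
        operation(documents[start:end])
        start = end

# usage sketch: "5" arrives as a string, exactly like a CLI argument
process_chunks(list(range(13)), print, "5")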
Well, with luck str() < str() should be OK, so I ran this one:
nefertari.index -c docker.ini --models Label --params=_limit=13000 --chunk=500
And it worked.
Last one: I have a table with ~3,200,000 rows and my little server has only 15GB of RAM... If I do some simple math: 2GB for Postgres, 2GB for ES (a dump of Postgres is about 2GB)... so 10GB for the re-indexing process seems fair? Well, not at all! I'll try to manage this lack of RAM with some more swap, but I'm afraid of how long the job will take.
I hit a timeout error above 1 million rows; is there a way to increase the timeout limit?
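For what it's worth, the underlying elasticsearch-py client does accept timeouts; whether nefertari exposes one via its .ini settings is an assumption here, not something confirmed in this thread. A sketch:

from elasticsearch import Elasticsearch

# client-level default timeout, in seconds
es = Elasticsearch(["http://elasticsearch:9200"], timeout=120)

# elasticsearch-py also accepts a per-request override
es.index(index="hathor", doc_type="Label", id="1",
         body={"name": "example"}, request_timeout=120)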
May I recommend finding a way to avoid that RAM consumption? =D I know my tables are not well optimized, but maybe there is a way to process each chunk/offset/page and release some RAM along the way? Again, I know it's hard to produce code while staying model-agnostic; maybe you just can't.
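A rough sketch of that page-by-page idea, under some loud assumptions: fetch_page is a hypothetical stub standing in for a real LIMIT/OFFSET (or keyset) query against Postgres, and the ES side uses plain elasticsearch-py helpers rather than nefertari's wrapper:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

def fetch_page(offset, limit):
    # hypothetical: replace with a real paged query; [] ends the loop
    return []

def iter_rows(page_size=1000):
    # only one page of rows lives in RAM at any moment
    offset = 0
    while True:
        rows = fetch_page(offset, page_size)
        if not rows:
            return
        yield from rows
        offset += page_size

def es_actions(index, doc_type):
    for row in iter_rows():
        yield {
            "_op_type": "index",  # re-pushing a known _id just overwrites it
            "_index": index,
            "_type": doc_type,    # doc types still existed in ES 1.x/2.x
            "_id": row["id"],
            "_source": row,
        }

es = Elasticsearch(["http://elasticsearch:9200"])
# helpers.bulk consumes the generator lazily, so memory stays bounded
bulk(es, es_actions("hathor", "Label"), chunk_size=500)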
Hi,
I wrote a quick & dirty script to index all Postgres data into ES without running out of memory, no matter the table size. It's mostly inspired by the nefertari index script; the code is linked below. It should work regardless of your table design, since it goes through the nefertari abstraction (NB: I'm using ramses too; not tested without it).
It features a "delete mapping" directive: removing a mapping deletes all of its documents from the index.
I rely on ES for the indexing process: if I push an already-known document, nothing should happen (correct me if I'm wrong), and that's really fast on ES.
Code here: https://github.com/Ezka77/nefertari-manage-index/blob/master/manage_index.py
Hi, I'm stuck on an issue with the nefertari.index command: --recreate seems to work, but it only indexes 10k rows.
I've tried playing with the --chunk option: it doesn't change a thing (see example below).
I've tried re-executing the command with --models on a ~12k-row table, but:
The same command with --chunk 1000 doesn't change a thing:
In the end, with this example, I have on Postgres:
And from my API:
Which is not consistent =s
NB: A freeze of my env: