jayminkapish opened this issue 3 years ago
Hi @jayminkapish what version of Elasticsearch do you have in production? I wonder from that error message if you might be running into a problem like the one described at https://github.com/elastic/elasticsearch/issues/50670.
You may want to compare the results of a call to /index/_mapping in staging and production to see if the particular data in production is causing many dynamic mapping updates.
This page may also be helpful: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-settings-limit.html
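One rough way to do that comparison is to count the mapped fields in each cluster's `GET /index/_mapping` response and check them against `index.mapping.total_fields.limit` (default 1000). A small sketch, assuming you have saved the mapping JSON from each cluster; the index name and fields below are made up:

```python
def count_fields(properties):
    """Count mapped fields the way the total_fields limit does:
    field mappings, object mappings, and multi-fields all count."""
    total = 0
    for field in properties.values():
        total += 1  # every mapped field, including object parents, counts
        if "properties" in field:
            total += count_fields(field["properties"])  # recurse into objects
        total += len(field.get("fields", {}))  # multi-fields count too
    return total

# Illustrative stand-in for the body of GET /myindex/_mapping
sample_mapping = {
    "myindex": {
        "mappings": {
            "properties": {
                "title": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
                "user": {"properties": {"name": {"type": "text"},
                                        "age": {"type": "long"}}},
            }
        }
    }
}

props = sample_mapping["myindex"]["mappings"]["properties"]
print(count_fields(props))  # title, title.raw, user, user.name, user.age -> 5
```

Running this against the staging and production mappings should show quickly whether production is generating far more dynamically mapped fields.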
I should've provided these earlier:
INFO 2021/05/07 21:50:52 Started monstache version 6.7.5
INFO 2021/05/07 21:50:52 Go version go1.15.5
INFO 2021/05/07 21:50:52 MongoDB go driver v1.5.1
INFO 2021/05/07 21:50:52 Elasticsearch go driver 7.0.23
INFO 2021/05/07 21:50:52 Successfully connected to MongoDB version 4.4.5
INFO 2021/05/07 21:50:52 Successfully connected to Elasticsearch version 7.10.2
INFO 2021/05/07 21:50:52 Listening for events
INFO 2021/05/07 21:50:52 Watching changes on the deployment
Thanks for the pointers; we will look at them. I assume you were able to find our TOML config just fine — it syncs the entire MongoDB collection onto the Elasticsearch cluster.
Yes, the production data size is quite big compared to staging, and there are many unique fields (due to how the MongoDB collection is designed). This must be adding to the dynamic mapping timeouts. We had a similar issue with mongo-connector, but adjusting the bulk size got us to the finish line.
We are going to try limiting the bulk size via elasticsearch-max-bytes. We may try 2 MB (expressed in bytes) and see whether we still run into dynamic mapping timeouts.
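For reference, a sketch of the throttling-related settings being discussed, using only options mentioned in this thread (the values are illustrative, not recommendations):

```toml
# Cap each bulk request at roughly 2 MB (the value is in bytes).
elasticsearch-max-bytes = 2097152
# Fewer concurrent bulk connections reduces indexing pressure on the cluster.
elasticsearch-max-conns = 2
# Limit how many segments a direct-read collection scan is split into.
direct-read-split-max = 4
```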
I think we are just looking for ways to slow down monstache for the initial sync.
@jayminkapish I'm having the same issue. It looks like it's taking forever to sync the data. @rwynn any opinion?
stats
{
  "Flushed": 280,
  "Committed": 380,
  "Indexed": 403,
  "Created": 0,
  "Updated": 0,
  "Deleted": 0,
  "Succeeded": 403,
  "Failed": 0,
  "Workers": [
    { "Queued": 0, "LastDuration": 7000000 },
    { "Queued": 0, "LastDuration": 5000000 },
    { "Queued": 0, "LastDuration": 6000000 },
    { "Queued": 0, "LastDuration": 6000000 },
    { "Queued": 0, "LastDuration": 5000000 },
    { "Queued": 0, "LastDuration": 5000000 },
    { "Queued": 0, "LastDuration": 5000000 },
    { "Queued": 0, "LastDuration": 7000000 },
    { "Queued": 0, "LastDuration": 5000000 },
    { "Queued": 0, "LastDuration": 4000000 }
  ]
}
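A side note on reading those worker numbers: monstache is written in Go, so the LastDuration values are presumably Go time.Duration integers, i.e. nanoseconds (an assumption on my part, not something stated in this thread). If so, the flushes are fast:

```python
# Assuming LastDuration is a Go time.Duration serialized as nanoseconds,
# the worker flush times above are in the single-digit millisecond range,
# which would suggest Elasticsearch itself is responding quickly and the
# bottleneck is elsewhere.
durations_ns = [7000000, 5000000, 6000000, 5000000, 4000000]
durations_ms = [ns / 1_000_000 for ns in durations_ns]
print(durations_ms)  # [7.0, 5.0, 6.0, 5.0, 4.0]
```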
INFO 2021/06/18 11:28:27 Started monstache version 6.7.5
INFO 2021/06/18 11:28:27 Go version go1.15.5
INFO 2021/06/18 11:28:27 MongoDB go driver v1.5.1
INFO 2021/06/18 11:28:27 Elasticsearch go driver 7.0.23
INFO 2021/06/18 11:28:27 Successfully connected to MongoDB version 4.4.2
INFO 2021/06/18 11:28:27 Successfully connected to Elasticsearch version 7.13.1
INFO 2021/06/18 11:28:27 Listening for events
INFO 2021/06/18 11:28:27 Sending systemd READY=1
WARN 2021/06/18 11:28:27 Systemd notification not supported (i.e. NOTIFY_SOCKET is unset)
INFO 2021/06/18 11:28:27 Starting http server at :8080
INFO 2021/06/18 11:28:27 Watching changes on the deployment
INFO 2021/06/18 11:28:27 Resuming stream '' from collection monstache.tokens using resume name 'default'
INFO 2021/06/18 11:28:27 Direct reads completed
Config:
gzip = true
stats = true
index-stats = true
replay = false
resume = true
resume-strategy = 1
index-files = false
verbose = true
direct-read-split-max = 20
exit-after-direct-reads = false
enable-http-server = true
elasticsearch-max-conns = 10
**other env vars**
MONSTACHE_MONGO_URL: "mongodb://10.10.5.37:5557"
MONSTACHE_ES_URLS: "http://my-es-http:9200"
MONSTACHE_ES_USER: "elastic"
MONSTACHE_ES_PASS: "0UDl04649LxkK0GE9"
MONSTACHE_DIRECT_READ_NS: "mydb.admin,mydb.area,mydb.banner,mydb.brand,mydb.campaing,mydb.category,mydb.city,mydb.deliveryCharge,mydb.deliveryTime,mydb.division,mydb.product,mydb.region,mydb.reward,mydb.shop,mydb.slider,mydb.sliderItem,mydb.user,mydb.version"
We've paused the sync since my last comment. We're hoping to resume this work in July.
@rwynn any updates on this?
@asmaaelk can you describe the error or behavior you are seeing?
Are you also receiving errors like `timed out while waiting for a dynamic mapping update`?
If so, it is best to map your data explicitly using index templates.
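For example, a minimal composable index template might look like the following. The index pattern, field names, and limit value are hypothetical; the point is that mapping fields explicitly and setting `"dynamic": false` keeps unmapped fields in `_source` without triggering mapping updates on every new field:

```
PUT _index_template/mydb-products
{
  "index_patterns": ["mydb.product*"],
  "template": {
    "settings": {
      "index.mapping.total_fields.limit": 2000
    },
    "mappings": {
      "dynamic": false,
      "properties": {
        "name":      { "type": "text" },
        "price":     { "type": "double" },
        "createdAt": { "type": "date" }
      }
    }
  }
}
```

If you need unmapped fields to be rejected rather than silently ignored, `"dynamic": "strict"` is the stricter alternative.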
Monstache has been working really well for us in the staging environment for the past couple of weeks. We were excited to see it sync 23K documents from the staging database to the staging Elasticsearch cluster very quickly (under 10 minutes). We then moved the deployment to production 3 days ago with the same TOML configuration file as staging, except the production collection is much bigger. COLLECTION SIZE: 4.97 GB, TOTAL DOCUMENTS: 713,458.
We are looking to sync the entire MongoDB collection onto the Elasticsearch cluster and then tail the oplog.
And we have the following env vars:
Monstache kicked off the sync at a high rate, but after just a few hours it seemed to stall, indexing only 2-5 documents a minute. It logged `Direct reads completed` after about 6 hours, and the stats timer logged the stats shown above.
Monstache also logged the following error about 800 times in the first few hours after the production sync kicked off:
We've allocated 2 CPUs and 4 GB of memory to monstache, and it is hardly using 2% of that at the moment.
Looking at the config, can you tell us what we can do to speed up the sync?
Thanks in advance.