ThibautSF opened this issue 2 years ago
I remember AWS OpenSearch had some different auth mechanisms, but since normal operations work, I'm not sure this is related. It seems you don't get much in the client logs; I would look at the logs of Elasticsearch / OpenSearch itself to see if anything shows up there.
Side note: Elastica does not support and is not tested against OpenSearch, but on 7.10 I would expect the APIs to still be mostly aligned.
> I remember AWS OpenSearch had some different auth mechanisms, but since normal operations work, I'm not sure this is related. It seems you don't get much in the client logs; I would look at the logs of Elasticsearch / OpenSearch itself to see if anything shows up there.
I will look into how to activate logs then, because it seems they aren't on by default on AWS. But the strange thing I found:
`429 Too Many Requests /_bulk`
(looks like logs might help me with that too)

> Side note: Elastica does not support and is not tested against OpenSearch, but on 7.10 I would expect the APIs to still be mostly aligned.

It is for this specific reason that I set up an Elasticsearch 7.10 and not OpenSearch 1.2: I wanted to test my implementation against plain Elasticsearch first (and then, in a second step, try an OpenSearch instance).
The bulk request issue is odd. I would have expected a different error if the bulk request is too large. Do you have lots of other traffic on this instance? As you said, ES logs should help you in this scenario too.
You also note above that elsadmin
is a superuser, so I don't see how you could get permission problems with aliases :-( Have you tried to just run Elasticsearch locally on your machine and run the same code to see what happens?
Still trying to obtain the AWS logs, but nothing useful shows up yet.
> You also note above that elsadmin is a superuser, so I don't see how you could get permission problems with aliases :-( Have you tried to just run Elasticsearch locally on your machine and run the same code to see what happens?
Local run works fine (although I only have HTTP and no users...)
I tried queries with Postman against https://elsadmin:<pass>@<elasticdomain>:443/_aliases
If I use the "*" wildcard in the remove action, it generates the "no permissions for []" error.
BUT if I use the old index name directly, it works:
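For illustration (the exact payloads aren't shown above, so the index and alias names here are placeholders), the two variants of the `_aliases` request body would look something like this:

```json
{
  "actions": [
    { "remove": { "index": "*", "alias": "my_alias" } },
    { "add": { "index": "my_index_2022-05-03", "alias": "my_alias" } }
  ]
}
```

The failing variant uses `"index": "*"` in the remove action; the working variant replaces the wildcard with the concrete old index name.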
So this looks like it confirms the issue is on the AWS side... I also ran tests (several months ago) with an Elastic Cloud 30-day demo instance, and almost the same code (strictly the same code for the alias query part) worked there.
> The bulk request issue is odd. I would have expected a different error if the bulk request is too large. Do you have lots of other traffic on this instance? As you said, ES logs should help you in this scenario too.
I still need to dig into that part, but even though it's the smallest AWS instance available for OpenSearch (it's only for dev tests), my local setup can handle more documents (in number and byte size) at one time with less RAM and fewer nodes: 1 GB for 1 node (1 shard) locally versus 2 GB/node for 3 nodes (3 shards) on AWS (cluster info from the Elasticview Firefox addon).
Ok, I managed to find a workaround.
I did some more documentation reading on _aliases. I suppose the wildcard `*` was affecting hidden protected indices (security & co).
Since the implementation just adds the base name of the indices as the alias, I will override the class method in elastically and change the pattern from `"*"` to `"my_index_base_name*"`.
And this way it works.
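As a sketch of the workaround (names are hypothetical), the remove action is narrowed from the global wildcard to a prefix wildcard, so it can only match the application's own dated indices and never touches the protected system indices:

```json
{
  "actions": [
    { "remove": { "index": "my_index_base_name*", "alias": "my_index_base_name" } },
    { "add": { "index": "my_index_base_name_2022-05-03-111557", "alias": "my_index_base_name" } }
  ]
}
```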
Glad you found a workaround:
> I suppose that the wildcard * was affecting hidden protected indices (security & co).
I remember there was a bug in some of the 7.x Elasticsearch releases. It would be interesting to know if it works with 7.17.
On the bulk request: even though your instances are small and above 80% memory, I would still expect you to be able to ingest more than 5 docs in a bulk. Are these especially large docs?
> I remember there was a bug in some of the 7.x Elasticsearch releases. It would be interesting to know if it works with 7.17.
Sadly, the AWS OpenSearch service is limited to Elasticsearch 7.10; after that it's OpenSearch 1.2 (or manual cluster creation). But I should be able to try older versions.
> On the bulk request: even though your instances are small and above 80% memory, I would still expect you to be able to ingest more than 5 docs in a bulk. Are these especially large docs?
Those are not large docs; they are docs containing attachment files (pdf, images, excel, etc.). Each bulk is capped by 2 metrics, a document count and a byte size, the latter computed as:
`min((int) (ini_get('post_max_size')), (int) (ini_get('upload_max_filesize'))) * 1.0e+6`
Before a document is added to the bulk, I check whether a limit is reached; if so, I send the current bulk and then start a new one. In my practice case, the 6 documents sent are each below 1 MB:
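The byte budget above can be sketched as a small helper (a sketch, not the actual implementation; `bulkByteLimit` is a hypothetical name, and the ini values are passed in explicitly for testability):

```php
<?php
// Sketch of the bulk byte budget described above: take the smaller of the
// two PHP upload limits and convert it to bytes. Note that (int) on an
// ini value such as "20M" keeps only the leading number, so "20M" -> 20.
function bulkByteLimit(string $postMaxSize, string $uploadMaxFilesize): float
{
    return min((int) $postMaxSize, (int) $uploadMaxFilesize) * 1.0e+6;
}

// With post_max_size=20M and upload_max_filesize=32M the budget is
// 20 * 1.0e+6 = 20000000 bytes, matching the /20000000 seen in the logs.
$limit = bulkByteLimit('20M', '32M');
```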
> Those are not large docs; they are docs containing attachment files (pdf, images, excel, etc.).
Agreed, these are not large documents, but still different from a plain JSON payload. If you don't include the attachments, do larger bulk requests go through?
Hi,
I took some time, fixed my index request script, and ran several tests + debugging. In each test I reduced my upload max byte size parameter; here it is for 20 MB:
```
Flush 1
Queue size : 1/100
Queue bytes size : 8586291/20000000
array(4) {
  ["took"]=> int(513)
  ["ingest_took"]=> int(8662)
  ["errors"]=> bool(false)
  ["items"]=> array(1) {
    [0]=> array(1) {
      ["index"]=> array(9) {
        ["_index"]=> string(97) "45a88cff4bbf0973e254c6e87c0a971a76d812187f3376aa04eb9d121756b031_eln_pageattach_2022-05-03-111557"
        ["_type"]=> string(4) "_doc"
        ["_id"]=> string(64) "59f855d347347c5fc730fed4bce741e255f07ec0d4d5f0d466659c0abc9f25c3"
        ["_version"]=> int(1)
        ["result"]=> string(7) "created"
        ["_shards"]=> array(3) {
          ["total"]=> int(1)
          ["successful"]=> int(1)
          ["failed"]=> int(0)
        }
        ["_seq_no"]=> int(2)
        ["_primary_term"]=> int(1)
        ["status"]=> int(201)
      }
    }
  }
}

Flush 2
Queue size : 1/100
Queue bytes size : 15550231/20000000
array(1) { ["message"]=> string(28) "429 Too Many Requests /_bulk" }

Flush 3
Queue size : 10/100
Queue bytes size : 16879040/20000000
array(1) { ["message"]=> string(28) "429 Too Many Requests /_bulk" }
```
And if I reduce it to 15MB all requests work because :
But I need to recheck how my requests are created, because "Flush 2", which is 15.5 MB, contains a file of 8.8 MB. Encoded in base64 it should take around 11.7 MB (a 33% increase). So even though I pack some more data with the file, which is really basic data like the file name and some ids, a 3-4 MB difference looks huge, so I might be sending some unwanted data.
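The 33% figure comes straight from how base64 works: every 3 input bytes become 4 output characters. A quick sketch confirming the 8.8 MB → ~11.7 MB estimate:

```php
<?php
// base64 maps each 3-byte group to 4 ASCII characters, so the encoded
// size is 4 * ceil(n / 3) bytes -- roughly a 33% increase.
$raw = random_bytes(8800000);             // ~8.8 MB of binary data
$encoded = base64_encode($raw);
printf("%.1f MB -> %.1f MB\n", strlen($raw) / 1e6, strlen($encoded) / 1e6);
// prints "8.8 MB -> 11.7 MB"
```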
OR it's my method of calculating the bulk size that's wrong...
`getBytesSize($currentBulk->getActions()); // Note: see edit below`
```php
function getBytesSize($arr): int
{
    $tot = 0;
    if (is_array($arr)) {
        foreach ($arr as $a) {
            $tot += getBytesSize($a);
        }
    } elseif (is_string($arr)) {
        $tot += strlen($arr);
    } elseif (is_int($arr)) {
        $tot += PHP_INT_SIZE;
    } elseif (is_object($arr)) {
        $tot += strlen(serialize($arr));
    }

    return $tot;
}
```
EDIT: OK, calling `getBytesSize((string) $currentBulk);`
instead looks to give a much better approximation of the payload size.
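That makes sense: a recursive walk over the action values misses the JSON syntax overhead (keys, quotes, braces, the metadata lines, newlines), while casting the bulk to a string measures the actual request body. A sketch of the same idea (here `$actions` is a hypothetical stand-in for the bulk's action list, not Elastica's real structure):

```php
<?php
// Sketch: approximate the on-the-wire size of a bulk request by building
// the NDJSON payload itself. Summing the sizes of the individual field
// values undercounts, because it ignores keys, quotes, braces and newlines.
function bulkPayloadSize(array $actions): int
{
    $payload = '';
    foreach ($actions as $action) {
        // each bulk action is a metadata line followed by a source line
        $payload .= json_encode($action['metadata']) . "\n";
        $payload .= json_encode($action['source']) . "\n";
    }

    return strlen($payload);
}
```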
Hi,
I have an error that I haven't managed to understand (it might not be completely related to Elastica, or maybe I have a config issue somewhere...).
Intro
I have the following ES environment :
For PHP :
The issue
Every other query (index creation, indexing, search, etc.) works, but when I try to remove & create aliases I get a 500.
Note: the user elsadmin indicated above is an admin with full access to everything.
The client is initialized with this config:
Alias query call