hi @APSEPHION,
I tried to access the snapshot, and it seems to be available.
Did you use the S3 keys in the guide?
Also, could you try setting the S3 endpoint in elasticsearch.yml? i.e.: https://github.com/rayliuca/T-Ragx-Fossil/blob/044e3d7c7cd824ebabadf2293dc256b74b8c4ed9/elastic_config/elasticsearch.yml#L99
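For reference, the client setting meant here is roughly the following (a sketch: it assumes you are using the default S3 client name, and the endpoint shown is the one from the guide / your error message):

```yaml
# elasticsearch.yml -- point the default S3 repository client at the custom endpoint
# (endpoint value is the one from the guide; adjust if yours differs)
s3.client.default.endpoint: "o3t0.or.idrivee2-37.com"
```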
As for examples, I have sample data in Parquet files and a sample script to upload them to your own Elasticsearch service: https://github.com/rayliuca/T-Ragx/blob/131068827f3c2664e36957bd0f6e65d9bd981ffb/src/t_ragx/scripts/build_demo_elastic_memory_index.py#L10-L23
By default, the name of the index is translation_memory, as you can see here:
https://github.com/rayliuca/T-Ragx/blob/131068827f3c2664e36957bd0f6e65d9bd981ffb/src/t_ragx/processors/ElasticInputProcessor.py#L124-L126
But you could use any name for the index (e.g. translation_memory_demo in the demo script).
In my implementation, each record in the index has:
- _id: a hash based on the original text, used for deduplication
- lang_code_1: some text
- lang_code_2: some text
- lang_code_3: some text
- corpus: the corpus the data came from (as metadata)
- id_key: the lang code of the source text that the hash was calculated on
For example:
{
"_id": "01ceca8f3c917331867c1b922cc905c63c1a9abd",
"en": "Then I chatted with the villagers.",
"ja": "それから村びとと話し合いました。",
"zh": "於是我開始和村人聊天。",
"corpus": "NLLB",
"id_key": "ja"
}
There could be an arbitrary number of languages associated with a record. T-Ragx searches on the specified source lang code and returns the best-matching records that also contain the target lang code.
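For illustration, here is a minimal sketch (not the actual T-Ragx code) of writing one of these records and querying by source language with the official Python Elasticsearch client; it assumes an 8.x client and a local, unauthenticated instance on localhost:9200:

```python
from elasticsearch import Elasticsearch

# Assumes a local Elasticsearch instance with security disabled
es = Elasticsearch("http://localhost:9200")

record = {
    "en": "Then I chatted with the villagers.",
    "ja": "それから村びとと話し合いました。",
    "zh": "於是我開始和村人聊天。",
    "corpus": "NLLB",
    "id_key": "ja",
}

# Supplying _id explicitly (the hash of the source text) means re-uploading the
# same record overwrites it instead of creating a duplicate
es.index(
    index="translation_memory",
    id="01ceca8f3c917331867c1b922cc905c63c1a9abd",
    document=record,
)

# Search on the source-language field and keep only hits that also carry the
# target language -- roughly "search by source lang code, return target lang code"
resp = es.search(
    index="translation_memory",
    query={
        "bool": {
            "must": [{"match": {"ja": "村びと"}}],
            "filter": [{"exists": {"field": "en"}}],
        }
    },
    size=5,
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["en"])
```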
Please let me know if you can access the snapshot.
Thanks for the quick response. I configured the keys and endpoint as in your guide. After some more research, I found that the DNS nameserver in the container seems to be misconfigured. It fails to resolve every address I try. I'll look into that, but I don't think it's related to T-Ragx. Kind of weird, because other containers work...
Thanks for the examples! If nothing else works, I'll DIY it.
Ok, I fixed my networking. My new error is:
{"error":{"root_cause":[{"type":"repository_exception","reason":"[public_t_ragx_translation_memory] Could not determine repository generation from root blobs"}],"type":"repository_exception","reason":"[public_t_ragx_translation_memory] Could not determine repository generation from root blobs","caused_by":{"type":"i_o_exception","reason":"Exception when listing blobs by prefix [index-]","caused_by":{"type":"amazon_s3_exception","reason":"The Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 17C805C44797C21A; S3 Extended Request ID: 894a7c6c-f598-4b31-807c-acb85d440dfd; Proxy: host)"}}},"status":500}
I configured the access keys as provided here:
bin/elasticsearch-keystore add s3.client.default.access_key
CG4KwcrNPefWdJcsBIUp
bin/elasticsearch-keystore add s3.client.default.secret_key
Cau5uITwZ7Ke9YHKvWE9cXuTy5chdapBLhqVaI3C
hmm... it seems that it's calling Amazon S3 instead of using the custom endpoint. Did you change the config as I suggested?
Also, could you try setting the S3 endpoint in elasticsearch.yml? i.e.: https://github.com/rayliuca/T-Ragx-Fossil/blob/044e3d7c7cd824ebabadf2293dc256b74b8c4ed9/elastic_config/elasticsearch.yml#L99
If you are using Docker, I would suggest taking a look at https://github.com/rayliuca/T-Ragx-Fossil, which is the docker-compose setup I'm using for the public services right now (the folder permissions are a bit messed up at the moment; see the debug section in that repo's README).
Thanks again! Fossil was not working for me, but after some struggle I found that deleting elk_data/elasticsearch/plugins/.installer.<base64_here> helped. I had originally added your repository, but it seems my Dockerfile was flawed and rebuilding removed it from the config... After getting Fossil to run, the curl command to add the snapshot returned:
{"error":{"root_cause":[{"type":"repository_verification_exception","reason":"[public_t_ragx_translation_memory] path [elastic] is not accessible on master node"}],"type":"repository_verification_exception","reason":"[public_t_ragx_translation_memory] path [elastic] is not accessible on master node","caused_by":{"type":"i_o_exception","reason":"Unable to upload object [elastic/tests-Vyw5DPImSxybEcd1Z2wvag/master.dat] using a single upload","caused_by":{"type":"amazon_s3_exception","reason":"Access Denied. (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 17CA27A1288959A0; S3 Extended Request ID: 894a7c6c-f598-4b31-807c-acb85d440dfd)"}}},"status":500}
BUT after checking in elasticvue I was able to find the snapshot repository and begin to restore it. I'm at 9 GB, just how large is it? I also had to disable xpack.security for elasticvue since I could not find the credentials...
Ok, restoring worked!
Glad it worked!
For future visitors: regarding the xpack.security setting, you would need to generate the password by docker exec -it into the container and follow the Elastic documentation. The repo doesn't have any password built in, for security. It's OK to set the xpack.security config to false in the elasticsearch.yml file if you don't care.
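Concretely, the latter is just one line in elasticsearch.yml (a sketch; only sensible for a local instance that isn't exposed to the network):

```yaml
# elasticsearch.yml -- turns authentication off entirely; fine for local tinkering,
# not for anything reachable by others
xpack.security.enabled: false
```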
Hi, I really liked playing around with T-Ragx and am now trying to self-host. I am trying to set up a local instance of the translation memory according to the guide. However, I can't seem to access o3t0.or.idrivee2-37.com. Full error message:
Is the service still available? If not, could you provide documentation on what indexes are required (and sample data, maybe in CSV format)? Or could the error be on my side?