Closed (venera70, 2 months ago)
The logging suggests that you are using a WRONG password to access Redis.
If it is a WRONG password for Redis/MemoryDB, then why were we still able to list the contents of the mountpoint and even write to it (see the output of the dd command above)?
For example, when the MemoryDB password does expire (IAM authentication), we get:
user@ec2-host:/test$ ls
ls: reading directory '.': Input/output error
user@ec2-host:/test$
but that wasn't the case when those WRONGPASS errors were appearing in the logs.
More on the IAM authentication method for MemoryDB: we are using MemoryDB's IAM authentication (https://docs.aws.amazon.com/memorydb/latest/devguide/auth-iam.html), whereby a dynamically generated IAM token is used as the Redis password.
From our discussions with AWS: while the IAM token is valid for use as a password within 15 minutes of obtaining it, once the connection to MemoryDB is established it remains active until the credentials of the IAM role expire. Additionally, we used an assumed role whose credentials can last for a maximum of 12 hours, because the instance profile credentials (from curl http://169.254.169.254/latest/meta-data/iam/security-credentials/MyEc2IAMRole/) are only valid for 6 hours.
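For reference, the remaining lifetime of the instance-profile credentials can be read off the Expiration field of that metadata response. A minimal sketch, using a hypothetical response body (the field names match what IMDS returns, but the values are made up):

```python
import json
from datetime import datetime, timezone

# Hypothetical response body from the instance metadata endpoint
# http://169.254.169.254/latest/meta-data/iam/security-credentials/MyEc2IAMRole/
sample = json.dumps({
    "AccessKeyId": "ASIAEXAMPLE",
    "SecretAccessKey": "...",
    "Token": "...",
    "Expiration": "2024-09-03T17:35:20Z",
})

def remaining_lifetime(body: str, now: datetime) -> float:
    """Seconds until the instance-profile credentials expire."""
    creds = json.loads(body)
    expires = datetime.strptime(creds["Expiration"], "%Y-%m-%dT%H:%M:%SZ")
    expires = expires.replace(tzinfo=timezone.utc)
    return (expires - now).total_seconds()

now = datetime(2024, 9, 3, 11, 35, 20, tzinfo=timezone.utc)
print(remaining_lifetime(sample, now) / 3600)  # 6.0 hours left in this example
```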
We are generating the MemoryDB password (which requires AWS SigV4 signing of various parameters) with the following steps: using the Python boto3 library with the code sample here: https://docs.aws.amazon.com/memorydb/latest/devguide/LambdaMemoryDB.step2.html#LambdaMemoryDB.step2.1, modified to call STS assume_role() in the class like so:
sts_client = boto3.client('sts')
assumed_role = sts_client.assume_role(
    RoleArn=self.iam_role,
    DurationSeconds=self.role_duration,
    RoleSessionName='MyMemoryDBRole',
)
# Get the assumed role credentials
self.credentials = assumed_role['Credentials']
session = boto3.Session(
    aws_access_key_id=self.credentials['AccessKeyId'],
    aws_secret_access_key=self.credentials['SecretAccessKey'],
    aws_session_token=self.credentials['SessionToken'],
)
from urllib.parse import quote
...
):
    return quote(signed_url.removeprefix("https://"), safe='')
The resulting rediss:// URL looks like:
rediss://redisuser:myrediscluster%2F%3FAction%3Dconnect%26User%3Dredisuser%26X-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Credential%3DASIASTQGD3GMLGAIE903%252F20240903%252Faws-region%252Fmemorydb%252Faws4_request%26X-Amz-Date%3D20240903T113520Z%26X-Amz-Expires%3D900%26X-Amz-SignedHeaders%3Dhost%26X-Amz-Security-Token%3DFwoGZXIvYXdzELX%252F%252F%252F%252F%252F%252F%252F%252F%252F%252FwEaDEIVKGkLU4EVPByY9SK1AaBao61ZLOMnkL8h8tx3JD3MyZNnbKOW1j5iS%252Bf5n7DYGO7T9HGY8njpqjrfJQCJwo3J4aCxhMx8Ff4J3c%252FaaNPQ%252FBI%252BIoUzFgtHemzkYpAUKQOv1uQjHYNJIAETrse98%252FiM4f7pD%252BvB%252FNCueSkDDmXSKUB8G0pDc%252Fm4DTjbDfp21T8zdO%252Bu3rN%252BgR7YASokTIjD%252F34m%252By9pZ9ZgxVucODr7XgFVaD%252BgE1%252BgGsZt8KdRijrNumco%252BOjbtgYyLZlkPFWedwE92AU1UD5ooYzQedr9bc5dqpKptxwNjtyMVVy9XlFIVyALXSi0xA%253D%253D%26X-Amz-Signature%3Dc5dbbc42e96cc3f987e11747b479572b26ce90827d4262a4201fddb48102ecd5@clustercfg.myrediscluster.asdfghj.memorydb.aws-region.amazonaws.com:6379/0
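The percent-encoding step (stripping the scheme and escaping everything, including / and =, via safe='') can be illustrated with a toy signed URL; all parameter values below are made up:

```python
from urllib.parse import quote

# Hypothetical presigned URL produced by the SigV4 signing step
signed_url = ("https://myrediscluster/?Action=connect&User=redisuser"
              "&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=900")

# safe='' forces '/', '?', '&' and '=' to be escaped as well, so the token
# can be embedded as the password component of a rediss:// URL
token = quote(signed_url.removeprefix("https://"), safe='')
print(token)
# myrediscluster%2F%3FAction%3Dconnect%26User%3Dredisuser%26X-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Expires%3D900
```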
Question: does JuiceFS open any new connections to Redis at times other than the initial mount, or does it reuse the connection established at the instant the mount command was issued?
If JuiceFS does indeed try to establish new connections when reads/writes increase, that could explain why the password is no longer valid, if more than 15 minutes have elapsed from the time the mount command was issued.
Is there a way we can make JuiceFS spawn multiple Redis connections (that are kept alive, so that MemoryDB doesn't terminate them) at the start so that read/write processes could just reuse these opened connections, rather than opening new ones on demand?
Thank you! P.S. JuiceFS rocks! We see significant performance gains over AWS EFS.
go-redis uses a pool of connections to access MemoryDB, so additional connections are created based on request/workload. A single connection can only handle one request/response at a time.
We could ask go-redis for this feature, but even with that, I think it will still be easy to run into this problem.
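The failure mode under discussion can be modelled in a few lines: connections opened while the token is still fresh keep working, but any connection the pool opens after the 15-minute window is rejected. This is a toy simulation of the hypothesis, not go-redis itself:

```python
TOKEN_TTL = 15 * 60  # an IAM auth token is accepted for 15 minutes

class Conn:
    def __init__(self, opened_at, token_issued_at):
        # Authentication happens once, at the moment the connection is opened
        self.authed = (opened_at - token_issued_at) <= TOKEN_TTL

def open_conn(now, token_issued_at):
    conn = Conn(now, token_issued_at)
    if not conn.authed:
        raise RuntimeError("WRONGPASS invalid username-password pair")
    return conn

token_issued_at = 0
early = open_conn(now=60, token_issued_at=token_issued_at)   # at mount time: OK
assert early.authed                                          # stays usable past 15 min
try:
    open_conn(now=30 * 60, token_issued_at=token_issued_at)  # pool grows later: fails
except RuntimeError as e:
    print(e)  # WRONGPASS invalid username-password pair
```

This matches the observation in the thread that already-established connections keep serving I/O while newly spawned pool connections fail with WRONGPASS.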
What happened: We perform a git clone of a large open-source codebase with hundreds of thousands of files into a JuiceFS filesystem. The checkouts fail towards the end with the following git error:
fatal: cannot pread pack file: Input/output error
fatal: fetch-pack: invalid index-pack output
In the juicefs client logs, there are multiple occurrences of the following. The WRONGPASS error seems to indicate that it is related to Redis access, but the juicefs mountpoint is fully accessible, and subsequently head-ing the file shows the binary data written.
What you expected to happen: Checkout should complete without errors.
How to reproduce it (as minimally and precisely as possible): Clone an OSS project with many small files and a long history (e.g. the Linux kernel, GCC or LLVM) into a JuiceFS filesystem with the mount flags detailed in the environment below.
Anything else we need to know? Not sure how relevant, but the EC2 clients that access the JuiceFS mount write to the same volume, not to the same set of files: each client writes to its own subdirectory in the mount at any given time. Also, the juicefs binary is running as root (via sudo).
Environment:
- JuiceFS version (use juicefs --version) or Hadoop Java SDK version: arm64 1.2.0+2024-06-18.873c47b, self-compiled as a static binary (juicefs version 1.2.0+2024-06-18.873c47b), mounted with the following options:
juicefs mount --no-usage-report --writeback --no-syslog --background --cache-dir=/tmp/juicefs_cache --cache-size=1024000 --buffer-size=1024 --max-uploads=40 --log=/var/log/juicefs.log rediss://ciagent:****@clustercfg.abc.def.memorydb.region.amazonaws.com
- OS (e.g. cat /etc/os-release): PRETTY_NAME="Ubuntu 24.04.1 LTS" NAME="Ubuntu" VERSION_ID="24.04" VERSION="24.04.1 LTS (Noble Numbat)" VERSION_CODENAME=noble ...
- Kernel (e.g. uname -a): Linux ip-10-252-53-17 6.8.0-1014-aws #15-Ubuntu SMP Thu Aug 8 20:05:03 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux