spcl / faaskeeper

A fully serverless implementation of the ZooKeeper coordination protocol.
BSD 3-Clause "New" or "Revised" License

Unexpected FaaSKeeperException "Operation timed out after 5.0 [s]" when running unit tests #26

Closed EricPyZhou closed 1 year ago

EricPyZhou commented 1 year ago

Hello,

Given the client channel key error in #24, I tried deploying the version at commit 80dc6e240534d9dafbd50dc52afe0e5b33a85155, because it is shown as passing the build.

I deployed it with aws.yml. My first question: although BucketName is specified under dataBucket in aws.yml, the deployed S3 bucket name still contained serverlessdeploymentbuck, which is part of a default name mentioned here. When I gave it a different name (${self:provider.environment.S3_BUCKET}-data2), the name no longer contained serverlessdeploymentbuck.

The second issue: when I ran the first unit test, connect_session.py, it failed with the log message "Failed: Unexpected FaaSKeeperException exception Operation timed out after 5.0 [s]!". Looking at the trace, the error starts here: https://github.com/spcl/faaskeeper-python/blob/e884fa546ecc63bbd5f5dfac4e252752df8193c8/faaskeeper/client.py#L164

This might be because I installed the faaskeeper client from the master branch of https://github.com/spcl/faaskeeper-python, and it is not compatible with the version at 80dc6e240534d9dafbd50dc52afe0e5b33a85155.

So I will wait for the fix you mentioned in #24, then deploy the instance and run the unit tests.

mcopik commented 1 year ago

@EricPyZhou Hi Eric!

(1) Interesting observation - I see the following two buckets created when I deploy the stack. The first one is the actual data bucket, while the other one is the bucket from Serverless Framework. I agree that renaming the deployment bucket might be very helpful, but I'm not sure if you did not observe this issue with the S3 data bucket?

image

(2) Indeed, it is an issue - our documentation does not explain properly that the client machine needs a public IP to accept connections from Lambda functions. I am working now on replacing this with SQS - it should be merged soon with master.

EricPyZhou commented 1 year ago

Hi, Marcin

(1) Yes, I did observe two buckets being created: one with the regular name, the other with serverlessdeploymentbucket in its name.

It happened when using the master branch as well.

If I find anything that might be related, I will mention it in the proposal as well.

(2) Got it, thanks!

mcopik commented 1 year ago

(1) The second bucket is an artifact of the deployment with the Serverless Framework; it will always be there. However, your link indicates that it is possible to override its name with deploymentBucket - it might be a very simple fix. Would you be interested in opening a PR with a fix for that?

EricPyZhou commented 1 year ago

Will do.

UPDATE: I think it's fine to keep the default name, because we can look up the documentation and see that it is a bucket for Serverless itself to use.

However, the extra bucket can contribute to exceeding the limit of 100 buckets (only a soft limit). https://docs.aws.amazon.com/AmazonS3/latest/userguide/BucketRestrictions.html

I can create a small PR updating the documentation to summarize what we discussed, so that newcomers like me understand what's going on.
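For anyone who wants to check how close an account is to the bucket limit, here is a minimal Python sketch. It is only an illustration under assumptions: the function names and the sample bucket names are hypothetical, and the input is assumed to have the shape returned by boto3's S3 list_buckets() call (a dict with a "Buckets" list of entries that each have a "Name").

```python
from typing import Any, Dict, List

# Soft limit discussed in this thread; an actual account may have a raised quota.
DEFAULT_BUCKET_LIMIT = 100


def deployment_buckets(response: Dict[str, Any],
                       marker: str = "serverlessdeploymentbucket") -> List[str]:
    """Return the names of buckets that look like Serverless deployment buckets."""
    return [b["Name"] for b in response.get("Buckets", []) if marker in b["Name"]]


def remaining_buckets(response: Dict[str, Any],
                      limit: int = DEFAULT_BUCKET_LIMIT) -> int:
    """Return how many more buckets fit under the given limit."""
    return limit - len(response.get("Buckets", []))


# Hypothetical sample in the shape of a list_buckets() response.
# With real credentials one could instead use:
#   import boto3
#   sample = boto3.client("s3").list_buckets()
sample = {
    "Buckets": [
        {"Name": "faaskeeper-dev-data"},
        {"Name": "faaskeeper-dev-serverlessdeploymentbucket-abc123"},
    ]
}

print(deployment_buckets(sample))
print(remaining_buckets(sample))
```

Matching on the substring is a heuristic: it relies on the default naming observed above and will miss deployment buckets that were renamed.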

mcopik commented 1 year ago

@EricPyZhou I started a PR in #29 fixing issues - this should resolve the issue with the failing deployment.

We are still missing the SQS channel - right now, it is required that the machine from which you run FK operations has a public IP and accepts incoming TCP connections - a reasonable assumption for a server but too restrictive for development. This is likely why you experience the issue "timed out after 5 seconds".
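To illustrate the failure mode, here is a minimal Python sketch. It is not the FaaSKeeper client code: the function name wait_for_callback and its parameters are hypothetical. It opens a listening TCP socket and waits for one incoming connection; if the machine cannot be reached from outside (no public IP, or a firewall blocking incoming connections), nothing connects and the wait ends in a timeout.

```python
import socket
from typing import Optional


def wait_for_callback(port: int = 0, timeout: float = 5.0) -> Optional[str]:
    """Listen on a TCP port and return the peer address of the first
    incoming connection, or None if nothing connects within `timeout`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Port 0 lets the OS pick any free port; a real setup would use
        # a fixed port that is open for incoming traffic.
        server.bind(("0.0.0.0", port))
        server.listen(1)
        server.settimeout(timeout)
        try:
            conn, addr = server.accept()
        except socket.timeout:
            # No one reached us in time: the situation described above.
            return None
        conn.close()
        return addr[0]
```

Calling wait_for_callback() on a machine behind NAT returns None after 5 seconds when the only possible caller is a remote function, while the same call on a host with a public IP and an open port returns the caller's address.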

EricPyZhou commented 1 year ago

Thanks! Then I'm thinking about creating an EC2 instance so that there is a public IP address to test with. For now, I will focus on the proposal.