threefoldtech / home

Starting point for the threefoldtech organization
https://threefold.io
Apache License 2.0

GE: Production Artheon Reservation Failed at zos.billing.payout_farmers #824

Closed by joefoxton 4 years ago

joefoxton commented 4 years ago

After some days of deploying test workloads, we felt confident enough today to reserve Artheon's 10TB workload for the remainder of 2020. Unfortunately, our call to zos.billing.payout_farmers failed with this error:

[Screenshot of the error, 2020-06-26 at 11:13:26]

Our deployment code is here: https://gist.github.com/joefoxton/b7162304febd5dcc120bda2fc627003d

joefoxton commented 4 years ago

We have more than enough TFT for the reservation. Based on our calculations, we need about 7000 tokens. Our balance is here: https://stellarchain.io/address/GCH6A7NARAVXLGGMVP4TWRN37JWU37L7EM6OXAR6NP4S2CFFYACAN2IM

DylanVerstraete commented 4 years ago

Hi, normally kosmos should have printed a log explaining the error.

I think this error is due to the destination not having set up their trustlines. To which address(es) are you trying to pay?

zaibon commented 4 years ago

Could you share the result of the reservation registration? That is, print the registered_reservation_zdbs variable after line 438.

joefoxton commented 4 years ago

This is a reservation on the Salzburg Green Edge data centre. Worked fine yesterday with a one day reservation

joefoxton commented 4 years ago

I’m afraid the script stopped running as soon as this exception was thrown. We don’t know what to do when we catch the exception because it doesn’t seem possible to recover. Let me know if I’m missing something.

In the future we will print out a detailed log of every call result.

zaibon commented 4 years ago

> We don’t know what to do when we catch the exception because it doesn’t seem possible to recover. Let me know if I’m missing something.

You can catch the exception and inspect its content to gather more information. The Stellar SDK exception documentation is here: https://stellar-sdk-zh-cn.readthedocs.io/en/latest/_modules/stellar_sdk/exceptions.html#BadRequestError

Whether a retry makes sense depends on the actual reason for the error. But since this is a bad request, I suspect that sending exactly the same request will just fail again.
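A minimal sketch of what "catch the exception and inspect its content" could look like. To stay runnable without the Stellar SDK installed, it uses a hypothetical stand-in class, `HorizonBadRequest`, in place of the real `stellar_sdk.exceptions.BadRequestError`; the `title`, `status`, and `extras` attributes are assumed to mirror the fields Horizon returns in its error body. The helper names `describe_horizon_error` and `pay` are illustrative and are not part of the reporter's script.

```python
import json


class HorizonBadRequest(Exception):
    """Hypothetical stand-in for a Horizon 400 error (not the real SDK class)."""

    def __init__(self, title, status, extras):
        super().__init__(title)
        self.title = title
        self.status = status
        self.extras = extras


def describe_horizon_error(exc):
    """Collect whatever diagnostic fields the exception carries."""
    # getattr with a default keeps this safe for exceptions lacking these fields.
    extras = getattr(exc, "extras", None) or {}
    return {
        "type": type(exc).__name__,
        "title": getattr(exc, "title", None),
        "status": getattr(exc, "status", None),
        "result_codes": extras.get("result_codes"),
    }


def pay(submit):
    """Call submit(); on a bad request, log the details and re-raise."""
    try:
        return submit()
    except HorizonBadRequest as exc:
        print(json.dumps(describe_horizon_error(exc), indent=2))
        raise
```

In a Horizon error body, the operation-level result codes are usually the informative part: `op_underfunded` indicates the source account cannot cover the payment, while `op_no_trust` indicates the destination has no trustline for the asset. Neither is fixed by resending the same request.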

joefoxton commented 4 years ago

OK. We will update the code accordingly for the future.

What is the next step to solve the problem?

zaibon commented 4 years ago

This is the next step. We need to know the reason for the bad request error returned by the Stellar Horizon network, so we can fix what's causing it.

joefoxton commented 4 years ago

Hey @zaibon @DylanVerstraete, we put in more logging and exception handling. This was the only code change, but this time it worked. Non-deterministic computing! Very confusing and concerning. Any theories about what could have happened?

Anyway, now it looks like we have a nice 10TB reservation for Artheon. Fingers crossed.

joefoxton commented 4 years ago

I spoke too soon. Our test worked, but when we ran the reservation again with the correct dates and allocations, we saw the exact same error.

So this MUST have something to do with the reservation duration, as this is the ONLY thing we changed since it fully worked.

@zaibon @DylanVerstraete

joefoxton commented 4 years ago

Maybe our reservation is too big for our wallet balance? Shouldn't this generate a nice error message though?

zaibon commented 4 years ago

> Could you share the result of the reservation registration? That is, print the registered_reservation_zdbs variable after line 438.

Can you do this, so we can see how many tokens are being requested for payment?

> Shouldn't this generate a nice error message though?

It should indeed.

joefoxton commented 4 years ago

Found the problem! We were trying to reserve more than we could afford :P

It would certainly be good to display a clear error in this case. We will try to do this ourselves in the script too. Closing!
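One way such a check could look in a deployment script, as a rough sketch: sum the amounts due before paying and fail early with a readable message if the wallet cannot cover them. The names `InsufficientFundsError` and `check_affordable` are hypothetical, and how the balance and the per-farmer amounts are obtained from the wallet and the registered reservation is left out because it depends on the SDK in use.

```python
from decimal import Decimal


class InsufficientFundsError(Exception):
    """Raised when the wallet balance cannot cover the reservation cost."""


def check_affordable(balance, amounts_due):
    """Return the total due, or raise a readable error if it exceeds balance.

    Both arguments are converted through str() to Decimal to avoid
    floating-point rounding on token amounts.
    """
    total = sum((Decimal(str(a)) for a in amounts_due), Decimal("0"))
    balance = Decimal(str(balance))
    if total > balance:
        raise InsufficientFundsError(
            f"Reservation costs {total} TFT but the wallet holds only "
            f"{balance} TFT (short by {total - balance} TFT)."
        )
    return total
```

Calling this right before the payout step would turn the opaque bad request into an explicit message stating how many tokens are missing.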

zaibon commented 4 years ago

Agreed: https://github.com/threefoldtech/jumpscaleX_libs/issues/237