Open michalstepniewski opened 5 years ago
You're probably running out of memory. Can you try on a 64GB instance? Also, consider monitoring swap usage to make sure you're not thrashing.
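To make the swap check concrete, here is a minimal sketch that reads `/proc/meminfo` on a Linux instance and reports available memory and swap in use. The helper names (`parse_meminfo`, `memory_summary`, `watch`) are made up for illustration and are not part of this repository; the field names (`MemTotal`, `MemAvailable`, `SwapTotal`, `SwapFree`) are the standard Linux ones.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def memory_summary(info):
    """Convert the kB counters into GiB figures that are easy to eyeball."""
    kb_per_gib = 2**20
    swap_used_kb = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        "mem_total_gib": info.get("MemTotal", 0) / kb_per_gib,
        "mem_available_gib": info.get("MemAvailable", 0) / kb_per_gib,
        "swap_used_gib": swap_used_kb / kb_per_gib,
    }


def watch(interval_s=5.0, iterations=12, path="/proc/meminfo"):
    """Hypothetical helper: print a summary every interval_s seconds."""
    for _ in range(iterations):
        with open(path) as f:
            print(memory_summary(parse_meminfo(f.read())))
        time.sleep(interval_s)
```

Calling `watch()` in a second terminal while the job runs gives a rough picture: if `mem_available_gib` falls toward zero and `swap_used_gib` keeps climbing, the machine is probably thrashing. If no swap is configured, `SwapTotal` is 0 and the process will more likely just fail with `std::bad_alloc`.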
On Fri, May 17, 2019 at 9:40 AM michalstepniewski notifications@github.com wrote:
I am running the code on an AWS r5a.xlarge machine with the Deep Learning AMI (Ubuntu) Version 14.0 (ami-0089d61bf6a518044).
The machine has 32GB RAM and 4 vCPUs. I run into:
2019-05-17 16:24:17.915477: W tensorflow/core/framework/allocator.cc:124] Allocation of 3328000000 exceeds 10% of system memory.
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
and the run seems to stall.
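For a sense of scale, the arithmetic behind that warning can be sketched as below. This assumes the instance sizes are binary gigabytes (GiB) and uses the nominal RAM figure, which is a bit more than the OS actually reports; the function name is made up for illustration.

```python
GIB = 2**30

# Size of the single allocation from the TensorFlow warning, in bytes.
allocation_bytes = 3_328_000_000


def fraction_of_ram(allocation, ram_gib):
    """Share of nominal RAM taken by one allocation (ram_gib is an assumed size)."""
    return allocation / (ram_gib * GIB)


for ram_gib in (32, 64):
    share = fraction_of_ram(allocation_bytes, ram_gib)
    print(f"{ram_gib} GiB: {allocation_bytes / GIB:.2f} GiB allocation = {share:.1%} of RAM")
```

So a single buffer of this size is roughly a tenth of a 32GB machine. If the job holds several of them at once, which seems plausible but is not confirmed by the log, running out of memory on that instance is not surprising.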
Thanks for the prompt reply. I am now running on an AWS instance with 61GB RAM and it's working so far, so apparently 32GB is not enough :)
Thanks for the feedback, updated the README.