vision-dbms / vision-xa-nodejs-connect

A node.js native add-on for the Vision database management system.
BSD 3-Clause "New" or "Revised" License

Node dies with pstack/NDF.ERROR #3

Open LeslieNewman opened 4 years ago

LeslieNewman commented 4 years ago

Good way to start a week. I ran a simple test suite to make sure all my environments were set up correctly and managed to get a hang and a pstack.

What I saw:

----------------------------------------------------------------
***** 28D86CD6.37A41089.00000000 Mon Jul 27 07:45:46 2020 lnewman [252 252 2007 2007]
>>>  Error Log Entry[1]  <<<
*       VTransientServices [165]
*       An External Interface Error
*       Object Disconnected

----------------------------------------------------------------
***** 28D86CD6.37A41089.00000000 Mon Jul 27 07:45:55 2020 lnewman [252 252 2007 2007]
>>>  Error Log Entry[2]  <<<
*       The Signal Handler [251]
*       A Segmentation Fault
*       Segmentation Violation Signal
#0  0x00007f955630346c in waitpid () from /lib64/libc.so.6
#1  0x00007f9556280f62 in do_system () from /lib64/libc.so.6
#2  0x00000000005d448a in SIGNAL_StandardSignalHandler (sig=<optimized out>, code=<optimized out>, scp=<optimized out>) at ./vsignal.cpp:151
#3  <signal handler called>
#4  0x0000000002222450 in ?? ()
#5  0x0000000001f83c80 in ?? ()
#6  0x00007f955858a1e3 in Vca::VDevice::Face::User::finish (this=0x1ebe880, this@entry=0x1f83c80, rStatus=...) at ../kernel/Vca_VDevice.cpp:294
#7  0x00007f95585df256 in Vca::VStreamSink::transferData (this=0x1f83ab0) at ../kernel/Vca_VStreamSink.cpp:78
#8  0x00007f95584fdfc2 in Vca::VBS::onPutContinuation (this=this@entry=0x1f83ab0) at ../kernel/Vca_VBS.cpp:579
#9  0x00007f95584fe09a in Vca::VBS::putBufferedData (this=this@entry=0x1f83ab0) at ../kernel/Vca_VBS.cpp:371
#10 0x00007f95584f6578 in Vca::VBSConsumer::PutBufferedData () at ../kernel/Vca_VBSConsumer.cpp:391
#11 0x00007f9558570cb5 in Vca::VCohort::drainOutputQueue (this=this@entry=0x1ec81c0) at ../kernel/Vca_VCohort.cpp:890
#12 0x00007f9558571948 in Vca::VCohort::drainQueues (this=this@entry=0x1ec81c0) at ../kernel/Vca_VCohort.cpp:896
#13 0x00007f95585719bf in Vca::VCohort::processEvents (this=0x1ec81c0, pEM=pEM@entry=0x7ffc96475db0, sTimeout=1000, rbEventsProcessed=rbEventsProcessed@entry=@0x7ffc96475daf: false) at ../kernel/Vca_VCohort.cpp:773
#14 0x00007f9558572035 in Vca::VCohort::Manager::doEvents (this=this@entry=0x7ffc96475db0, sTimeout=<optimized out>, rbEventsHandled=@0x7ffc96475daf: false) at ../kernel/Vca_VCohort.cpp:443
#15 0x000000000053ddcb in VComputationScheduler::Manager::DoEverything (this=0x1f82330) at ./VComputationScheduler.cpp:423
#16 0x00007f95573f4a45 in Vca::VMainProgram::processEvents (this=this@entry=0x7ffc96476010) at ../kernel/Vca_VMainProgram.cpp:528
#17 0x0000000000467c0e in main (sArgv=<optimized out>, pArgv=<optimized out>) at ./batchvision.cpp:862
LeslieNewman commented 4 years ago

More details, consolidated here:

One of the key functions of the node.js module is to let the Vision user create and access external JavaScript dictionary objects. In some situations the node process disappears, sometimes accompanied by a batchvision process that hangs and ultimately has to be killed. Sometimes there is a pstack; sometimes there is a core dump. There are usually external object errors of the form:

>>> External Object Selector '.id' Implementation Failure Error: <<<
>>> Object Disconnected <<<

Accompanied by entries in the NDF.ERRORS file:

----------------------------------------------------------------
***** 28EF5B0E.15AC4B77.00000000 Wed Aug 19 18:38:29 2020 lnewman [252 252 2007 2007]
>>>  Error Log Entry[1]  <<<
*       VTransientServices [165]
*       An External Interface Error
*       Object Disconnected

----------------------------------------------------------------
***** 28EF5B0E.15AC4B77.00000000 Wed Aug 19 18:38:36 2020 lnewman [252 252 2007 2007]
>>>  Error Log Entry[2]  <<<
*       The Signal Handler [251]
*       A Segmentation Fault
*       Segmentation Violation Signal

When present, a typical pstack contains:

#0  0x00007fa6ecdb946c in waitpid () from /lib64/libc.so.6
#1  0x00007fa6ecd36f62 in do_system () from /lib64/libc.so.6
#2  0x00000000005d448a in SIGNAL_StandardSignalHandler (sig=<optimized out>, code=<optimized out>, scp=<optimized out>) at ./vsignal.cpp:151
#3  <signal handler called>
#4  0x00000000023d6320 in ?? ()
#5  0x0000000001bf2c80 in ?? ()
#6  0x00007fa6ef0401e3 in Vca::VDevice::Face::User::finish (this=0x1b2d880, this@entry=0x1bf2c80, rStatus=...) at ../kernel/Vca_VDevice.cpp:294
#7  0x00007fa6ef095256 in Vca::VStreamSink::transferData (this=0x1bf2ab0) at ../kernel/Vca_VStreamSink.cpp:78
#8  0x00007fa6eefb3fc2 in Vca::VBS::onPutContinuation (this=this@entry=0x1bf2ab0) at ../kernel/Vca_VBS.cpp:579
#9  0x00007fa6eefb409a in Vca::VBS::putBufferedData (this=this@entry=0x1bf2ab0) at ../kernel/Vca_VBS.cpp:371
#10 0x00007fa6eefac578 in Vca::VBSConsumer::PutBufferedData () at ../kernel/Vca_VBSConsumer.cpp:391
#11 0x00007fa6ef026cb5 in Vca::VCohort::drainOutputQueue (this=this@entry=0x1b371c0) at ../kernel/Vca_VCohort.cpp:890
#12 0x00007fa6ef027948 in Vca::VCohort::drainQueues (this=this@entry=0x1b371c0) at ../kernel/Vca_VCohort.cpp:896
#13 0x00007fa6ef0279bf in Vca::VCohort::processEvents (this=0x1b371c0, pEM=pEM@entry=0x7ffc08c33860, sTimeout=1000, rbEventsProcessed=rbEventsProcessed@entry=@0x7ffc08c3385f: false) at ../kernel/Vca_VCohort.cpp:773
#14 0x00007fa6ef028035 in Vca::VCohort::Manager::doEvents (this=this@entry=0x7ffc08c33860, sTimeout=<optimized out>, rbEventsHandled=@0x7ffc08c3385f: false) at ../kernel/Vca_VCohort.cpp:443
#15 0x000000000053ddcb in VComputationScheduler::Manager::DoEverything (this=0x1bf1330) at ./VComputationScheduler.cpp:423
#16 0x00007fa6edeaaa45 in Vca::VMainProgram::processEvents (this=this@entry=0x7ffc08c33ac0) at ../kernel/Vca_VMainProgram.cpp:528
#17 0x0000000000467c0e in main (sArgv=<optimized out>, pArgv=<optimized out>) at ./batchvision.cpp:862

We have seen this behavior when running large queries that use the new bridge protocol and the new proxy fql and screening services. Initially, we assumed the problem was associated with this type of query, and there may in fact be issues with the large queries as well. However, print statements we added to some of those jobs help confirm that the problem happens while setting data in or getting data from the external JavaScript object, and is not directly tied to the external web service call.

We were able to create a reproducer that merely creates an object, then sets and gets values in that object. This relatively simple query works most of the time; however, when we run it with the ?r10000 directive (instead of the normal ?g), it almost always fails. We have seen equivalent results with node version 10.13.0 (current) and node version 12.18.2 (inprog).

To reproduce:

• Start a plain vanilla vision session

• Load node.vis file
    → /home/user/lnewman/opensource/fdsbridge/vscripts/node.vis

• In your vision session, connect to server and initialize

    !jsConnect <- JS serverByName: "Node_Fetch" ;
    !jsObject <- jsConnect newObject ;
    !x <- 1;

• In your vision session, update, access, and create new JavaScript objects, and run the following a large number of times (?r10000):

    newLine print ;
    "iteration: " print ; x printNL ;
    :x increment  ;

    "--- display jsObject" printNL ;
    jsObject set: "keyString" to: "this is a string" . set: "keyNumber" to: 123.45 . ;
    jsObject display;

    jsObject get: "keyString" . printNL ;

    jsObject
       set: "ids" toArrayFrom: ("FDS", "IBM", "MSFT") ;
    jsObject display ;

    jsObject get: [ ids ] . display ;

    jsObject get: "ids" . get: 0 . printNL ;

    "---  walk through ids based on js length of array" printNL ;
    !ids <- jsObject get: "ids" ;
    !length <- ids get: [ length ] ; #<-- this is a JS function
    length sequence0
    do: [ print ; ^my ids get: ^self . printNL ] ;

    "-- build new object of objects" printNL ;
    !idInputs <- jsConnect newObject
    set: "type" to: "Security" .
    set: "ids" toArrayFrom: ("FDS", "IBM", "MSFT") ;
    "--idInputs created" printNL ;

    !settingInputs <- jsConnect newObject
    set: "curr" to: "USD" .
    set: "adjust" to: "NO" ;
    "--settingInputs created" printNL ;

    !requestObject <- jsConnect newObject
    set: "inputs" to: idInputs .
    set: "settings" to: settingInputs ;

    "--- Display requestObject: " printNL ;
    requestObject display ;

    "--- Display requestObject settings: " printNL ;
    requestObject get: "settings" . display ;

    "--- Display requestObject settings adjust: " printNL ;
    requestObject get: "settings" . get: "adjust" . printNL ;
    requestObject get: [ settings adjust ] . printNL ;

    "--- Display requestObject inputs ids 1" printNL ;
    requestObject get: "inputs" . get: "ids" . get: 1 . printNL ;
    requestObject get: [ inputs ids at: 1 ] . printNL ;

    ?r10000

Random Observations:

• When we get SNFs, all roads lead to the JS set: of: to: message:

     >>> External Object Selector 'Reflect' Implementation Failure Error: <<<
     >>> External Object Selector '.id' Implementation Failure Error: <<<

Charlie Tips:

When there is a core dump, we can apparently get some useful information from it using gdb, if we are Charlie:

gdb -c core.1596814604_252_2007_11_11865_node -d $FDSBridgeArea/software/inprog/vision-xa-nodejs-connect/ -d /fast/software/20200717/node-v12.18.2/ -d $FDSBridgeArea/software/inprog/vision/software/src/master/src/kernel/ -e node

Core was generated by `node /home/user/lnewman/opensource/fdsbridge/software/inprog/vision-xa-nodejs-c'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f2bdddd2648 in Vxa::VCollection::Invoke(Vxa::ICollection*, Vxa::ICaller*, V::VString const&, unsigned int, unsigned int) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vxa.so
Missing separate debuginfos, use: debuginfo-install glibc-2.17-307.el7.1.x86_64 libgcc-4.8.5-39.el7.x86_64 libstdc++-4.8.5-39.el7.x86_64 libuuid-2.23.2-63.el7.x86_64
(gdb) info thread
  Id   Target Id         Frame
  13   Thread 0x7f2be520f700 (LWP 11866) 0x00007f2be530eeb3 in epoll_wait () from /lib64/libc.so.6
  12   Thread 0x7f2bce4c3700 (LWP 11875) 0x00007f2be55e9a35 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  11   Thread 0x7f2bcecc4700 (LWP 11874) 0x00007f2be55e9a35 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  10   Thread 0x7f2bdf7fe700 (LWP 11869) 0x00007f2be55e9a35 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  9    Thread 0x7f2be642a780 (LWP 11865) 0x00007f2be530eeb3 in epoll_wait () from /lib64/libc.so.6
  8    Thread 0x7f2bcccc0700 (LWP 11901) 0x00007f2be55ebb3b in do_futex_wait.constprop.1 ()
   from /lib64/libpthread.so.0
  7    Thread 0x7f2bdcfea700 (LWP 11873) 0x00007f2be55e9a35 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  6    Thread 0x7f2be643c700 (LWP 11871) 0x00007f2be55e9a35 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  5    Thread 0x7f2bcdcc2700 (LWP 11876) 0x00007f2be55e9a35 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  4    Thread 0x7f2be4a0e700 (LWP 11867) 0x00007f2be55e9a35 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  3    Thread 0x7f2bdeffd700 (LWP 11870) 0x00007f2be55e9a35 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  2    Thread 0x7f2bdffff700 (LWP 11868) 0x00007f2be55e9a35 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
* 1    Thread 0x7f2bcd4c1700 (LWP 11877) 0x00007f2bdddd2648 in Vxa::VCollection::Invoke(Vxa::ICollection*, Vxa::ICaller*, V::VString const&, unsigned int, unsigned int) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vxa.so

(gdb) thread 1
[Switching to thread 1 (Thread 0x7f2bcd4c1700 (LWP 11877))]
#0  0x00007f2bdddd2648 in Vxa::VCollection::Invoke(Vxa::ICollection*, Vxa::ICaller*, V::VString const&, unsigned int, unsigned int) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vxa.so
(gdb) bt
#0  0x00007f2bdddd2648 in Vxa::VCollection::Invoke(Vxa::ICollection*, Vxa::ICaller*, V::VString const&, unsigned int, unsigned int) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vxa.so
#1  0x00007f2bde5b070f in Vxa::ICollection_Role<Vxa::VCollection, Vxa::ICollection>::MemberImpl_Invoke(Vca::VMessage*, Vxa::ICaller*, V::VString const&, unsigned int, unsigned int) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/vxanode.node
#2  0x00007f2bddd83e45 in Vca::VInterfaceMember_4<Vxa::ICollection, Vxa::ICaller*, V::VString const&, unsigned int, unsigned int>::operator()(Vxa::ICollection*, Vxa::ICaller*, V::VString const&, unsigned int, unsigned int, Vca::VMessageManager*, Vca::VInterfaceMember_4<Vxa::ICollection, Vxa::ICaller*, V::VString const&, unsigned int, unsigned int>::Message*) const ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vxa.so
#3  0x00007f2bddd84954 in Vca::VInterfaceMember_4<Vxa::ICollection, Vxa::ICaller*, V::VString const&, unsigned int, unsigned int>::Message::evaluate_() ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vxa.so
#4  0x00007f2bcf9707e6 in Vca::VMessage::evaluate(VReference<Vca::VMessage>*&) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#5  0x00007f2bcf8faad0 in Vca::VcaOIDL::Evaluator::evaluate(Vca::VMessage*) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#6  0x00007f2bcf8fb170 in Vca::VcaOIDL::evaluateIncomingFrom(Vca::VMessage*, Vca::VcaSite*) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#7  0x00007f2bcf927708 in Vca::VcaSerializer::run_() ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#8  0x00007f2bdd00e3a6 in V::VScheduler::schedule(V::VSchedulable*) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/V.so
#9  0x00007f2bcf937fe1 in Vca::IBSClient_Role<Vca::VcaTransport, Vca::IBSClient>::MemberImpl_OnTransfer(Vca::VMessage*, unsigned int) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#10 0x00007f2bcf8bd30a in Vca::VInterfaceMember_1<Vca::IBSClient, unsigned int>::operator()(Vca::IBSClient*, unsigned int, Vca::VMessageManager*, Vca::VInterfaceMember_1<Vca::IBSClient, unsigned int>::Message*) const ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#11 0x00007f2bcf8bb9e9 in Vca::VBS::onGetContinuation(unsigned long, unsigned long&) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#12 0x00007f2bcf8bba90 in Vca::VBS::onGetContinuation() ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#13 0x00007f2bcf94eecf in Vca::OS::DeviceManager::processEvent_(unsigned long) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#14 0x00007f2bcf9569b5 in Vca::VDeviceManager::processEvents(unsigned long, bool&) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#15 0x00007f2bcf93d054 in Vca::VCohort::processEvents(Vca::VCohort::Manager*, unsigned long, bool&) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#16 0x00007f2bcf93d305 in Vca::VCohort::Manager::doEvents(unsigned long) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#17 0x00007f2bcf93d3e5 in Vca::VCohort::ProcessorRequest::process() ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#18 0x00007f2bcf93eaba in V::VThreadedProcessor_<Vca::VCohort::ProcessorRequest>::employ(V::VThreadedProcessor_<Vca::VCohort::ProcessorRequest>::Worker*) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#19 0x00007f2bcf93ecd0 in V::VThreadedProcessor_<Vca::VCohort::ProcessorRequest>::Worker::run_() ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#20 0x00007f2bdd00daef in V::VManagedThread::run() ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/V.so
#21 0x00007f2bdd00dbb1 in V::VManagedThread::Run(V::VManagedThread*) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/V.so
#22 0x00007f2be55e5ea5 in start_thread () from /lib64/libpthread.so.0
#23 0x00007f2be530e8dd in clone () from /lib64/libc.so.6
  
 
(gdb) thread 8
[Switching to thread 8 (Thread 0x7f2bcccc0700 (LWP 11901))]
#0  0x00007f2be55ebb3b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f2be55ebb3b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
#1  0x00007f2be55ebbcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00007f2be55ebc6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3  0x00007f2bdd00ef0d in V::VSemaphore::consume() ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/V.so
#4  0x00007f2bcf93e991 in V::VThreadedProcessor_<Vca::VCohort::ProcessorRequest>::employ(V::VThreadedProcessor_<Vca::VCohort::ProcessorRequest>::Worker*) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#5  0x00007f2bcf93ecd0 in V::VThreadedProcessor_<Vca::VCohort::ProcessorRequest>::Worker::run_() ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/Vca.so
#6  0x00007f2bdd00daef in V::VManagedThread::run() ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/V.so
#7  0x00007f2bdd00dbb1 in V::VManagedThread::Run(V::VManagedThread*) ()
   from /home/user/lnewman/opensource/fdsbridge/software/20200717/vision-xa-nodejs-connect/build/Release/V.so
#8  0x00007f2be55e5ea5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f2be530e8dd in clone () from /lib64/libc.so.6
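The session above switches threads and runs bt by hand. A minimal sketch, assuming gdb is on the PATH, that captures every thread's backtrace from a core file in one non-interactive run. CORE, /tmp/allbt.gdb and all-threads.txt are placeholder names, not part of the original tip; -batch, -x, -c, -e, "set pagination off" and "thread apply all bt" are standard gdb options and commands.

```shell
#!/bin/sh
# Sketch: dump every thread's backtrace from a node core file in one
# non-interactive gdb run. CORE and the file names are placeholders.
CMDS=/tmp/allbt.gdb

# gdb commands equivalent to the manual 'info thread' / 'thread N' / 'bt' steps
cat > "$CMDS" <<'EOF'
set pagination off
info threads
thread apply all bt
EOF

if [ -n "$CORE" ] && command -v gdb >/dev/null 2>&1; then
    # -batch exits when the command file finishes; -x reads commands from it.
    gdb -batch -x "$CMDS" -c "$CORE" -e node > all-threads.txt 2>&1
    echo "backtraces written to all-threads.txt"
else
    echo "usage: CORE=<core file> $0   (requires gdb)"
fi
```

The -d source directories from Charlie's command can be appended to the gdb line unchanged. Having every thread in one file makes it easier to compare runs, for example to check whether the fault is always on a Vca worker thread rather than the main node thread.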
 

From Charlie's email:

    Another thing Charlie will let percolate is the to-do list from what he believes is Mike's most recent check-in, which raises issues that are likely related to these core dumps. Leon, maybe these comments will resonate with you.
     
    Second 'await' work-in-progress checkpoint 
    Includes a 'working' implementation of C++ object wrappers using
    object and function templates.
     
    To Do:
    1) The machinery of exposing C++ objects to JS involves wrapping 'void*'
       pointers inside 'v8::External's.  Because there is no easy and perhaps
       absolutely no way to know when the enclosing 'v8::External' has been
       reclaimed, managing the lifetime of the C++ objects referenced by the wrapped
       pointer is a COLOSSAL pain.  Currently we use 'v8::External' wrappers
       to reference:
     
         1) object export entries in the JavaScript object export table
         2) task launch callbacks
         3) wrapped C++ objects exposed to JavaScript code (the case
            described above).
     
       Of these cases, (1) is safe because export table entry lifetimes are
       managed exclusively by C++ code, (2) is probably safe - subject to
       audit - given the existence of external references to the task being
       launched, while (3) remains a problem.
     
    2) An implementation of task suspend/resume.  Two candidate implementation
       models exist:
     
         1) hijacking the existing machinery used to suspend and resume
            the running 'batchvision' SNF task
         2) adding new, explicit suspend/resume control interfaces to
            'batchvision'
     
       In all probability, (2) will be chosen even though it requires the
       release of a new version of 'batchvision'.
LeslieNewman commented 4 years ago

Figured out how to reopen the accidentally closed issue.