sassoftware / saspy

A Python interface module to the SAS System. It works with Linux, Windows, and Mainframe SAS as well as with SAS in Viya.
https://sassoftware.github.io/saspy

Reconnecting to Session #99

Closed chrishales709 closed 6 years ago

chrishales709 commented 6 years ago

Is it possible to re-connect to a SASPy session after temporarily losing an internet connection? For example, sometimes I create a session, run some code, and then shut my laptop to go to a meeting. After the meeting, I go back to run some more code in the session, but I get the following message:

"No SAS process attached. SAS process has terminated unexpectedly."

If not, is there a way to create and connect to a SASPy session on a remote machine?

tomweber-sas commented 6 years ago

Well, there's no smarts in saspy currently to do that. Are you using the IOM access method? I assume so. Not STDIO over SSH. I'll have to investigate if that can be done at all. If it can, if IOM supports such a thing, then I can see how to add that in to saspy. But I don't know if that's possible or not off the top of my head. I'll look into it tomorrow!

Thanks, Tom

chrishales709 commented 6 years ago

Yes, I'm currently using the IOM access method. Thanks for looking into this. It's one of my biggest pain points.

tomweber-sas commented 6 years ago

@chrishales709 Good news. I just heard back from one of our IOM folks that this should be possible. It will take me some R&D, but I will work on this as the next feature for saspy. Probably next week though, as I'm head down on another project right now. I'll post back as I get a chance to see how it works. This will be a good feature to have.

Thanks, Tom

tomweber-sas commented 6 years ago

Just to be sure of your scenario, you are connected to a remote IOM server from your laptop (where saspy is running), right? IOM also supports a local connection where you just have a regular SAS install on your laptop. That case has no metadata server, no object spawner, no workspace server. It just simulates all of that for IOM's sake and runs your local SAS as if it were an IOM Workspace server.

The reconnect scenario is for the remote case with all of the infrastructure (metadata server, object spawner, real workspace server...). The local case wouldn't need it, nor does it support it, from what I've gathered.

Just making sure I'm really going to be addressing your actual problem.

Tom

chrishales709 commented 6 years ago

Correct, the reconnect scenario is just for the case with SASPy on my laptop connecting to a remote IOM server. I actually don't have any local SAS installation.

FriedEgg commented 6 years ago

@tomweber-sas, I do not see how this would be possible for a Workspace Server. I am not familiar with any options for extending the timeout for this server type. Workspace Servers do have keepalive, but that would not benefit a timeout/disconnect-reconnect. Enterprise Guide, for example, does not have this ability. Will be interested to see if there is another option I am not familiar with.

tomweber-sas commented 6 years ago

@FriedEgg , Well, I'm just about to start looking into this today. I have information from one of the developers that says this is supported. But, until I see it work, I'll reserve judgement. I'll let you know what I find. If it can be done, I'll get it working!

chrishales709 commented 6 years ago

Sometimes I use SAS Studio, and when I disconnect my laptop from the network and then re-connect it, I get a message saying that SAS Studio was able to re-connect to the network. So it seems like SAS Studio has some way to do this.

I've also noticed that SAS Studio can get the results of a Background Submit as long as you don't close the browser. This works even if you disconnect from the network and then re-connect.

FriedEgg commented 6 years ago

@chrishales709, SAS Studio is a very different setup. In that case, the Studio application maintains the Workspace connection while managing the connect/disconnect/timeout/reconnect between itself and the client you are using. Should the Studio application itself lose Workspace connectivity, it could not recover. SASPy and Enterprise Guide behave similarly to each other, as they manage the connection directly from the client instead of connecting through a mid-tier. SASPy could follow a paradigm similar to SAS Studio's: the Java piece of SASPy, which manages the IOM connection, would run on the SAS server (remote or otherwise) and talk to Python through sockets, as it does now, but from the SAS server to the SASPy client, handling session timeout/reconnect from the SASPy server to the SASPy client...

tomweber-sas commented 6 years ago

@chrishales709 I've been able to implement IOM's reconnect functionality in my saspy java code, for a simple case - to prototype it with. There's still more cases to handle in there, and that brings me to the next set of issues/questions.

What exactly happens when you 'close the lid' on your laptop? Sleep, Hibernate? Is the network adapter disabled? All kinds of possibilities here, and I believe there's going to be more than just this part that I've got so far with IOM.

There's a few things going on that all have to be accounted for:

1) saspy python process
2) saspy java process
3) sockets between these two processes (loopback adapter local to the laptop)
4) IOM client in the java process
5) network connection between java client and workspace server (remote network connection)

So, this part addresses a disruption of number 5 above. It's implemented in number 4 above. Now, I need to know what's going on with the first 3 for your particular case. Are both python/java processes (1,2) still up and running? Is the socket between the two (3) still functioning? Given you got the error: "No SAS process attached. SAS process has terminated unexpectedly." that means to me that the java process (2) is gone as well as the socket (3) between it and python (1). This would happen if only 5 goes away (2 will then go away and then 3). It would also happen if something happened to 3 as well.

This is where I need to understand what is happening on your machine when you close the lid. If only the remote network connection (5) is lost and 1,2,3 are still able to function (Sleep mode case; I believe) then I think we're close. I still have work to do to get this fully implemented, but this part is working.

If the network adapter is being disabled, and the loopback (internal) connection is lost too, then that will take out 3, which will cause 2 to terminate and I'll have to invent a new protocol for reconnecting saspy's python and java processes. That would involve catching the socket failures in both processes and having some kind of reconnection scheme. That's certainly a lot more work, but I imagine I can come up with something if I have to.

So, can you identify what's really happening when you close your lid? The code below can be used to test if the loopback adapter is disconnecting, without having to try to figure it out on your own. If it is and you can configure your network adapter to stay alive even with the lid closed (if it's not currently), that would make things much better too.

Here's a standalone pair of programs you can run, then close the lid (do whatever it is) and then see if they are still running after you get back up and running again. I just run these from a command prompt (two different ones), though you could run them in jupyter too (two different notebooks).

Submit this in one python process first. It will print out the port the second one needs.

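import socket as socks
import time
from time import sleep

# Listen on an OS-assigned free port (port 0) and report which one was chosen.
sockin  = socks.socket()
sockin.bind(("",0))
port = sockin.getsockname()[1]
sockin.listen(1)
print("port to connect to is "+str(port))

# Block until the second program connects; accept() returns (connection, address).
sock = sockin.accept()

# Every 5 seconds, send the current time and print whatever the peer echoes back.
while True:
  ti  = time.localtime()
  tis = str(ti.tm_hour)+':'+str(ti.tm_min)+':'+str(ti.tm_sec)
  sock[0].send(tis.encode())
  print(sock[0].recv(256).decode())
  sleep(5)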

In the other process, submit this. Enter the port number when prompted, or hard code the port on the sock.connect call and remove the input() line.

import socket as socks

# Port printed by the first program (63287 in the reference run further down).
port = int(input("enter the port number"))

sock = socks.socket()
sock.connect(("127.0.0.1",port))

# Echo everything received straight back to the first program.
while True:
  inout = sock.recv(256)
  if not inout:  # empty read: the first program closed the connection
    break
  sock.send(inout)
This will emulate what's happening between 1 and 2 above (as 3) so we can tell if you're losing 3 or if it's just 5 that's going away, which is then causing 2 to go away (and 3 of course). If these two processes are still interacting after you're back from the lid close, then we should be in good shape to address only losing 5.
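
For a quick check that needs only one terminal, the same idea can be sketched as a single script: one thread echoes, and the main thread sends timestamps over one long-lived 127.0.0.1 connection, failing with an error if an echo does not come back. This is an illustrative variant, not part of saspy; the function names `echo_until_closed` and `run_heartbeats` and the `count`, `interval`, and `timeout` parameters are made up for this sketch.

```python
import socket
import threading
import time


def echo_until_closed(listener):
    """Accept one connection and echo every chunk back until the peer closes."""
    conn, _addr = listener.accept()
    with conn:
        while True:
            data = conn.recv(256)
            if not data:          # empty read: the peer closed the connection
                break
            conn.sendall(data)


def run_heartbeats(count=3, interval=0.0, timeout=5.0):
    """Echo `count` timestamps over ONE loopback connection.

    Returns a list of (sent, echoed) strings. A stalled or broken loopback
    connection surfaces as socket.timeout or another OSError.
    """
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))      # port 0: the OS picks a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    worker = threading.Thread(target=echo_until_closed, args=(listener,), daemon=True)
    worker.start()

    results = []
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
            for _ in range(count):
                stamp = time.strftime("%H:%M:%S")
                client.sendall(stamp.encode())
                echoed = client.recv(256).decode()
                results.append((stamp, echoed))
                time.sleep(interval)
    finally:
        worker.join(timeout)
        listener.close()
    return results
```

Calling `run_heartbeats(count=120, interval=5)` before closing the lid keeps one connection busy across the sleep; a `socket.timeout` or other `OSError` after waking up would suggest the loopback connection (3) did not survive.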

Hope this all makes sense. When I make my desktop Sleep, these are still functioning when I wake it back up. I also don't lose my remote connection when I sleep, but maybe I would if I waited longer???

Need to know how yours is working though, Tom

Here's what those two should look like, for reference:

C:\Users\sastpw>python
Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket as socks
>>> import time
>>> from time import sleep
>>>
>>> sockin  = socks.socket()
>>> sockin.bind(("",0))
>>> port = sockin.getsockname()[1]
>>> sockin.listen(1)
>>> print("port to connect to is "+str(port))
port to connect to is 63287
>>>
>>> sock = sockin.accept()
>>>
>>> while True:
...   ti  = time.localtime()
...   tis = str(ti.tm_hour)+':'+str(ti.tm_min)+':'+str(ti.tm_sec)
...   sock[0].send(tis.encode())
...   print(sock[0].recv(256).decode())
...   sleep(5)
...
8
15:21:58
7
15:22:3
7
15:22:8
8
15:22:13
8
15:22:18
8

C:\Users\sastpw>python
Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>>
>>> import socket as socks
>>> import time
>>> from time import sleep
>>> port = int(input("enter the port nuimber"))
enter the port nuimber63287
>>>
>>> sock = socks.socket()
>>> sock.connect(("127.0.0.1",port))
>>>
>>> while True:
...   inout = sock.recv(256)
...   sock.send(inout)
...
8
7
7
8
8
8
8
8
8
8
8
8
8

chrishales709 commented 6 years ago

@tomweber-sas

When I ran your two programs, the first program continued to print out times even after I disconnected. The second program was still running too, but it wasn't printing out integers like what your screenshot shows. It didn't do that before or after I disconnected though. So it seems like everything continues to run even after disconnecting.

If it helps, I'm disconnecting from a docking station that has a wired connection, and then working on a laptop that connects to wifi.

tomweber-sas commented 6 years ago

Ah, that's great news actually. Knowing you're just switching from LAN to WiFi explains a lot. And the programs still running proves the loopback address isn't cutting out. So that should mean that what I have prototyped ought to work for your situation. Now I need to finish up handling this all the way and test it all out. I've had to restructure the java piece completely, so I'll need to make sure I haven't messed up anything just doing that and validate the various disconnect paths to catch too. Looks promising :)

While I do that, you'll need to make sure that this feature is enabled on your workspace server. It's under the advanced options on the Workspace Server Properties, named Client Reconnection (you can see that with SASMC connected to your metadata server): [image]

When I get this to a state you can try it, I'll push it to a new branch so we can try it out before committing it to master.

Thanks! Tom

tomweber-sas commented 6 years ago

@chrishales709 I've created a new branch called reconnect. Can you try that out? I've not been able to actually try out your case, network swap, but I've been testing by closing my IOM connection in the code (in the debugger) and it's reconnecting as it should. I'm under the impression this should simulate a network drop, but I can't prove that as of yet. Did you verify your IOM server is configured for reconnect? If you have any questions about getting the code from the branch, let me know. Also, as the saspyiom.jar has changed, be sure that your classpath is pointing to this new one.

@FriedEgg I know you're curious about the reconnect. The code is in the saspy2j.java file. I've restructured this completely. It ought to be fairly straightforward to see how it works. If you have any questions, let me know.

Thanks! Tom

FriedEgg commented 6 years ago

Thanks @tomweber-sas, I had a pretty good idea once I saw your picture from SASMC. Thought that setting was only applicable to Pooled Workspace Servers. Guess my memory plays tricks on me. Looking forward to reviewing the update!

chrishales709 commented 6 years ago

@tomweber-sas

I tested this, and I got the following message:

'The application could not log on to the server "xxx". The server configuration is invalid.'

I don't have access to confirm if our IOM server has "Allow clients to reconnect" checked or not.

tomweber-sas commented 6 years ago

Ok, let me see if there's a way to tell. I know you can get some info from proc iomoperate, but I don't know if it gets to those details. I'll see if I can see. Is there a SAS Admin you can check with?

Now, that message does seem to be the catch all if the reconnect fails. So, it is possible you do have it configured to allow reconnecting but still get the error. I don't have multiple reconnect attempts coded currently, so if, for instance, you tried to submit something before having the new network connection established this would fail, probably with this message. I need to see if I can play with this with actual network disconnects so I can be sure it'll behave as expected. I can't truly simulate that with my desktop machine.
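
The missing "multiple reconnect attempts" could look something like the sketch below: retry an action a few times with a growing delay between tries. This is a generic Python illustration under stated assumptions, not saspy's code (the actual reconnect lives in the Java piece); `retry_with_backoff`, its parameters, and the idea that a failed reconnect shows up as an `OSError` are all hypothetical.

```python
import time


def retry_with_backoff(action, attempts=5, first_delay=1.0, factor=2.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call `action()` until it succeeds or `attempts` tries are used up.

    The wait between tries starts at `first_delay` seconds and is multiplied
    by `factor` after each failure. The last failure is re-raised.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise              # out of tries: let the caller see the error
            sleep(delay)
            delay *= factor
```

Passing `sleep` as a parameter is a design choice that makes the helper easy to exercise without real waiting.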

I'll check on how to tell if the workspace is configured and see if I can requisition a laptop so I can try this with real network drops.

Tom

tomweber-sas commented 6 years ago

ok, I've gotten a loaner laptop, installed everything I need on it and tried this out, switching from LAN to WiFi. It failed for me too. So, now to try to track this down and figure out what's going on. This may take a little while. Tom

FriedEgg commented 6 years ago

You can query this information using PROC METADATA

options metaserver="hostname" metaport=8561;

filename rslt temp;

proc metadata
in=
'
<GetMetadataObjects>
    <Reposid>$METAREPOSITORY</Reposid>
    <Type>ServerComponent</Type>
    <Objects/>
    <NS>SAS</NS>
    <!-- OMI_TEMPLATE (4) + OMI_XMLSELECT (128) + OMI_GET_METADATA (256) -->
    <Flags>388</Flags>
    <Options>
        <XMLSELECT search="ServerComponent[@ClassIdentifier=''440196D4-90F0-11D0-9F41-00A024BB830C'']"/>
        <Templates>
        <ServerComponent Name="">
            <Properties search="Property[@PropertyName ? ''reconnect'']">
                <Property/>
            </Properties>
        </ServerComponent>
        <Property PropertyName="" DefaultValue=""/>
        </Templates>
    </Options>
</GetMetadataObjects>
'
out=rslt
;

run;

filename map temp;

data _null_;
file map;
input @;
put _infile_;
cards;
<?xml version="1.0" encoding="windows-1252"?>
<SXLEMAP name="AUTO_GEN" version="2.1">
    <NAMESPACES count="0"/>
    <TABLE description="Property" name="Property">
        <TABLE-PATH syntax="XPath">/GetMetadataObjects/Objects/ServerComponent/Properties/Property</TABLE-PATH>
        <COLUMN name="Property_Id">
            <PATH syntax="XPath">/GetMetadataObjects/Objects/ServerComponent/Properties/Property/@Id</PATH>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>17</LENGTH>
        </COLUMN>
        <COLUMN name="Property_PropertyName">
            <PATH syntax="XPath">/GetMetadataObjects/Objects/ServerComponent/Properties/Property/@PropertyName</PATH>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>19</LENGTH>
        </COLUMN>
        <COLUMN name="Property_DefaultValue">
            <PATH syntax="XPath">/GetMetadataObjects/Objects/ServerComponent/Properties/Property/@DefaultValue</PATH>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>5</LENGTH>
        </COLUMN>
    </TABLE>
</SXLEMAP>
;
run;

libname rslt xmlv2 xmlmap=map access=readonly;

proc print data=rslt.Property noobs; run;

Property_Id          Property_PropertyName    Property_DefaultValue
A5AV2F12.AC0001HP    AllowReconnect           False
A5AV2F12.AC00024T    MaxReconnectTimeout      60

tomweber-sas commented 6 years ago

@FriedEgg That's awesome! I was playing around with proc metadata, but didn't get anywhere near this. I was wondering if you had something up your sleeve :) Meanwhile, I got debugging set up on the laptop and verified what we were seeing. It's working right, but it's failing to reconnect. I'm working with the IOM developers to track it down. We're seeing the failures in the object spawner logs, but the logging isn't enough to get all the details. We're going to see about dialing up the logging to see what the problem is. More to come ... Tom

tomweber-sas commented 6 years ago

Well, it appears this case just doesn't work. When the client side network fails, the workspace server side's connection doesn't get reset. The socket stays ESTABLISHED and the workspace server doesn't know there's no connection anymore. That's different than if the socket were to get disconnected on the client side, in which case the workspace server socket would get an error. The IOM guys tried setting the workspace server option keepalive=30 (seconds), to see if that would cause it to get a socket error, but that didn't have any effect.
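
For readers unfamiliar with the mechanism: with keepalive enabled, an idle TCP socket periodically probes its peer, and the OS reports an error once the probes go unanswered, which is how a side that never sends anything can notice a dead connection. The sketch below shows only the OS-level socket option in Python. How the workspace server's keepalive option is implemented is not stated in this thread, so treat any mapping between the two as an assumption; `enable_keepalive` and its parameters are hypothetical names.

```python
import socket


def enable_keepalive(sock, idle=30, interval=10, probes=3):
    """Turn on TCP keepalive for `sock`; return True if the OS reports it enabled."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The tuning options are platform specific, so apply only the ones this
    # Python build exposes and ignore any the OS rejects.
    tuning = (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval), ("TCP_KEEPCNT", probes))
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            pass

    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```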

So, what I can do is provide a method off the SASsession object to disconnect, explicitly. If you submit that before losing your network connection, then when you have a new network connection and you resume your work, the reconnect will work. That's how I was testing this out when implementing it, as I couldn't disconnect my network.

Obviously, this isn't the ideal case, but it will give you the ability to switch from the LAN to WiFi and reconnect to your existing workspace server and resume where you left off. You just have to do the disconnect before losing the network you have. I will have to rework my code of course for this, but I don't see an issue with doing that.

What do you think? Tom

tomweber-sas commented 6 years ago

I've made the changes to support manually disconnecting from IOM. It's simply a disconnect() method off the SASsession object:

import saspy

sas = saspy.SASsession(cfgname='someIOMcfg')
# ... do some work ...
sas.disconnect()  # swap networks - be sure you're connected to a network before submitting anything next
# ... continue working ...

I've pushed this to another branch called disconnect. @chrishales709 can you try this out and see if it works for you as expected?
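
Since the next submit after `sas.disconnect()` is what triggers the reconnect, it can help to confirm that the new network can reach the server first. A minimal sketch, assuming a plain TCP connect to the server's host and port is a good enough reachability test; `wait_until_reachable` and its parameters are hypothetical and not part of saspy.

```python
import socket
import time


def wait_until_reachable(host, port, give_up_after=60.0, poll_every=2.0,
                         connect_timeout=3.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True as soon as one connect works, or False once another
    poll would run past `give_up_after` seconds.
    """
    deadline = time.monotonic() + give_up_after
    while True:
        try:
            # Open and immediately close a connection; only reachability matters.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            if time.monotonic() + poll_every > deadline:
                return False
            time.sleep(poll_every)
```

The host and port to pass are the ones from your own IOM configuration; call this after swapping networks and before submitting more code.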

Thanks! Tom

tomweber-sas commented 6 years ago

Oh, BTW, you can use this method to test if your workspace server is configured for reconnect. If it is, it disconnects (and the next thing you submit reconnects). If it's not it doesn't and tells you that (and you're still connected). Tom

tomweber-sas commented 6 years ago

FYI: I have validated that this does work when switching networks; on my loaner laptop from LAN to WiFi and back.

10.220.52.99 is WiFi IP addr
tcp        0      0 ::ffff:10.21.11.21:8821     ::ffff:10.220.52.99:63672   ESTABLISHED 3239/sas
disconnect/reconnect
tcp        0      0 ::ffff:10.21.11.21:8801     ::ffff:10.23.12.120:52037   ESTABLISHED 3239/sas
10.23.12.120 is the LAN ip

pid 3239 is my workspace server.

I think this is good. Sorry it's a manual step, but it does allow you to switch networks on the fly and keep working.

chrishales709 commented 6 years ago

@tomweber-sas Thanks! I confirmed that this works on my end as well! This will make working with SASPy much easier. Sorry I wasn't able to test this sooner. I was not in the office the end of last week. Is there a timeout for this? For example, could I disconnect from my network, commute home for 15 minutes, and then reconnect at home? I can certainly test this later, but I wanted to see if you know what the max time would be.

tomweber-sas commented 6 years ago

Hey @chrishales709 that's great. No problem with testing, I had plenty else keeping me busy :) I'll go ahead and get this (with 2 other issues) merged into master. I've updated the doc with this info too, so others can take advantage of it. There is a timeout, and in the code I have it set to 60 min. I can make that an option in the config definition if you would like. The workspace server itself has a timeout too, which seems to default to 60 min, though it can be set to whatever.

FriedEgg commented 6 years ago

A fairly disappointing resolution, I feel. Maybe instead of manually having to call disconnect, SASPy can automatically do it after every code submission if the Workspace setting is enabled, and then do its own keep-alive based on the timeout, supposing the network connection remains active?

tomweber-sas commented 6 years ago

Yeah, I was disappointed by that too, especially when I was told that this was the case they added the feature for in the first place: on one WiFi, then go somewhere else and get on another. Also, in asking about the timeout setting, and specifically the timeout value I have to provide when I call to get a token to be able to reconnect, I was told that if I provide a value greater than the one defined in metadata for the server itself, I'll get an error. So, I'm going to have to make some change to address that. I don't know what's defined in OMR, and I would have expected I could call and just have it use that value. I wouldn't set it less! Waiting to hear back on that part, then to test it and see what happens.

Ok, I was told I can specify 0 and it will use the timeout configured for the server. That's the only reasonable thing I would want to do.

So, @chrishales709 , the time you have to reconnect is the time configured in OMR for reconnect (Maximum reconnect timeout) in the picture back up above; after I test that out and validate it, of course :)

I'm hoping to get this tweaked and vetted and we should be good to merge this in and I'll build a new Pypi package for 2.2.2. With any luck, tomorrow.

More to come... Tom

tomweber-sas commented 6 years ago

Ok, I've verified both cases of the timeout value specified by my client code: being bigger than the server setting fails, and 0 works. I plan on merging this and creating a new release tomorrow after I double check everything.
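
The timeout rule described here can be written down as a small model. It is only a sketch of the behavior reported in this thread, not IOM's or saspy's code: `resolve_reconnect_timeout` is a hypothetical name, the unit (minutes) is an assumption, and accepting values at or below the server maximum as given is assumed rather than confirmed above.

```python
def resolve_reconnect_timeout(requested, server_max):
    """Return the timeout that would apply, or raise if the request is rejected.

    0 means "use the server's configured maximum"; anything above the
    server's maximum is rejected. Values at or below the maximum are
    assumed to be accepted unchanged.
    """
    if requested < 0 or server_max <= 0:
        raise ValueError("requested must be >= 0 and server_max must be > 0")
    if requested == 0:
        return server_max
    if requested > server_max:
        raise ValueError(
            f"requested timeout {requested} exceeds server maximum {server_max}"
        )
    return requested
```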

Thanks! Tom

tomweber-sas commented 6 years ago

This has been merged to master. Will build a new release next. Tom