Closed jrisi256 closed 4 years ago
Thank you very much for working on this. My problem is that I have just started a new job at a new institution (Baylor College of Medicine) and have no access to their computational environment (which hopefully has a SAS server) yet. I will get back to you as soon as I can.
@zhudakai : Is there a SAS server at Baylor?
No worries! Thank you for the timely response. Would there be someone who might be able to help on Gitter? If not, I'll sit tight :)
Just make sure that you have read https://vatlab.github.io/sos-docs/running.html#sas carefully. In particular, sos-sas requires SAS 9.4 or higher.
Ok. I looked into all the specifics around our session and environment. Posting them here for your convenience and whenever you have a chance to look at this.
Operating System: Red Hat 8
RStudio Server Pro - 1.2.5019
Jupyter Notebook - 6.0.3
Jupyter Lab - 1.2.6
Jupyter Core - 4.6.1
Python - 3.7.6 (Focal version where SoS, Jupyter, saspy etc. are installed)
saskernel - 2.2.0
saspy - 3.2.0
ipykernel - 5.1.4
SoS - 0.21.5
SoS notebook - 0.21.7
SoS Papermill - 0.1.6
SoS R - 0.19.2
SoS SAS - 0.18.0
Python - 3.6.5 (also set up as its own kernel)
ipykernel - 4.8.2
R 3.6.3, R 3.6.2, R 3.6.1, R 3.6.0
irkernel - 1.1 (installed for each version of R, each has its own kernel set up)
All the above software is on a separate machine from our SAS install. I have configured sascfg.py in saspy correctly so the SAS kernel is working within Jupyter. Below is the information concerning SAS and the OS of that machine.
SAS - 9.4 - Maintenance 5
Operating System - Red Hat 7
Sorry guys, I've been preoccupied with my company emails. Yes, we are running a SAS server of some sort. We support SAS Studio as well as SAS Enterprise Guide. Both require some back-end metaserver. We even activated LDAP for authentication. Looking forward to helping in any way.
Reading more closely on your ticket now while the SAS server is being configured.
No idea on the put data to SAS part, but to get data out, we are doing something very simple here, namely we create a temporary directory, and run the following in SAS
libname TEMP 'path_to_temp';
Data TEMP.dat_1;
set CLASS;
run;
and we assume that a file named dat_1.sas7bdat would be created under path_to_temp. With your version of SAS, what file would be created with this statement?
@jrisi256 There is also a possibility that the Jupyter server and the SAS server do not share the same /tmp directory... If that is the case, we have to find a common directory so that the Jupyter server can read the /tmp/... sas7bdat file written by the SAS server.
dat_1.sas7bdat. We're running SAS 9.4 (TS1M6).
Since SAS 9.4 uses the same file format, @jrisi256 must have the SAS server and the Jupyter server on two file systems. Let me check if there is a good solution to that... In many cases these servers share the same $HOME etc., but not /tmp.
That is correct! The SAS server and the Jupyter server are on two different machines. Currently they don't share any directories or anything like that. We could set it up so that there is a mounted directory that both of them can access. How would we then configure SoS to use this new shared directory (rather than /tmp)?
As far as I remember, saspy did not support this (SAS and Jupyter on different file systems) when we developed sos-sas. The SoS -> SAS problem was caused by the use of newer versions of saspy, and the SAS -> SoS problem was caused by the separation of file systems. I have fixed the former and am working on the latter... It should be ready by the end of this week.
Are you guys talking about something like NFS mount or Samba share? If a file system is a shared resource, any host can mount a slice, just make sure SAS and Jupyter are mounting the same slice - sorry if I'm off target here
Yes and no. Yes, we could use some directory that is shared between the two services, but in general, if saspy allows communication across file systems, sos-sas should do so as well. There are some technical problems, in particular I am not sure if the Jupyter IOPub channel can be used to pass large amounts of data, but I should be able to come up with something that @jrisi256 can test.
Bo, I’m really interested in all of this. A novice question: can OpenMPI be used as an API? I know building a system around file systems is silly. Thanks. Dakai
Short answer is no.
To ensure maximum compatibility, sos-notebook takes a wrapper approach to existing kernels. It can "inject" commands into kernels and "tap" output from kernels, but it does not change subkernels in any way, or know anything about the kernel.
In the case of sos-sas, SoS Notebook currently sends a statement to the SAS kernel to save the dataset to a file, picks up the output from the file system, and loads the dataframe into SoS, effectively "transferring" a dataframe from SAS to SoS. However, when SoS and SAS do not share a file system, SoS fails to pick up the file, which is the bug reported here.
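The save-then-pick-up transfer described above can be sketched roughly as follows. This is not the actual sos-sas code: the helper names `sas_export_code`, `exported_path`, and `load_exported` are hypothetical, and the sketch assumes that SAS on Linux writes lowercase `.sas7bdat` file names and that pandas is installed for reading the file.

```python
import os


def sas_export_code(dataset: str, temp_dir: str) -> str:
    """Build SAS statements that copy `dataset` into a library rooted at temp_dir."""
    return (
        f"libname TEMP '{temp_dir}';\n"
        f"data TEMP.{dataset};\n"
        f"    set {dataset};\n"
        "run;\n"
    )


def exported_path(dataset: str, temp_dir: str) -> str:
    """Path where the exported file is expected (assumes lowercase file names)."""
    return os.path.join(temp_dir, dataset.lower() + ".sas7bdat")


def load_exported(dataset: str, temp_dir: str):
    """Read the exported file into a DataFrame.

    Raises FileNotFoundError when the file written by SAS is not visible
    locally, which is what happens when the two hosts do not share temp_dir.
    """
    path = exported_path(dataset, temp_dir)
    if not os.path.exists(path):
        raise FileNotFoundError(f"{path} is not visible from the Jupyter host")
    import pandas as pd  # imported lazily so the helpers above work without pandas
    return pd.read_sas(path, format="sas7bdat")
```

The `FileNotFoundError` branch corresponds to the failure in this issue: the SAS statements succeed on the SAS host, but the Jupyter host never sees the file.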
There are two solutions here. I really would like to use the SAS kernel to dump the data out, but to do that I will have to let SAS "print" out the dataset so that SoS can get the output and create a dataframe. The problem right now is that the SAS kernel prints everything in HTML. This means if I want to output a large dataset, all output will be wrapped in things like <span>123</span>, which is a HUGE waste of bandwidth, and I do not know if it will work for a reasonably large dataset.
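To get a rough feel for the bandwidth concern, the sketch below compares the size of plain comma-separated values with the same values wrapped in `<span>` tags. It is only an illustration: `html_overhead` is a hypothetical helper, and real SAS kernel HTML output carries additional table markup and styling, so the actual overhead would likely be larger.

```python
def html_overhead(values):
    """Return (plain_bytes, html_bytes) for comma-separated vs <span>-wrapped values."""
    plain = ",".join(str(v) for v in values)
    wrapped = "".join(f"<span>{v}</span>" for v in values)
    return len(plain.encode()), len(wrapped.encode())


# 1000 small integers, standing in for one numeric column of a dataset
plain_bytes, html_bytes = html_overhead(range(100, 1100))
print(f"plain: {plain_bytes} bytes, html: {html_bytes} bytes, "
      f"ratio: {html_bytes / plain_bytes:.1f}x")
```

For short values the tags dominate the payload, which is why printing a large dataset through HTML output is unattractive.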
Another option is to grab the dumped files directly from the SAS server, which could work if the connection is through SSH, but will not work if the connection is through a Java library (the IOM method).
I am debating which method I should use. Perhaps I can implement the second method for Linux-based servers, and then add the first method later for Windows-based servers. It would be slower but Windows users can already tolerate windoze....
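The second option, copying the dumped file from a Linux SAS host over SSH, might look something like the sketch below. The function names, host name, and paths are hypothetical, and the sketch assumes that `scp` is on the local PATH, that key-based SSH authentication is already set up, and that the remote path contains no spaces or shell metacharacters.

```python
import subprocess


def scp_command(host: str, remote_path: str, local_dir: str, user: str = "") -> list:
    """Build the argument list for copying one remote file into local_dir."""
    target = f"{user}@{host}" if user else host
    return ["scp", f"{target}:{remote_path}", local_dir]


def fetch_dataset_file(host: str, remote_path: str, local_dir: str, user: str = "") -> None:
    """Run scp and raise CalledProcessError if the copy fails."""
    subprocess.run(scp_command(host, remote_path, local_dir, user), check=True)
```

For example, `scp_command("sas.example.org", "/saswork/tmp1/class.sas7bdat", "/tmp/local", user="me")` produces the arguments for `scp me@sas.example.org:/saswork/tmp1/class.sas7bdat /tmp/local`. This route depends on SSH access, so it does not cover IOM connections.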
SAS can output in a clear text format such as a .lst file, but it's still formatted. It has a lot fewer tags and is easier to clean up. It can also dump into PDF format. Suppose you have an API? If you insist, SAS has a driver to write out .csv. There are a few other formats really, and not all are file system dependent. I personally would use a put/file combo so I can get a data file for the next stop. What if you write to stdout?
in this blog: https://communities.sas.com/t5/SAS-Programming/How-to-redirect-put-statements-to-the-result-instead-of-the-log/td-p/513618
sas -stdio <(echo 'data _null_; file stdout; put "hello, foobar"; run;') 2>/dev/null
No, writing the file is not the problem here. The Python end can read sas7bdat, csv or whatever SAS saves, but it needs to access the file, which in this case is on the remote host.
After playing around a bit, I found that if I run
%put CLASS2
DATA CLASS2;
INPUT NAME $ 1-8 SEX $ 10 AGE 12-13 HEIGHT 15-16 WEIGHT 18-22;
CARDS;
JOHN. M 12 59 99.5
PETER F 15 59 9.5
PROC PRINT;
RUN;
in SAS, a file called class.sas7bdat would appear in a directory that is under the work directory on the SAS server.
%put %sysfunc(getoption(work));
I can retrieve the class.sas7bdat file with scp now, but could you let me know if this is always the case? That is to say, will every dataset in SAS be under this work dir?
I think an obvious exception is when a user runs
libname NAME 'path'
but how does SAS create a dataset in NAME? Sorry, I have not used SAS in the past 15 years and forgot all about it.
With the latest patch, a remote SAS server works if it is Linux-based and connected through ssh. But more tests are certainly needed.
@jrisi256 You are welcome to build sos-sas from GitHub and test if it works in your environment, but only if you are comfortable with building Python packages from source and testing unstable source code. I will test the module in the next few days and will release the next version of sos-sas afterwards.
libname defines where a permanent dataset should be stored. Otherwise, all temporary datasets should be destroyed when SAS ends, but if one kills SAS forcibly then there is no chance for cleanup to occur, which is a sysadmin's headache. Anyway, the temporary or working directory can be defined after installation in the file sasv9.cfg. This can be customized with sasv9_local.cfg. A user can also define this in their home directory to supersede my system settings. I frequently change the default from /tmp to /scratch/SAS, given that /scratch always seems to be much bigger. Other things I change are the memsize and swapsize, which I again have to make especially large.
Regarding SAS working directory, I wonder if there’s an environment variable you can tap into? We can take a look in our RedHat 7 host
Once in SAS, WORK is the location of temporary files of all kinds. Hence
data one;
is the same as
data WORK.one;
WORK must be a reserved libname then.
So if I understand you correctly,
data WHATEVER
is the same as
data WORK.WHATEVER
which is under a directory that can be retrieved (however the system is set up) with the following command.
%put %sysfunc(getoption(work));
This is the case that I have already handled.
Now, is there any command to get the path to mylib for
Data mylib.data
where mylib is defined somewhere with
libname mylib path
What I want to do is, when a user runs
%get mylib.data --from SAS
from a SoS kernel, sos-sas should try to get the path of mylib, and then scp the path_to_mylib.data.sas7bdat to the local file system to read into SoS.
Defining and using a libname results in permanent datasets being read from and written to a custom location; the key word is 'permanent'. WORK, on the other hand, should be cleared up if SAS can exit normally. In our case, if you go to /scratch/sastemp/, you'd see the SAS garbage people left behind when their jobs were terminated abnormally. For your purpose, tapping into the -WORK setting inside SAS is the best bet. If a libname is defined and you can tap into it, then that should cover more datasets. Do you give people an option here?
Yes and no again because I do not know SAS well.
For example, if you have a dataset under a customized libname,
libname mylib "my path"
Data mylib.DATA ...
Can DATA be accessed without mylib?
Basically I am trying to decide if SoS should be more clever and do
%get DATA --from SAS
or if users have to prefix the name with the libname
%get mylib.DATA --from SAS
Note that in the case of WORK, we ignored WORK and used the dataset name directly.
I have never used SoS, but it looks very flexible in sharing objects among platforms. I agree with Dakai that it is necessary to assign a libname. My basic idea is that an object (SAS, R, Python) should be tracked with its full address. The WORK (SAS) scenario may be very hard to track, since it depends on the initial assignment made when SAS was installed.
The sample code above seems to work on the current working space. Another question is about the sos-sas engine: if it does not depend on the current working environment, how does it get its input feed and generate a SAS dataset?
@finkbine Let us do this one by one. If you create a dataset under another directory with libname mylib "path", can it be accessed without the libname?
In SAS, a prefix (libname) must be used to reference a dataset. Basically, libname mylib "path" means that path already exists, and if you want to use datasets within path, you have to write mylib.***. The only exception is "work"...
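The rule stated here (every dataset reference carries a libname prefix, and a bare name means WORK) can be sketched as a small parser. The function name `split_dataset_ref` is hypothetical, and normalizing names to uppercase is an assumption of this sketch, not something sos-sas is known to do.

```python
def split_dataset_ref(ref: str):
    """Split 'mylib.CLASS' into (libref, dataset); a bare name defaults to WORK."""
    parts = ref.strip().split(".")
    if len(parts) == 1:
        libref, name = "WORK", parts[0]
    elif len(parts) == 2:
        libref, name = parts
    else:
        raise ValueError(f"expected NAME or LIBREF.NAME, got {ref!r}")
    if not libref or not name:
        raise ValueError(f"empty libref or dataset name in {ref!r}")
    # SAS names are case-insensitive, so normalize for easier comparison
    return libref.upper(), name.upper()
```

With this, `split_dataset_ref("CLASS")` and `split_dataset_ref("work.class")` both resolve to the same WORK dataset, while `split_dataset_ref("mylib.CLASS")` keeps the custom library.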
Second question: say you have created a mylib, what is the command for me to get the directory of mylib?
In SAS, as far as I know, a libname is created by the user, so the user should know what they are doing. In a SAS session, any number of instances (libnames) could be initiated.
In SoS, is there any mechanism to track what users type in SAS code, for example, by searching for the keyword libname?
No. I am using
%put %sysfunc(getoption(work));
to get the directory of WORK. I actually saw something online that says
%put %sysfunc(pathname(mylib));
can be used to get the path of mylib. Could you check if this is true?
I see, I never used this option since it is rare to me. Yes, this command can give absolute path for libname, and retrieve datasets.
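Putting the two lookups together, a helper could pick the right SAS statement for a given library and then derive the file to copy. This is a sketch only: `path_query` and `remote_dataset_file` are hypothetical names, and the lowercase `.sas7bdat` file name is an assumption based on the file names reported earlier in this thread for SAS on Linux.

```python
import posixpath


def path_query(libref: str) -> str:
    """Return the SAS statement that writes the library's directory to the log."""
    if libref.upper() == "WORK":
        return "%put %sysfunc(getoption(work));"
    return f"%put %sysfunc(pathname({libref}));"


def remote_dataset_file(lib_dir: str, dataset: str) -> str:
    """Path of the dataset file on the (Linux) SAS host, assuming lowercase names."""
    return posixpath.join(lib_dir, dataset.lower() + ".sas7bdat")
```

The directory itself still has to be parsed out of the SAS log after the statement is run, which is not shown here.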
Great. Then could you please adapt the "CLASS" example to an example in a customized libname so that I can test it? The workflow would be something like
libname mylib path
create CLASS inside mylib
Then from SoS
%get mylib.CLASS --from SAS
should grab mylib.CLASS and create a dataframe. I have not decided what the dataframe should be called. It could be just CLASS, or a namespace with CLASS inside so that the dataframe would be called mylib.CLASS.
did you implement these engines with default python, R , sas functions? sometimes, format problem could be a concern.
Maybe Dakai can do it? Currently, I do not know how to use it.
Thanks @finkbine . That is ok, I think I can figure out.
Hi @BoPeng thanks so much for all the hard work you're doing in trying to resolve this issue. I am an admin for a group of users so as a policy I don't download/install unstable development releases. Since this isn't an urgent issue for us, I am going to hold off until you officially release the fix.
In the meantime, if you need me to test something or if there is anything else I can do to help, let me know. Sadly I don't know SAS at all either, haha. I just help set this up for people to use.
@finkbine I have just released sos-sas version 0.19.0. It has been tested in our SAS environment (jupyterlab/mac + SAS/Linux) and should hopefully also work in yours. Please check the updated sos-sas documentation for details and let me know if you encounter any more bugs.
Note that remote %get from Windows SAS servers will not work, but I will leave it for later.
Hi,
Thanks for making such a great tool!
I manage an RStudio Server Pro installation for a variety of users on RedHat 8. We have a variety of data scientists who work in R, Python, and SAS. Some people have expressed interest in using this tool in their workflows, so I'm working to get it all set up for them.
I have R versions 3.6.0 to 3.6.3 and Python versions 3.7.6 and 3.6.9. Additionally, I have SAS set-up.
On the SoS side, I have installed the following package using pip:
I have also installed irkernel and feather for all versions of R, and saspy and sas_kernel for SAS. Each of the languages works fine on its own. R and Python are playing nicely with each other. However, I cannot pass data to SAS (from any version of R or Python), nor can I get data from SAS (from any version of R or Python).
Below is what happens when I try to pass data to SAS:
Below is what happens when I try to retrieve data from SAS:
I tried Googling around, but I couldn't really find anything helpful. Our set-up is pretty specific so if you need more information from me let me know.