Closed: chrishales709 closed this issue 5 years ago
I'm on vacation this week. I'll look into this when I get back after the holiday. Thanks! Tom
@chrishales709 I believe both of these are doable. I was just looking at the functions to get file information and, of course, they vary per OS. As a first thought, I'm thinking of adding another method to get the various pieces of info, given a file path. See the different items you can get per OS here: http://support.sas.com/documentation/cdl//en/lefunctionsref/69762/HTML/default/viewer.htm#p0cpuq4ew0dxipn1vtravlludjm7.htm I'm thinking of returning a dictionary from this method with the items/values. I think keeping this more atomic and using python to drive it makes more sense than trying to pile it all into one method. So, you can use dirlist(), and then, via python code, iterate (or pick only the file you care about), and call the fileinfo() method to get a dict with whatever attributes you get for that file. If you know the file and want info, just call that; no need to get dirlist and info all piled together.
As for uploading a file, that can be done. Of course you need authority to create files wherever that SAS server is running, no magic here. But it shouldn't be hard to do. I will need to think through various use cases for this though to be sure this is useful for multiple cases. Binary transfer? Character w/ or w/out transcoding? ... And then a download too?
What are your thoughts? Tom
@tomweber-sas , That all sounds like a good plan. Adding in a download method would also be great. I don't have anything to add on the transfer method besides it might be good to have options of how to transfer files (binary vs text). Thanks, Chris
In my use case, users can download the file, make changes on different tabs (assume it contains more than 100 tabs), and upload the file back to the SAS server for various purposes (this cycle repeats for many edits).
This is great. Thanks Chris and Tom. I combine local and third party data for reporting. After getting the third party data via API, I upload the api df to the SAS server to finish the saspy script. It would be nice to seamlessly add the api df within a sas.submit statement.
@chrishales709 I have an implementation for getting file information coded up. Here's an example showing this for my saspy directory (current dir - '.'). I get the list of files from the dirlist() method then iterate over them getting the file info for each file (excluding any directories). The file information is returned in a dataframe that you can interrogate at will. I returned it like that cuz I just did the implementation for the member list of tables for a libref for issue 182 and this was very similar. Let me know if a dataframe isn't what you want, and I'll see if I can convert it to something else. I'm not much of a dataframe programmer :)
>>> d1 = sas.dirlist('.')
>>>
>>> d1
['__init__.py', 'sasbase.py.bak', 'sasproccommons.py', 'version.py.bak', 'sasdecorator.py', 'sasbase.py', 'sasstat.py', 'sascfg_personal.py', 'sasdata.py', 'sasiohttp.py', 'titanic.csv', 'SASLogLexer.py', 'sasml.py.bak', 'sasqc.py', 'sasViyaML.py', 'sasutil.py', 'sasml.py', 'sasets.py', 'libname_gen.sas', 'sasresults.py', 'sasiostdio.py', '__pycache__/', 'sasdata.py.bak', 'sastabulate.py', 'doc/', 'autocfg.py', 'sasioiom.py', 'version.py',
>>>
>>> for file in d1:
...     if file[len(file)-1] != sas.hostsep:
...         sas.file_info('./'+file)
...
infoname infoval
0 Filename /opt/tom/github/saspy/saspy/__init__.py
1 Owner Name sastpw
2 Group Name r&d
3 Access Permission -rw-r--r--
4 Last Modified 30Nov2018:10:53:38
5 File Size (bytes) 1469
infoname infoval
0 Filename /opt/tom/github/saspy/saspy/sasbase.py.bak
1 Owner Name sastpw
2 Group Name r&d
3 Access Permission -rw-r--r--
4 Last Modified 05Dec2018:08:22:37
5 File Size (bytes) 55341
infoname infoval
0 Filename /opt/tom/github/saspy/saspy/sasproccommons.py
1 Owner Name sastpw
2 Group Name r&d
3 Access Permission -rw-r--r--
4 Last Modified 04Dec2018:16:52:25
5 File Size (bytes) 32090
infoname infoval
0 Filename /opt/tom/github/saspy/saspy/version.py.bak
1 Owner Name sastpw
2 Group Name r&d
3 Access Permission -rw-r--r--
4 Last Modified 07Nov2018:09:32:50
5 File Size (bytes) 22
[...] removing a bunch to shorten this
infoname infoval
0 Filename /opt/tom/github/saspy/saspy/version.py
1 Owner Name sastpw
2 Group Name r&d
3 Access Permission -rw-r--r--
4 Last Modified 04Dec2018:16:52:25
5 File Size (bytes) 22
infoname infoval
0 Filename /opt/tom/github/saspy/saspy/sas_magic.py
1 Owner Name sastpw
2 Group Name r&d
3 Access Permission -rw-r--r--
4 Last Modified 30Nov2018:10:53:38
5 File Size (bytes) 6713
infoname infoval
0 Filename /opt/tom/github/saspy/saspy/sascfg.py
1 Owner Name sastpw
2 Group Name r&d
3 Access Permission -rw-r--r--
4 Last Modified 30Nov2018:10:53:38
5 File Size (bytes) 10630
>>>
And, here's just grabbing one:
>>> dinfo = sas.file_info('./sasbase.py')
>>> dinfo
infoname infoval
0 Filename /opt/tom/github/saspy/saspy/sasbase.py
1 Owner Name sastpw
2 Group Name r&d
3 Access Permission -rw-r--r--
4 Last Modified 05Dec2018:08:31:08
5 File Size (bytes) 55330
>>>
Thoughts? Tom
I think I'll change this to return a dictionary like I was thinking in the first place. Trying to navigate the df to get values isn't very clean. I can have it return either if you want; I'd add a results=['df' | 'dict'] parameter. I think a dict just makes more sense for this one. Let me know what you think, Tom
Ok, got a dict being returned. Here's what it's like:
>>> finfo = sas.file_info_dict('./sasbase.py')
>>> finfo
{'Owner Name': 'sastpw', 'File Size (bytes)': '57448', 'Last Modified': '05Dec2018:11:47:25', 'Group Name': 'r&d', 'Filename': '/opt/tom/github/saspy/saspy/sasbase.py', 'Access Permission': '-rw-r--r--'}
>>>
>>> finfo['Last Modified']
'05Dec2018:11:47:25'
>>>
>>> for key in finfo.keys():
...     print(key+' = '+finfo[key])
...
Owner Name = sastpw
File Size (bytes) = 57448
Last Modified = 05Dec2018:11:47:25
Group Name = r&d
Filename = /opt/tom/github/saspy/saspy/sasbase.py
Access Permission = -rw-r--r--
>>>
>>> finfo.keys()
dict_keys(['Owner Name', 'File Size (bytes)', 'Last Modified', 'Group Name', 'Filename', 'Access Permission'])
>>> finfo.values()
dict_values(['sastpw', '57448', '05Dec2018:11:47:25', 'r&d', '/opt/tom/github/saspy/saspy/sasbase.py', '-rw-r--r--'])
>>>
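Every value in the returned dictionary is a string, so comparing sizes or dates needs a conversion first. Here is a minimal sketch of that, assuming the key names and the 05Dec2018:11:47:25 timestamp layout shown in the output above. parse_file_info is a hypothetical helper, not part of saspy, and other hosts may return different keys.

```python
from datetime import datetime


def parse_file_info(info):
    """Return a copy of a file_info()-style dict with a typed size and timestamp.

    Hypothetical helper: assumes the keys and formats shown in this thread.
    Keys that are missing are simply left out.
    """
    parsed = dict(info)
    if 'File Size (bytes)' in info:
        parsed['File Size (bytes)'] = int(info['File Size (bytes)'])
    if 'Last Modified' in info:
        # SAS datetime text such as 05Dec2018:11:47:25; %b expects English
        # month abbreviations, so this relies on an English/C locale.
        parsed['Last Modified'] = datetime.strptime(
            info['Last Modified'], '%d%b%Y:%H:%M:%S')
    return parsed


finfo = {'Owner Name': 'sastpw', 'File Size (bytes)': '57448',
         'Last Modified': '05Dec2018:11:47:25'}
typed = parse_file_info(finfo)
print(typed['File Size (bytes)'], typed['Last Modified'].isoformat())
```

With typed values you can, for example, sort a set of files by modification time instead of comparing the raw text.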
@chrishales709 I pushed this code to master so you can try it out. I ended up implementing it to return the dictionary. If you want a dataframe, just specify results='pandas' like this:
>>> filepath = '.'
>>> d1 = sas.dirlist(filepath)
>>>
>>> for file in d1:
...     if file[len(file)-1] != sas.hostsep:
...         sas.file_info(filepath+sas.hostsep+file)
...         sas.file_info(filepath+sas.hostsep+file, results='pandas')
...
{'Access Permission': '-rw-r--r--', 'Filename': '/opt/tom/github/saspy/saspy/__init__.py', 'Group Name': 'r&d', 'File Size (bytes)': '1469', 'Last Modified': '30Nov2018:10:53:38', 'Owner Name': 'sastpw'}
infoname infoval
0 Filename /opt/tom/github/saspy/saspy/__init__.py
1 Owner Name sastpw
2 Group Name r&d
3 Access Permission -rw-r--r--
4 Last Modified 30Nov2018:10:53:38
5 File Size (bytes) 1469
{'Access Permission': '-rw-r--r--', 'Filename': '/opt/tom/github/saspy/saspy/sasbase.py.bak', 'Group Name': 'r&d', 'File Size (bytes)': '59850', 'Last Modified': '05Dec2018:14:01:26', 'Owner Name': 'sastpw'}
infoname infoval
0 Filename /opt/tom/github/saspy/saspy/sasbase.py.bak
1 Owner Name sastpw
2 Group Name r&d
3 Access Permission -rw-r--r--
4 Last Modified 05Dec2018:14:01:26
5 File Size (bytes) 59850
{'Access Permission': '-rw-r--r--', 'Filename': '/opt/tom/github/saspy/saspy/sasproccommons.py', 'Group Name': 'r&d', 'File Size (bytes)': '32655', 'Last Modified': '05Dec2018:12:13:21', 'Owner Name': 'sastpw'}
infoname infoval
0 Filename /opt/tom/github/saspy/saspy/sasproccommons.py
1 Owner Name sastpw
2 Group Name r&d
3 Access Permission -rw-r--r--
4 Last Modified 05Dec2018:12:13:21
5 File Size (bytes) 32655
{'Access Permission': '-rw-r--r--', 'Filename': '/opt/tom/github/saspy/saspy/sascfg.py.bak', 'Group Name': 'r&d', 'File Size (bytes)': '10886', 'Last Modified': '05Dec2018:10:25:36', 'Owner Name': 'sastpw'}
I did the same with list_tables() method from #182 where I return a list of tuples (memname, memtype) by default now, but you can get it as a dataframe w/ results='pandas' on the list_tables() invocation. I like saspy to work even if you don't have Pandas, so this way it does, and you can get the dataframe if you want.
Let me know how it works for you. Next thing on the list will be up/download of files. That may take a bit longer :)
Thanks! Tom
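The dirlist() plus file_info() loop above can be wrapped into one small function. This is only a sketch: collect_file_info and FakeSession are hypothetical names introduced here, and the fake session exists solely so the example runs without a SAS server. It assumes, as in the output above, that directory entries end with the host separator.

```python
def collect_file_info(session, path):
    """Map each plain file under path to its file_info() dict.

    `session` is anything exposing dirlist(), file_info() and hostsep the
    way the SASsession object in this thread does.
    """
    infos = {}
    for name in session.dirlist(path):
        if name.endswith(session.hostsep):   # directories end with the separator
            continue
        infos[name] = session.file_info(path + session.hostsep + name)
    return infos


class FakeSession:
    """Stand-in used only to make this sketch runnable without SAS."""
    hostsep = '/'

    def dirlist(self, path):
        return ['version.py', 'doc/', 'sasbase.py']

    def file_info(self, filepath):
        return {'Filename': filepath}


result = collect_file_info(FakeSession(), '.')
print(sorted(result))
```

With a real session you would pass your SASsession object in place of FakeSession().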
@jpf5046 Can you explain the comment
seamlessly add the api df within a sas.submit statement.
a little more? Maybe an example of what you're looking to do? I'm not sure I understand that comment in the context of the methods I've put together for these couple issues. These methods are at master now, so you can play with them and see what you think. Thanks! Tom
@tomweber-sas I tested the file_info method, and it looks great. I did notice, however, that the 'Filename' value in both the dictionary version and dataframe version did not include the full path. For example,
f = sas.file_info('/path/to/my/file/file.sas')
got me this:
{'Filename': '/path/to/', 'Owner Name'...}
It actually didn't give me back the full path. Is there a max length for this value?
Hey Chris, that's curious. I don't see that for either case. Can you send the saslog from after running that?
print(sas.saslog())
For the default case (dict) you should see the info in the log like:
3451 data _null_;
3452 length infoname infoval $60;
3453 drop rc fid infonum i close;
3454 put 'INFOSTART';
3455 fid=fopen('_spfinfo');
3456 if fid then
3457 do;
3458 infonum=foptnum(fid);
3459 do i=1 to infonum;
3460 infoname=foptname(fid, i);
3461 infoval=finfo(fid, infoname);
3462 put 'INFONAME=' infoname;
3463 put 'INFOVAL=' infoval;
3464 end;
3465 end;
3466 put 'INFOEND';
3467 close=fclose(fid);
3468 rc = filename('_spfinfo');
3469 run;
INFOSTART
INFONAME=Filename
INFOVAL=/opt/tom/github/saspy/saspy/sascfg.py
INFONAME=Owner Name
INFOVAL=sastpw
INFONAME=Group Name
INFOVAL=r&d
INFONAME=Access Permission
INFOVAL=-rw-r--r--
INFONAME=Last Modified
INFOVAL=06Dec2018:13:20:15
INFONAME=File Size (bytes)
INFOVAL=10885
INFOEND
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
Here's one file info I get. I don't see any kind of truncation. Maybe we'll see something in your log.
{'Filename': '/opt/tom/github/saspy/saspy/sascfg.py', 'File Size (bytes)': '10885', 'Group Name': 'r&d', 'Owner Name': 'sastpw', 'Last Modified': '06Dec2018:13:20:15', 'Access Permission': '-rw-r--r--'}
infoname infoval
0 Filename /opt/tom/github/saspy/saspy/sascfg.py
1 Owner Name sastpw
2 Group Name r&d
3 Access Permission -rw-r--r--
4 Last Modified 06Dec2018:13:20:15
5 File Size (bytes) 10885
BTW, I see your path was linux, but I also tried this on windows and I'm not seeing truncation either. I do see that the default for displaying the dataframe truncates the column, but that's only a display thing, the whole value is actually there. Could it be something like that, where it's just not displaying it?
Here's a better example. I have a file on my desktop that python reads, df = read.csv('desktop/file.txt'), and I then want that df to be run with a SAS dataset on the SAS server. Right now, python can have the file loaded locally, but I cannot merge the dataframe in a sas.submit statement. I must upload 'desktop/file.txt' to the server with SAS EG to run it with SAS code.
Is there a way to take my local df and upload it to the SAS server, so I can run code that might look like this, and do this all within Jupyter:
c = sas.submit("""
proc sql;
create table new_table as
select
*
from work.df;
quit;
""")
...where work.df is the file from my desktop?
Oh, yes, that's been in saspy since day 1. It's the dataframe2sasdata() method; df2sd() for short. You would simply do the following:
import pandas as pd
import saspy
# assume your SASsession object is named 'sas'
sas = saspy.SASsession()
# read in your dataframe
df = pd.read_csv('desktop/file.txt')
# upload data frame to work.new_table on SAS server
new_table = sas.df2sd(df, 'new_table')
# and now have a SASdata object in python that refers to it
new_table.head()
Here's a run doing this:
tom64-3> python3.5
Python 3.5.5 (default, Feb 6 2018, 10:56:47)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-18)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import saspy
>>> sas = saspy.SASsession()
SAS Connection established. Subprocess id is 9991
>>> import pandas as pd
>>> df = pd.read_csv('./titanic.csv')
>>> df.head()
Unnamed: 0 PassengerId Survived Pclass \
0 1 1 0 3
1 2 2 1 1
2 3 3 1 3
3 4 4 1 1
4 5 5 0 3
Name Sex Age SibSp \
0 Braund, Mr. Owen Harris male 22.0 1
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1
2 Heikkinen, Miss. Laina female 26.0 0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1
4 Allen, Mr. William Henry male 35.0 0
Parch Ticket Fare Cabin Embarked
0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
>>> new_table = sas.df2sd(df, 'new_table')
>>> new_table.head()
Unnamed: 0 PassengerId Survived Pclass \
0 1 1 0 3
1 2 2 1 1
2 3 3 1 3
3 4 4 1 1
4 5 5 0 3
Name Sex Age SibSp \
0 Braund, Mr. Owen Harris male 22 1
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38 1
2 Heikkinen, Miss. Laina female 26 0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1
4 Allen, Mr. William Henry male 35 0
Parch Ticket Fare Cabin Embarked
0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
>>> new_table
Libref = WORK
Table = new_table
Dsopts = {}
Results = Pandas
>>>
@tomweber-sas I was using the syntax incorrectly. Thank you for providing the example! I'm all set.
@jpf5046 Great. Just open another issue if you have any other questions! Tom
@tomweber-sas
I think I found what may be causing the error. My SAS setup has really long path names (ex: 150+ characters). First, it looks like the SAS code is limiting the length of infoval on lines 1286 and 1313. The length of infoname and infoval are set to 60, so it would cut off any long path names. I tried setting the length to 500 on these lines, and that appears to have fixed the issue on the SAS side. The SAS log now shows the full value of infoval. However, infoval covered multiple lines in the log, so it causes an issue for line 1341 where the value is parsed out of the log. My infoval looked like this in the log:
INFOVAL=
/imagine/this/is/a/very/long/sas/path/that/co
vers/multiple/lines/filename.sas
As a result, I'm actually getting '' when I run f['Filename']
Oh, of course, that's cut-n-pasted right out of the SAS example doc for this. I didn't even see it looking at it :( I've addressed this at master. Can you pull master and try it again and see if both cases work now? I bumped it up to 256, which is also the max linesize.
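To make the wrapped-value problem concrete, here is a sketch of parsing the INFOSTART..INFOEND segment of a log so that a value continued on following lines is joined back together. This is not saspy's actual parsing code; parse_info_log is a hypothetical function, and it assumes the marker lines look exactly like the log output shown earlier in the thread.

```python
def parse_info_log(log):
    """Build {infoname: infoval} from the INFOSTART..INFOEND part of a log.

    Lines that follow an INFOVAL= line and carry no marker are treated as
    wrapped continuations of that value.
    """
    info = {}
    name = None
    inside = False
    for raw in log.splitlines():
        line = raw.strip()
        if line == 'INFOSTART':
            inside = True
        elif line == 'INFOEND':
            break
        elif not inside:
            continue
        elif line.startswith('INFONAME='):
            name = line[len('INFONAME='):]
            info[name] = ''
        elif line.startswith('INFOVAL=') and name is not None:
            info[name] = line[len('INFOVAL='):]
        elif name is not None:
            info[name] += line        # continuation of a wrapped value
    return info


sample = """INFOSTART
INFONAME=Filename
INFOVAL=
/imagine/this/is/a/very/long/sas/path/that/co
vers/multiple/lines/filename.sas
INFONAME=Owner Name
INFOVAL=sastpw
INFOEND"""
print(parse_info_log(sample)['Filename'])
```

Stripping each line is a simplification: a path with significant spaces at a wrap point would not survive it, which is one reason keeping the value on a single log line is the safer fix.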
Sorry for the late response. I've been out of town. I tested the fix for both dictionaries and data frames, and everything looks good. Thanks!
Are you still working on the upload/download piece?
No problem. Thanks for verifying! Well, I haven't actually started on those yet, unfortunately. Been pulled in other directions around here. If I'm lucky, next week will be a slow week, w/ the holiday and all, and I'll be able to spend some time on those. Always happy to have external contributors too though, :) :) Thanks, Tom
Hey, I've got an upload implementation working, both STDIO and IOM. Just did it, so it certainly needs more testing and such. But, it works for the cases I've tried. It's a binary transfer, or an image copy, if you will.
I've run it with a simple text file, an HTML document file, and an executable program (truly binary file).
Everything diffs equal and the files are the same length. More details to work through before it's production, but I've pushed it to a branch called upload-download.
File permissions are something that still needs to be addressed. Right now, it's all defaults.
If you have a chance, try it out from there. I'll continue on it and see about the equivalent download next. It's not the fastest thing in the world, but for having to do it all w/ python and SAS code, it isn't too bad.
import saspy
sasstd = saspy.SASsession(cfgname='sdssas')
sasiom = saspy.SASsession(cfgname='iomj')
sasstd.upload('/u/sastpw/tomin', '/u/sastpw/tomout_std')
sasstd.upload('/u/sastpw/sashtml.htm', '/u/sastpw/tomhtml_std')
sasstd.upload('/u/sastpw/tkmaspyvb025/tkmas/com/laxnd/maspy', '/u/sastpw/tommaspy_std')
sasiom.upload('/u/sastpw/tomin', '/u/sastpw/tomout_iom')
sasiom.upload('/u/sastpw/sashtml.htm', '/u/sastpw/tomhtml_iom')
sasiom.upload('/u/sastpw/tkmaspyvb025/tkmas/com/laxnd/maspy', '/u/sastpw/tommaspy_iom')
Here's listing the files after:
38 Dec 19 12:15 /u/sastpw/tomin
38 Dec 20 11:45 /u/sastpw/tomout_std
34866 Dec 18 16:13 /u/sastpw/sashtml.htm
34866 Dec 20 11:45 /u/sastpw/tomhtml_std
647196 Dec 12 13:00 /u/sastpw/tkmaspyvb025/tkmas/com/laxnd/maspy
647196 Dec 20 11:45 /u/sastpw/tommaspy_std
38 Dec 19 12:15 /u/sastpw/tomin
38 Dec 20 11:45 /u/sastpw/tomout_iom
34866 Dec 18 16:13 /u/sastpw/sashtml.htm
34866 Dec 20 11:45 /u/sastpw/tomhtml_iom
647196 Dec 12 13:00 /u/sastpw/tkmaspyvb025/tkmas/com/laxnd/maspy
647196 Dec 20 11:45 /u/sastpw/tommaspy_iom
Tom
Ok, I added in the permission= option. I'm afraid it's just the exact string the Filename statement wants. But, that's the same on Unix and Windows; the portable syntax is documented in the Filename statement section of each host guide: https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=hostwin&docsetTarget=chfnoptfmain.htm&locale=en#p1m24anc2sxjp1n1futk0ekxn3to https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=hostunx&docsetTarget=n1cwdt7h01vaken0zl8veh8x3ybc.htm&locale=en
Here's the one for the executable file, showing the resulting permissions:
print(sasstd.upload('/u/sastpw/tkmaspyvb025/tkmas/com/laxnd/maspy', '/u/sastpw/tommaspy_std',
permission='A::u::rwx,A::g::rwx,A::o::r-x'))
tom64-3> ll /u/sastpw/tkmaspyvb025/tkmas/com/laxnd/maspy /u/sastpw/tommaspy_std
-rwxrwxr-x 1 userid groupid 647196 Dec 12 13:00 /u/sastpw/tkmaspyvb025/tkmas/com/laxnd/maspy
-rwxrwxr-x 1 userid groupid 647196 Dec 20 12:19 /u/sastpw/tommaspy_std
tom64-3>
Let me know what you find! Thanks, Tom
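Since permission= takes the raw Filename statement string, a small converter from an ls-style mode can save some typing. This is a sketch only: permission_option is a hypothetical helper, and the A::u::..., A::g::..., A::o::... layout is inferred from the single example above rather than from the full SAS documentation.

```python
def permission_option(mode):
    """Turn an ls-style mode such as '-rwxrwxr-x' into a permission= string."""
    mode = mode[-9:]                       # drop a leading file-type character
    if len(mode) != 9 or any(c not in 'rwx-' for c in mode):
        raise ValueError('expected nine characters of r, w, x or -')
    classes = ('u', 'g', 'o')              # user, group, other
    parts = ['A::{}::{}'.format(who, mode[i * 3:i * 3 + 3])
             for i, who in enumerate(classes)]
    return ','.join(parts)


print(permission_option('-rwxrwxr-x'))   # A::u::rwx,A::g::rwx,A::o::r-x
```

The result could then be passed as the permission= argument in the upload() call shown above.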
Hey, I tried uploading a file but it is running forever. Any thoughts?
How big is the file? I haven't tried anything significantly large. I just pushed a fix for 0 length files, which would hang (run indefinitely). Try something small first to see if it works?
I tried with 2Mb file. Let me try with 1kb file.
Is this due to permission?
Well, I'm not sure exactly. I also tried to write to something that wasn't valid after I saw your first problem. That was maybe a different error:
>>> print(sasstd.upload('/u/sastpw/tomin', '/fff'))
75
76 filename saspydir '/fff' recfm=F encoding=binary lrecl=1 permission='';
77 data _null_;
78 file saspydir;
79 infile datalines;
80 input;
81 lin = length(_infile_);
82 outdata = inputc(_infile_, '$hex.', lin);
83 lout = lin/2;
84 put outdata $varying80. lout;
85 datalines4;
ERROR: Insufficient authorization to access /fff.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.01 seconds
87 ;;;;
88
89 run;
90
That's a different error. But, obviously, you have to have permission to create the file you're trying to create. There's no magic about doing this. If you can't submit the equivalent code to create a file from the SAS server, you won't be able to do it via saspy, as I'm just submitting SAS code.
What's that path? is it a valid file, or is it an existing directory, such that it's not a valid file to create? I can't say off the top of my head why you would get that specific error. I haven't seen that error in anything I've tried so far.
Tom
Aha. Yes, I get that error when I specify a directory. I guess I could add support for accessing the target and seeing if it's a directory, then get the file name from the source and use that. But, for now, just specify the file name and see if it's working like you think.
>>> print(sasiom.upload('/u/sastpw/tomin', '/u/sastpw/tomdir'))
4 The SAS System 09:36 Friday, December 21, 2018
29
30 filename saspydir '/u/sastpw/tomdir' recfm=F encoding=binary lrecl=1 permission='';
31 data _null_;
32 file saspydir;
33 infile datalines;
34 input;
35 if _infile_ = '' then delete;
36 lin = length(_infile_);
37 outdata = inputc(_infile_, '$hex.', lin);
38 lout = lin/2;
39 put outdata $varying80. lout;
40 datalines4;
ERROR: Invalid file, /u/sastpw/tomdir.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
5 The SAS System 09:36 Friday, December 21, 2018
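The idea of falling back to the source file name when the target is a directory can be sketched as follows. resolve_destination is a hypothetical helper, not saspy code. Whether the remote path is a directory has to be determined on the SAS server, so here it is simply passed in by the caller.

```python
def resolve_destination(source, dest, dest_is_dir, hostsep='/'):
    """Return the remote file path to write.

    When dest is a directory, reuse the source file name underneath it.
    dest_is_dir is supplied by the caller, since only the SAS server can
    tell whether the remote path is a directory.
    """
    if not dest_is_dir:
        return dest
    # The local file may use either separator style, so split on both.
    filename = source.replace('\\', '/').rstrip('/').split('/')[-1]
    if dest.endswith(hostsep):
        return dest + filename
    return dest + hostsep + filename


print(resolve_destination('/u/sastpw/tomin', '/u/sastpw/tomdir', True))
```

hostsep would come from the session (sas.hostsep), so the same logic covers a Windows server path as well.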
I was able to upload a small file from Windows to a Linux server without any issue. I was also able to upload a slightly larger csv file (23 KB). This works for my use case. The only feedback I would have would be to replace the log print out with a message (ex: 'Finished uploading example.sas (xx sec)' or 'Unable to upload example.sas').
Great, thanks. I'll see about changing up what's returned. This needs to be able to be interrogated programmatically after to see if it succeeded or not. I'm thinking of a dict w/ a status and the log segment, not unlike what's returned in batch mode, or from submit(). That way you can test it and you have the log to see what happened if it failed.
Also, I have an initial implementation of download pushed to the upload-download branch now too. Same deal w/ return there for now. And, I have some optimizations to do and error handling, like in upload. But, it's working and if you want to try it, that'd be great.
Thanks, Tom
Out of curiosity, why couldn't the STDIO implementation use a socket connection on a local port to stream the file to SAS, the way it currently handles downloads but in reverse?
Hey @jasonphillips, so, to get better performance by not having to convert to a hex string and informat that back to real binary in SAS? Well, that's a good idea. I don't expect there's a reason that can't work. I guess I was just following the pattern of df2sd() and sd2df(), where df2sd could work w/ the STDIO channels and didn't require special support that might not be available (ability to open sockets between the two machines). I'm finishing up (hopefully) some more functionality on these: returning a dict with Success and LOG, error handling for files that don't exist, and supporting a directory instead of a filename for the destination, using the filename from the source. After that, I can try out this idea and see if there's any issue, and see how much better it performs. Probably similar to sd2df() vs. sd2df_CSV(). The bigger the file the more it matters, I would suspect.
Hey, while I have you, did you see the new saspy_examples repo? I copied your tabulate notebook there, but wanted to see if you wanted to push it there yourself (PR) so it had your id as the contributor there. I was going to delete it from the saspy repo then.
Thanks for this idea, it should help out performance! Tom
Here's what I'm thinking for what's returned from upload and download. Oh, and you can see the dest is a directory, so the source file name is used for the dest file name (in the log):
>>> res = sas.upload( '/u/sastpw/compare/maspy35_up', r'C:\Users\sastpw\Documents\updown')
>>> res.keys()
dict_keys(['Success', 'LOG'])
>>>
# so you can do the following
>>>
>>> res = sas.upload( '/u/sastpw/compare/maspy35_up', r'C:\Users\sastpw\Documents\updown')
>>> print(res['Success'])
True
>>> if not res['Success']:
... print(res['LOG'])
...
>>>
# and print the log at will, of course
>>> print(res['LOG'])
12 The SAS System 12:06 Thursday, January 10, 2019
11874
11875 filename saspydir 'C:\Users\sastpw\Documents\updown\maspy35_up' recfm=F encoding=binary lrecl=1 permission='';
11876 data _null_;
11877 file saspydir;
11878 infile datalines;
11879 input;
11880 if _infile_ = '' then delete;
11881 lin = length(_infile_);
11882 outdata = inputc(_infile_, '$hex.', lin);
11883 lout = lin/2;
11884 put outdata $varying80. lout;
11885 datalines4;
NOTE: The file SASPYDIR is:
Filename=C:\Users\sastpw\Documents\updown\maspy35_up,
RECFM=F,LRECL=1,File Size (bytes)=0,
Last Modified=10Jan2019:12:30:05,
Create Time=10Jan2019:12:06:14
13 The SAS System 12:06 Thursday, January 10, 2019
NOTE: DATA statement used (Total process time):
real time 3.33 seconds
cpu time 2.60 seconds
NOTE: 234496 records were written to the file SASPYDIR.
23613 ;;;;
23614
14 The SAS System 12:06 Thursday, January 10, 2019
23615
23616 run;
23617 filename saspydir;
NOTE: Fileref SASPYDIR has been deassigned.
23618
23619
>>>
Thoughts? Tom
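With a dict like that, the short status message requested earlier in the thread is easy to build on the caller's side. A minimal sketch, assuming only the 'Success' key shown above; transfer_message is a hypothetical helper, not a saspy function.

```python
def transfer_message(res, filename, action='upload'):
    """Summarize an upload()/download() result dict in one line."""
    if res.get('Success'):
        return 'Finished {} of {}'.format(action, filename)
    return 'Unable to {} {}'.format(action, filename)


# Hand-written stand-ins for the dict returned by sas.upload(...)
ok = {'Success': True, 'LOG': 'NOTE: 234496 records were written to the file SASPYDIR.'}
failed = {'Success': False, 'LOG': 'ERROR: Invalid file, /u/sastpw/tomdir.'}

print(transfer_message(ok, 'maspy35_up'))
print(transfer_message(failed, 'tomin'))
```

A GUI can show this line to the user and keep res['LOG'] available behind a details button for the failure case.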
I used upload-download branch and ran .upload() function but it is not successful.
I haven't pushed those last things yet. I'm still working on them. The output I showed was still just from my development repo. Once I finish it up I'll push it out for you to try. Note that's just the log that was returned, not the dictionary I'll be returning. I was just looking for feedback on whether that looks good to you, or do you think you would need something different? I'll let you know when these latest features are at master. For now, specify valid files and upload and download should work.
Thanks, Tom
Ah. Your example looks good and the format is also good.
I was about to ask whether it is possible to have a similar success key for every function in saspy. In my use case, when I execute any saspy function from a GUI, I would like to show a message to the user (specifically when it fails). Any thoughts?
ok, I just pushed these features. Go ahead and try it out and let me know how it works.
As for changing the API of all methods in saspy, I can't do that. But, there are many methods where you can tell if they worked or failed. For some, I can't tell either way myself, so I couldn't report it. There are methods for getting SAS automacro variables, which are basically return codes and statuses for SAS code that was submitted, so those would be useful in a number of situations.
If you have any specific cases, I can look at them to see what can be done. Happy to do that. But, really, I can't change something that pervasive and break people's existing code.
Also, the Batch mode might help out in this case. It returns a dict of LOG and LST, like submit(), so you may be able to use that to accomplish what you need. For instance, for a given method, if the LST is empty, that may mean it failed, or you can check the log for a known message that shows whether it worked or didn't.
Tom
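Checking the log text yourself could look like the sketch below. log_problems is a hypothetical helper. It assumes problems show up as lines beginning with ERROR or WARNING, as in the logs posted in this thread, and a real application may need more specific checks.

```python
def log_problems(log):
    """Collect ERROR and WARNING lines from SAS log text."""
    found = {'ERROR': [], 'WARNING': []}
    for line in log.splitlines():
        text = line.strip()
        for level in found:
            # Matches both "ERROR: ..." and numbered forms like "ERROR 22-322: ..."
            if text.startswith(level + ':') or text.startswith(level + ' '):
                found[level].append(text)
    return found


log = """84 put outdata $varying80. lout;
85 datalines4;
ERROR: Insufficient authorization to access /fff.
NOTE: The SAS System stopped processing this step because of errors."""

problems = log_problems(log)
print(len(problems['ERROR']), len(problems['WARNING']))
```

The same function works on the 'LOG' entry of the dict returned by submit() or by a method run in batch mode.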
I tested with different file sizes. Here are my findings.
1 KB - 5 seconds
20 KB - 15 seconds
32 KB - 40 seconds
1 MB - roughly 5 minutes
upload or download or both the same?
upload. I have to test for download.
Download seems to be lot better. 300kb took just 3 seconds.
Then I'll take @jasonphillips' great suggestion and re-implement upload using sockets, which should make it run about the same as the download. For now, if you use small files and see if there are any holes in the implementation, that'd be great. Handling invalid files, permissions, ... all the edge cases. I'll work on the other implementation next.
Oh, wait, I bet you're using IOM, not STDIO over SSH. I'll have to see about that, it's not like STDIO. But, I may be able to get the 'reverse' to work, so I'll look into both of those cases. Having to encode the binary into hex chars and reconvert to binary is a horrible way to have to do it, but that was a first pass that got us this far.
Thanks, Tom
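To see why the hex approach is expensive, here is a sketch of the client-side encoding it implies. This is an assumption based on the SAS step in the logs above (the $hex. informat and $varying80. width), not saspy's actual code. to_hex_lines and from_hex_lines are hypothetical names.

```python
import binascii

CHUNK_BYTES = 80   # matches the $varying80. width in the logged SAS step


def to_hex_lines(data, chunk=CHUNK_BYTES):
    """Split bytes into hex text lines, one chunk of bytes per line."""
    return [binascii.hexlify(data[i:i + chunk]).decode('ascii')
            for i in range(0, len(data), chunk)]


def from_hex_lines(lines):
    """Inverse of to_hex_lines; the $hex. informat plays this role in SAS."""
    return b''.join(binascii.unhexlify(line) for line in lines if line)


payload = bytes(range(256)) * 3
lines = to_hex_lines(payload)
assert from_hex_lines(lines) == payload
print(len(payload), sum(len(line) for line in lines))   # 768 1536
```

Every byte becomes two characters on the wire and has to be converted back on the server, which is the overhead a direct binary stream avoids.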
Ok, I've re-implemented upload in STDIO via sockets. @jasonphillips , are you able to try this out? You have linux? I think everyone else is on Windows and can't try it. BTW, the original implementation is still in there to compare against. You have to go to the access method to call it though, so:
res_sock = sas.upload ('local', 'remote')
res_slow = sas._io.upload_slow('local', 'remote')
I'm looking at the IOM access method now, and it will require more changes than what STDIO took. I'll have to change the java code as well as python. It'll take some time to work through. But, hopefully I can get it working similarly.
Tom
Great, I gave it a try, generating some dummy files of exact sizes, and saw the following speeds (reporting "real time" from log):
sas.upload()
# 50k - .02 seconds
# 500k - .10 seconds
# 1MB - .25 seconds
# 5MB - 1.06 seconds
# 25MB - 5.13 seconds
sas._io.upload_slow()
# 50k - .18 seconds
# 500k - 1.68 seconds
# 1MB - 3.58 seconds
# 5MB - 15.53 seconds
# 25MB - 1.26 minutes
Both look like linear scales, but indeed the socket method is about 10-15x faster.
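The speedup can be checked directly from the numbers reported above. In this small calculation, 50k and 500k are read as 0.05 MB and 0.5 MB, and 1.26 minutes is converted to 75.6 seconds; those readings are assumptions about the units.

```python
# (size in MB, socket seconds, hex-path seconds) as reported above
runs = [
    (0.05, 0.02, 0.18),
    (0.5, 0.10, 1.68),
    (1.0, 0.25, 3.58),
    (5.0, 1.06, 15.53),
    (25.0, 5.13, 1.26 * 60),
]

for size_mb, fast, slow in runs:
    print('{:>5} MB  socket {:6.2f} MB/s  hex {:5.2f} MB/s  speedup {:4.1f}x'.format(
        size_mb, size_mb / fast, size_mb / slow, slow / fast))

speedups = [slow / fast for _, fast, slow in runs]
```

The ratios come out between about 9x and 17x, in line with the 10-15x estimate, and the near-constant MB/s for the larger files is what a linear scale looks like.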
I did seem to be having some issues with the socket not being freed up immediately afterward, although I haven't investigated thoroughly yet. Just after an upload, any calls that use a socket (uploading another file, or using sd2df() methods) returned the generic socket error; it cleared after waiting for another 30 seconds or so. That doesn't happen for me with many repeated uses of the other socket methods, so perhaps something isn't being freed up in this case.
Hey Jason, thanks for verifying that. I am using ephemeral ports for these, so it should use a different port each time and not need to wait for a timeout. Unless you are using a tunneling port over SSH; then I have to use that port instead of an ephemeral one, and that could be the cause of the delay. I'll dig into this further too to see if I see anything suspicious. I'm going to try to get the IOM access method working first though. Thanks again, Tom
I am using a tunneling port in my case, so that might explain it; odd that the other methods using the port don't lock it up even with many quick calls in a row, but the file transfer holds it for a bit until another request using sockets can complete.
Thanks Jason. I just pushed an implementation of binary stream transfer on upload for IOM. It should behave comparably to the download for IOM now, like the up/down for STDIO. I'll look into this STDIO issue next, now that I have the IOM case working.
@chrishales709 @mailbagrahul can you guys try out the new upload for your IOM cases and see if it's working and faster for you? Just like the STDIO, I left the original implementation in there so you can compare. See above comments for running (sas._io.upload_slow())
@jasonphillips I have one idea about this delay, given it doesn't happen for the other cases. In all cases except this upload, I'm transferring data from SAS to saspy, and saspy is the socket 'server' (creates and accepts the connection). In this upload case, I'm transferring data the other way, and the socket connection is still the same direction. So, I will try reversing the socket connection to see if that might fix this. It could be that the linger is set when SAS is receiving, not transmitting, since at close, it isn't the one that shut down the socket. I may be able to try this out today and see.
Tom
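For readers unfamiliar with the pattern being discussed, here is a loopback-only sketch of a socket "server" that binds to an ephemeral port, accepts one connection, and reads a binary stream until the sender closes. It is not saspy's implementation. receive_stream is a hypothetical function, and a thread stands in for the remote side.

```python
import socket
import threading


def receive_stream(payload):
    """Send payload over a loopback socket; return what the server side read."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 0))        # port 0: the OS picks an ephemeral port
    server.listen(1)
    port = server.getsockname()[1]

    def sender():
        # Leaving the with-block closes the socket, which signals end of stream.
        with socket.create_connection(('127.0.0.1', port)) as client:
            client.sendall(payload)

    worker = threading.Thread(target=sender)
    worker.start()
    conn, _ = server.accept()
    chunks = []
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:                 # empty read: the sender has closed
                break
            chunks.append(data)
    worker.join()
    server.close()
    return b''.join(chunks)


data = bytes(range(256)) * 1000
print(receive_stream(data) == data)
```

Because the port number is chosen fresh on each call, back-to-back transfers do not wait on a previous socket, whereas a fixed tunneling port has to be reused each time.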
Ok. I tried both cases, and sas.upload() is much faster (2 MB - 0.25 seconds) than sas._io.upload_slow() (2 MB - 4 minutes).
And download() seems to take a long time (more than 3 minutes) to download a 250 KB file.
@mailbagrahul thanks for trying it out. Something must be wrong w/ your download. It should be very similar to upload. I can download/upload 2M in 1-2 seconds; granted I don't have a significant network delay in these cases. Can you provide any more details on what you're seeing? You are using IOM, right?
Thanks! Tom
Here's a run with a 2M executable from jupyter:
import time
start = time.localtime()
res = sas.download(r'C:\Users\sastpw\Documents\updown\cprxp_dn', '/u/sastpw/compare/cprxp')
finish = time.localtime()
print(res['Success'])
print(res['LOG'])
True
51 The SAS System 07:55 Thursday, January 17, 201
9
646
647 data _null_;
648 infile '/u/sastpw/compare/cprxp' recfm=F encoding=binary lrecl=4096;
649 file _tomods1 recfm=N;
650 input;
651 put _infile_;
652 run;
NOTE: The infile '/u/sastpw/compare/cprxp' is:
Filename=/u/sastpw/compare/cprxp,
Owner Name=sastpw,Group Name=r&d,
Access Permission=-r-xr-xr-x,
Last Modified=10Jan2019:16:43:19,
File Size (bytes)=2075894
NOTE: UNBUFFERED is the default with RECFM=N.
NOTE: The file _TOMODS1 is:
Filename=/sastmp/SAS_workEB8500000A40_tom64-3/SAS_work388A00000A40_tom64-3/_tomods1,
Owner Name=sastpw,Group Name=r&d,
Access Permission=-rw-r--r--,
Last Modified=17Jan2019:08:00:38
NOTE: 507 records were read from the infile '/u/sastpw/compare/cprxp'.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 364.37k
OS Memory 21664.00k
Timestamp 01/17/2019 08:00:38 AM
Step Count 275 Switch Count 0
Page Faults 1
Page Reclaims 16
Page Swaps 0
Voluntary Context Switches 49
Involuntary Context Switches 31
Block Input Operations 440
Block Output Operations 4056
653
654
52 The SAS System 07:55 Thursday, January 17, 201
9
655
print(start)
print(finish)
time.struct_time(tm_year=2019, tm_mon=1, tm_mday=17, tm_hour=8, tm_min=0, tm_sec=37, tm_wday=3, tm_yday=17, tm_isdst=0)
time.struct_time(tm_year=2019, tm_mon=1, tm_mday=17, tm_hour=8, tm_min=0, tm_sec=38, tm_wday=3, tm_yday=17, tm_isdst=0)
_up
start = time.localtime()
res = sas.upload(r'C:\Users\sastpw\Documents\updown\cprxp_dn', '/u/sastpw/compare/cprxp_up')
finish = time.localtime()
print(res['Success'])
print(res['LOG'])
True
55 The SAS System 07:55 Thursday, January 17, 201
9
693
694
695 data _null_;
696 infile _tomods1 recfm=F encoding=binary lrecl=4096;
697 file '/u/sastpw/compare/cprxp_up' recfm=N permission='';
698 input;
699 put _infile_;
700 run;
NOTE: The infile _TOMODS1 is:
Filename=/sastmp/SAS_workEB8500000A40_tom64-3/SAS_work388A00000A40_tom64-3/_tomods1,
Owner Name=sastpw,Group Name=r&d,
Access Permission=-rw-r--r--,
Last Modified=17Jan2019:08:01:20,
File Size (bytes)=2075894
NOTE: UNBUFFERED is the default with RECFM=N.
NOTE: The file '/u/sastpw/compare/cprxp_up' is:
Filename=/u/sastpw/compare/cprxp_up,
Owner Name=sastpw,Group Name=r&d,
Access Permission=-rw-r--r--,
Last Modified=17Jan2019:08:01:20
NOTE: 507 records were read from the infile _TOMODS1.
NOTE: DATA statement used (Total process time):
real time 0.24 seconds
user cpu time 0.00 seconds
system cpu time 0.02 seconds
memory 360.68k
OS Memory 21408.00k
Timestamp 01/17/2019 08:01:20 AM
Step Count 277 Switch Count 0
Page Faults 1
Page Reclaims 18
Page Swaps 0
Voluntary Context Switches 43
Involuntary Context Switches 31
Block Input Operations 440
Block Output Operations 4056
701
702
print(start)
print(finish)
time.struct_time(tm_year=2019, tm_mon=1, tm_mday=17, tm_hour=8, tm_min=1, tm_sec=19, tm_wday=3, tm_yday=17, tm_isdst=0)
time.struct_time(tm_year=2019, tm_mon=1, tm_mday=17, tm_hour=8, tm_min=1, tm_sec=21, tm_wday=3, tm_yday=17, tm_isdst=0)
tom64-2> ll cprxp cprxp_up
-r-xr-xr-x 1 sastpw r&d 2075894 Jan 10 16:43 cprxp
-rw-r--r-- 1 sastpw r&d 2075894 Jan 17 08:01 cprxp_up
tom64-2> diff cprxp cprxp_up
tom64-2>
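The runs above time the transfer with time.localtime(), which only resolves to whole seconds, and check the round trip with diff on the server. A rough sketch of doing the same from Python with finer timing and a checksum comparison follows. The helpers `timed` and `sha256_of` are made-up names, and the commented usage assumes the sas.upload(local, remote) call shown in this thread.

```python
import hashlib
import time

def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

# Assumed usage, given an existing saspy session `sas`:
#   res, seconds = timed(sas.upload, local_path, remote_path)
#   print(res['Success'], round(seconds, 3))
# After downloading the file back to a second local path:
#   print(sha256_of(local_path) == sha256_of(downloaded_path))
```

Matching hashes on the original and the re-downloaded copy give the same assurance as the empty diff output above, without needing a shell on the server.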
I'm looking for a way to upload files to the SAS server. I'm also looking for a way to get information on SAS server files (ex: create date, modified date).
The use case I have for this is code deployment. I develop SAS programs locally using the Atom editor. Once I'm done developing and testing, I merge my code into the production branch of the project's git repository. Right now I have to manually copy the production branch files to the SAS server. I would like to develop a python program using saspy to compare production branch files to the files on the SAS server, and then replace outdated files with the newer versions of the file.
First, I was wondering if you could add a method for copying a file to the SAS server. Second, I saw the work done on the dirlist method. I was wondering if you could also return the create date and modified date along with the file name.
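To illustrate the deployment check I have in mind, here is a rough sketch of the comparison step only. It assumes the modified dates for the server files are already available as a dict, however they end up being retrieved; `files_to_upload` and both input dicts are hypothetical names, not saspy API.

```python
from datetime import datetime

def files_to_upload(local_mtimes, remote_mtimes):
    """Return names that are missing on the server or newer locally.

    Both arguments map file name -> datetime of last modification.
    """
    stale = []
    for name, local_time in local_mtimes.items():
        remote_time = remote_mtimes.get(name)
        if remote_time is None or local_time > remote_time:
            stale.append(name)
    return sorted(stale)

# Example with made-up file names and dates.
local = {
    "report.sas": datetime(2019, 1, 17, 8, 0),
    "load.sas": datetime(2019, 1, 10, 16, 43),
    "new_step.sas": datetime(2019, 1, 16, 9, 30),
}
remote = {
    "report.sas": datetime(2019, 1, 10, 12, 0),
    "load.sas": datetime(2019, 1, 10, 16, 43),
}
pending = files_to_upload(local, remote)
```

Each name in `pending` would then be passed to whatever upload method gets added.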