Closed aaronsmith1234 closed 3 years ago
Hey @aaronsmith1234, I've tried what I believe you're describing on my side, but I see it behaving as expected. Maybe I'm missing something. Do you have an example of what you are seeing? Here's what I tried:
1) create a df
2) create another from it with one of the columns set as the index
3) write both to SAS (the whole table from the non-indexed df, all but the index from the indexed one)
4) reset the df with the index, so now it's back to all columns
5) create a new data set, and overwrite the one created from the indexed df
6) both the new and the overwritten data sets have all columns
tom64-5> python3.5
Python 3.5.6 (default, Nov 16 2018, 15:50:39)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-23)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import saspy
>>> sas = saspy.SASsession(cfgname='iomj')
SAS Connection established. Subprocess id is 19643
No encoding value provided. Will try to determine the correct encoding.
Setting encoding to utf_8 based upon the SAS session encoding value of utf-8.
>>> sas
Access Method = IOM
SAS Config name = iomj
SAS Config file = /opt/tom/github/saspy/saspy/sascfg_personal.py
WORK Path = /sastmp/SAS_work792300001D19_tom64-5/SAS_work162900001D19_tom64-5/
SAS Version = 9.04.01M4D11092016
SASPy Version = 3.6.6
Teach me SAS = False
Batch = False
Results = Pandas
SAS Session Encoding = utf-8
Python Encoding value = utf_8
SAS process Pid value = 7449
>>> # use class dataset for examples
>>> sd = sas.sasdata('class','sashelp')
>>> #create df without an index, whole table
>>> df = sd.to_df()
>>> df
Name Sex Age Height Weight
0 Alfred M 14 69.0 112.5
1 Alice F 13 56.5 84.0
2 Barbara F 13 65.3 98.0
3 Carol F 14 62.8 102.5
4 Henry M 14 63.5 102.5
5 James M 12 57.3 83.0
6 Jane F 12 59.8 84.5
7 Janet F 15 62.5 112.5
8 Jeffrey M 13 62.5 84.0
9 John M 12 59.0 99.5
10 Joyce F 11 51.3 50.5
11 Judy F 14 64.3 90.0
12 Louise F 12 56.3 77.0
13 Mary F 15 66.5 112.0
14 Philip M 16 72.0 150.0
15 Robert M 12 64.8 128.0
16 Ronald M 15 67.0 133.0
17 Thomas M 11 57.5 85.0
18 William M 15 66.5 112.0
>>> df.dtypes
Name object
Sex object
Age int64
Height float64
Weight float64
dtype: object
>>> df.shape
(19, 5)
>>> #create df with an index, use Name column for the index
>>> dfi = df.set_index('Name')
>>> dfi
Sex Age Height Weight
Name
Alfred M 14 69.0 112.5
Alice F 13 56.5 84.0
Barbara F 13 65.3 98.0
Carol F 14 62.8 102.5
Henry M 14 63.5 102.5
James M 12 57.3 83.0
Jane F 12 59.8 84.5
Janet F 15 62.5 112.5
Jeffrey M 13 62.5 84.0
John M 12 59.0 99.5
Joyce F 11 51.3 50.5
Judy F 14 64.3 90.0
Louise F 12 56.3 77.0
Mary F 15 66.5 112.0
Philip M 16 72.0 150.0
Robert M 12 64.8 128.0
Ronald M 15 67.0 133.0
Thomas M 11 57.5 85.0
William M 15 66.5 112.0
>>> # columns for df and dfi
>>> df.shape
(19, 5)
>>> dfi.shape
(19, 4)
>>> # create SAS data from df without an index, whole table
>>> sddf = sas.df2sd(df, 'df')
>>> # create SAS data from dfi with the index set (that column won't be in the SAS data set)
>>> sddfi = sas.df2sd(dfi, 'dfi')
>>> sddf.columnInfo()
Member Num Variable Type Len Pos
0 WORK.DF 3 Age Num 8 0
1 WORK.DF 4 Height Num 8 8
2 WORK.DF 1 Name Char 7 24
3 WORK.DF 2 Sex Char 1 31
4 WORK.DF 5 Weight Num 8 16
>>> sddfi.columnInfo()
Member Num Variable Type Len Pos
0 WORK.DFI 2 Age Num 8 0
1 WORK.DFI 3 Height Num 8 8
2 WORK.DFI 1 Sex Char 1 24
3 WORK.DFI 4 Weight Num 8 16
>>> # reset the index and get back to the whole table again
>>> dfi2 = dfi.reset_index()
>>> dfi.shape
(19, 4)
>>> dfi2.shape
(19, 5)
>>> # create NEW SAS data from df with the reset index, whole table
>>> sddfi2 = sas.df2sd(dfi2, 'dfi2')
>>> sddfi2.columnInfo()
Member Num Variable Type Len Pos
0 WORK.DFI2 3 Age Num 8 0
1 WORK.DFI2 4 Height Num 8 8
2 WORK.DFI2 1 Name Char 7 24
3 WORK.DFI2 2 Sex Char 1 31
4 WORK.DFI2 5 Weight Num 8 16
>>>
>>> # remember the original SAS Data Set created from the indexed df missing Name column
>>> sddfi.columnInfo()
Member Num Variable Type Len Pos
0 WORK.DFI 2 Age Num 8 0
1 WORK.DFI 3 Height Num 8 8
2 WORK.DFI 1 Sex Char 1 24
3 WORK.DFI 4 Weight Num 8 16
>>> sddfi
Libref = WORK
Table = dfi
Dsopts = {}
Results = Pandas
>>> # Overwrite the original table created from the indexed df (missing the Name column) with the reset df to get the whole table
>>> sddfir = sas.df2sd(dfi2, 'dfi')
>>> sddfir
Libref = WORK
Table = dfi
Dsopts = {}
Results = Pandas
>>> sddfir.columnInfo()
Member Num Variable Type Len Pos
0 WORK.DFI 3 Age Num 8 0
1 WORK.DFI 4 Height Num 8 8
2 WORK.DFI 1 Name Char 7 24
3 WORK.DFI 2 Sex Char 1 31
4 WORK.DFI 5 Weight Num 8 16
>>> # Since this SASdata object references the same table, just replaced, it now shows the new whole table, same as sddfir
>>> sddfi.columnInfo()
Member Num Variable Type Len Pos
0 WORK.DFI 3 Age Num 8 0
1 WORK.DFI 4 Height Num 8 8
2 WORK.DFI 1 Name Char 7 24
3 WORK.DFI 2 Sex Char 1 31
4 WORK.DFI 5 Weight Num 8 16
>>>
Sure @tomweber-sas , here you go!
Source data (SAS dataset on Remote SAS Server on Solaris). Note that this is dummy data I got from a SAS example somewhere, not real data that means anything.
SAS connection info. Note that I've blocked out a few things that might be considered "secrets"
#not sure if better way to set libraries, but this works
results=conn.submit("libname persfold '/parentfolder/childfolder/';")
data=conn.sasdata2dataframe("testdata","persfold")
type(data)
#this reads in the data as expected
agg_data=data.groupby("gender").agg("count")
conn.dataframe2sasdata(agg_data,"datafromPython","persfold",)
You can see here that the resultant dataset is missing the gender column that I indexed on.
but that in Python, the index column is there
Now, if I reset the index, the data in Python looks like this
agg_data=data.groupby("gender").agg("count").reset_index()
agg_data.head()
but, if I try and overwrite the original dataset using this
conn.dataframe2sasdata(agg_data,"datafromPython","persfold",)
I don't get an error, but it also doesn't overwrite the data as expected.
However, if I write it to a NEW dataset, then it does output as expected
conn.dataframe2sasdata(agg_data,"datafromPython2","persfold",)
Just in case it's helpful, my pandas and ipython versions:
pandas==1.2.3
ipython==7.21.0
I've tried this use case and it works too. What is this 'window' or whatever you're viewing the sasdata in? I think after you replace the table without the index, you need to refresh that window or make it reread the new table, because the table does have the new column in it. Or, if there was some reason it didn't replace the table (like you have it open in that window, so it can't be replaced), then after that step, issue print(conn.lastlog()) to see what happened; it could be an error about not replacing the table because you have it open. You can use saslog() to see the entire log for the session, but lastlog() is nice right after a given method to see just the log from that method.
Tom
I was viewing the sasdata in Enterprise Guide. You are correct: having opened it in Enterprise Guide (even if it's not actively open) results in a file lock, so the write fails.
Question: is there a reason NOT to just check every log message for "ERROR:" and throw an exception if that is found? Are there too many cases where that would be misleading\a false positive? I would have expected that if SAS ran into an error, it would surface that in Python, without my having to search the log.
Also, were you able to replicate the problem from my example above? I could paste a complete code example, but will take a while to be able to put it all together.
Yeah, just blindly looking for 'ERROR:' in the log and assuming that whatever code I ran failed isn't a viable solution. That's why providing a nice lastlog() method, where you can see just the log from any method (like what you get from submit()), lets you check for anything you might consider an error, or an obvious real error like in this case.
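The lastlog() pattern described above can be wrapped in a small helper. This is only a sketch: the stub class below stands in for a live saspy session so the logic can be demonstrated offline, and `check_last_step` is a hypothetical name; with real saspy you would pass the SASsession object itself.

```python
class StubSession:
    """Stand-in for a live saspy session, just to demonstrate the check."""
    def __init__(self, log):
        self._log = log

    def lastlog(self):
        # Real saspy returns the log from the most recent method here.
        return self._log


def check_last_step(session):
    """Raise if the log from the last method contains 'ERROR:'."""
    log = session.lastlog()
    if "ERROR:" in log:
        raise RuntimeError("SAS reported an error; review the log:\n" + log)
    return log
```

Called right after a df2sd()-style method, a check like this would have surfaced the locked-file error instead of it passing silently.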
As for the last question, about replicating the problem above, I'm not sure I understand. If you don't have that file locked yourself, then yes, it rewrites that data set with the column. It worked as expected for me, since I didn't have the data set locked. This is what I got trying that out:
>>> dfi = df.groupby('Sex').agg('count')
>>> dfi
Name Age Height Weight
Sex
F 9 9 9 9
M 10 10 10 10
>>> dfi.shape
(2, 4)
>>> sdi = sas.df2sd(dfi)
>>> sdi.columnInfo()
Member Num Variable Type Len Pos
0 WORK._DF 2 Age Num 8 8
1 WORK._DF 3 Height Num 8 16
2 WORK._DF 1 Name Num 8 0
3 WORK._DF 4 Weight Num 8 24
>>> sdi.head()
Name Age Height Weight
0 9 9 9 9
1 10 10 10 10
>>> dfir = df.groupby('Sex').agg('count').reset_index()
>>> dfir
Sex Name Age Height Weight
0 F 9 9 9 9
1 M 10 10 10 10
>>> sdi
Libref = WORK
Table = _df
Dsopts = {}
Results = Pandas
>>> sdir = sas.df2sd(dfir)
>>> sdi
Libref = WORK
Table = _df
Dsopts = {}
Results = Pandas
>>> sdir
Libref = WORK
Table = _df
Dsopts = {}
Results = Pandas
>>> sdi.head()
Sex Name Age Height Weight
0 F 9 9 9 9
1 M 10 10 10 10
>>> sdir.head()
Sex Name Age Height Weight
0 F 9 9 9 9
1 M 10 10 10 10
>>>
One idea: if errors are detected, maybe write out a warning message saying that errors were detected in the log and encouraging the user to review it? And add a "suppress_error_warnings" option for those who don't want to be warned? I think my confusion was that, by default, I have no idea that I encountered errors without manually searching the log; I expected that if SAS ran into an error, it would follow typical Python behavior and throw an exception.
I'll provide a full code example shortly illustrating my other issue about indices not getting saved correctly.
Note that in example below, I am assuming you have a working SasConnection object named "conn" and a libname called "persfold", and have imported pandas as pd.
source_data=pd.read_csv("https://raw.githubusercontent.com/codeforamerica/ohana-api/master/data/sample-csv/contacts.csv")
conn.dataframe2sasdata(source_data,"source_data_no_index","persfold",)
print(source_data)
source_data.set_index("name",inplace=True)
print(source_data)
conn.dataframe2sasdata(source_data,"source_data_index","persfold",)
See screenshots below, where the "source_data_index" dataset in SAS is missing the "name" column that we set as the index.
Or, alternatively, read the data back into python and look at the columns
data_no_index=conn.sasdata2dataframe("source_data_no_index","persfold")
data_with_index=conn.sasdata2dataframe("source_data_index","persfold")
print(data_no_index.columns.to_list())
print(data_with_index.columns.to_list())
Results in
['id', 'location_id', 'organization_id', 'service_id', 'name', 'title', 'email', 'department']
['id', 'location_id', 'organization_id', 'service_id', 'title', 'email', 'department']
Where you can see the name column has now been dropped.
Does that make it clear? Happy to set up a call if that would be easier.
And I think you can see from your example that when you write the dataframe to SAS, it drops the Sex column\index. I would have expected that to carry over to SAS unless I explicitly told it to drop the index, similar to how pandas, by default, writes the indices unless you tell it not to.
Also, I realized my columns example from above isn't quite right; I need to use the .index attribute to show that the index has been dropped. My bad, still new to open source work and submitting good issues :) You can see below that both data_no_index and data_with_index have the same index, which does not match the original dataframe.
print(source_data.index) #note that this is after we set name==index
data_no_index=conn.sasdata2dataframe("source_data_no_index","persfold")
data_with_index=conn.sasdata2dataframe("source_data_index","persfold")
print(data_no_index.index)
print(data_with_index.index)
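For reference, the index behavior being described can be reproduced with pandas alone (a minimal sketch, no SAS session needed): the transfer methods send only the columns, so an index set via set_index() is dropped unless it is reset first.

```python
import pandas as pd

# After set_index(), 'name' is the row index, not a column.
df = pd.DataFrame({"name": ["a", "b"], "x": [1, 2]}).set_index("name")
print(list(df.columns))        # ['x'] -- 'name' would be dropped on transfer

# reset_index() turns it back into an ordinary column.
df_out = df.reset_index()
print(list(df_out.columns))    # ['name', 'x'] -- safe to transfer to SAS
```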
Hey @aaronsmith1234, sorry, I'm heads down on other stuff here today, but trying to work on this concurrently. So I'm sorry if I've missed something. Can I sum up all the things I think we're now talking about, so that we can reference them specifically (in no particular order)?
1) the data frame index doesn't come over to SAS as a column
2) the data frame index doesn't come over to SAS as an index (not sure you said this, but it seems like another question)
3) I don't look for the word 'ERROR' in the log for every method and print() a message.
4) df2sd() doesn't overwrite an existing data set.
I think these are the things being addressed in this issue? Correct, or am I missing any?
Now, which ones are still on the table?
I think 4 is done, but I'm not sure you're agreeing on that. I may be missing something in the thread about this one.
1 and 2 are also, I think, 'done'. That's how it works, and I don't see a valid way to change the behavior. Again, open for further debate. I am not a dataframe expert, so if there is a way this can actually work, in all cases where it makes sense, then I'm happy to look into it. I'm always looking to make saspy better, but from what I've seen, this doesn't make sense. What pandas thinks of as an index (a row label) and how SAS uses indexes don't match up well enough. Also, given this is a Python program, and one line makes the 'index' a real column(s) while another makes it the row label, it seems easy to do what you want in your Python code.
3, well, I can't call that 'done' in my mind, since this will always be a debate. When looking at one simple case, it's easy to say 'look for ERROR in the log and throw an error (or message)'. But looking at the overall picture, this isn't that clear or obvious. There can be errors in the log which don't mean that what was being done failed. Just printing a message isn't acceptable, as that's not a programmatically accessible solution. Also, failing in the case where it's a false positive isn't acceptable either. The user program (this is a program, written in Python) can check the log for anything it wants, which is a programmatically accessible solution. So I will tend to go back to that being the answer. All that being said, there are cases where I do add checks into my multi-step methods to try to catch predictable failures, so that's always something that might need to be done in another case. Though that's not just looking for the word ERROR in the log anyway; it's a bit more specific, for the case at hand.
Again, I'm not trying to dismiss anything you're questioning here. Just trying to keep on track and be able to focus on specifics. So, which of these (and are there any others?) are we still looking at? Do you believe you see behavior other than what I said each of those cases above does? Or is it that you'd like to see the behavior change?
Thanks! Tom
First, definitely appreciate all the help here. Super good stuff!
Agree that #4 is closed; I think it relates directly to #3, in that if error reporting were more consistent with typical Python error reporting, I would have caught #4 without ever having mentioned it. Going a little deeper here, I think the typical workflow in SAS (at least from what I've seen) is something like the following:
In Python (and I'm new here, so take it with a grain of salt!), it's very different. Take this code as an example:
ans = 1 / 0
print("thing")
If you submit this, "thing" never gets printed, since Python immediately stops when it hits an unhandled exception. So when I'm programming in Python, even when using saspy, that's the mindset I come in with: if my program hits an error that I'm not explicitly handling, it will immediately stop execution and tell me that I received an error. I'm not having to read a log file or system trace on every command I run; the default is assuming it worked unless I see an error.
I think the point you are making (and please correct me if I misstate it!) is that since under the hood you are calling SAS, the workflow should follow a SAS workflow: run a command, look at the log (perhaps programmatically, as you point out), continue on as needed. I can see your point, but I also think it's a bit dangerous that unless I'm explicitly checking every SAS command I submit for an error, I could run my process, see no errors in the Python script, but in reality every single SAS command failed, and by default saspy doesn't alert me to that.
I agree that there really isn't a clean solution that works for every use case. One solution, as you recommend, is for me to write a checking step after each command that I run. And yes, that would work, but it's also only solving the problem for me. Another solution would be to add a few options to the package for how to handle errors; then users can decide what makes the most sense for them. Some are primarily running interactively and might be fine with just warnings; others might be running in prod and want exceptions thrown for every SAS error. It's a nuanced point that doesn't have a clean answer, but I think giving a default that at least raises the issue for awareness, so people can decide how they want to handle it, is a prudent route, instead of letting it go unseen except with manual introspection.
A few lines from the Zen of Python came to mind as I was thinking of this, and think they relate to the discussion
....
Explicit is better than implicit.
....
Errors should never pass silently.
Unless explicitly silenced.
My concern with #1 and #2 (which I agree are separate issues) is that you can pass a dataframe to SAS via saspy, immediately retrieve it from SAS, and not get the same result. Note that I'm using "result" as a very loose term here; while it would be ideal if you could pass a dataframe to SAS and then retrieve it with the exact same indices, columns, etc., that's probably not possible (although maybe it is via some type of metadata you could add to a dataset, specifying what the dataset is indexed by in Python? I'm not up enough on my SAS chops to say what might be possible here or whether it's worth the effort, just throwing out ideas!).
The specific problem I was experiencing, that of row indices, can be solved by reset_index() (which also correctly unpacks multilevel row indices, at least in the examples I've done). Yes, I could add that to every line where I'm writing a dataframe to SAS, but the next person that comes along is likely going to have the same issue (unless people aren't using indices as much as they should, since they are fantastic 😁). Given we know there is a potential issue here, some type of warning alerting the user to the problem would be valuable. Pandas does this now; if you do things that technically work but can trip you up, it will log warnings (which you can explicitly silence if desired). There is something to be said for being explicit about what you are doing, which would lead to using reset_index(), but if going that route, I think it should also warn the user that a non-default range index will get lost when passing the dataframe to the dataframe2sasdata method. I'm going to do a bit of experimentation on multilevel column indices and will post some additional thoughts below when I have some time.
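As a quick check of the multilevel claim above, reset_index() does unpack each level of a MultiIndex into its own column (a small sketch, independent of saspy):

```python
import pandas as pd

# Build a frame with a two-level row index.
idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["grp", "sub"])
df = pd.DataFrame({"v": [10, 20, 30, 40]}, index=idx)

flat = df.reset_index()        # each index level becomes a regular column
print(list(flat.columns))      # ['grp', 'sub', 'v']
```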
Re: the pandas example, see below. It's letting the person know that they are potentially going to get into trouble, although not actually stopping execution.
import pandas as pd
source_data=pd.read_csv("https://raw.githubusercontent.com/codeforamerica/ohana-api/master/data/sample-csv/contacts.csv")
source_data["location_id"][0]=0  # chained assignment; pandas emits a SettingWithCopyWarning
Python has a pretty good ecosystem built out around this, and if desired, users can ignore warnings, or can programmatically catch them, or turn them into exceptions, a la:
import pandas as pd
import warnings
warnings.simplefilter("error")
source_data = pd.read_csv(
    "https://raw.githubusercontent.com/codeforamerica/ohana-api/master/data/sample-csv/contacts.csv"
)
source_data["location_id"][0] = 0  # now raises, since the filter above turns warnings into errors
print("never get here")
I think the biggest thing I was looking for is by default, warning the user that something may be going wrong here, or that they are potentially getting themselves into trouble (i.e. when copying a dataframe with a non-default range index to SAS).
And an example with multiindex on columns:
import pandas as pd
source_data=pd.read_csv("https://raw.githubusercontent.com/codeforamerica/ohana-api/master/data/sample-csv/contacts.csv")
source_data.head()
source_data["extraColumn1"]=1
source_data["extraColumn2"]=2
data_with_multi_row_and_column_indices=pd.pivot_table(source_data,values="location_id",index=["name","title"],columns=["extraColumn1","extraColumn2"])
conn.dataframe2sasdata(data_with_multi_row_and_column_indices,"multiindices","persfold",)
and a view of the data from EG vs. Python\Jupyter vs. a csv with the default options:
I think that kind of gets to my point: the default pandas methods output all the data, and I can recreate the original dataframe from the csv, as long as I set the indices when I read it in. But with the current SAS handling, by default, I can't recreate it, since it drops row indices. If I reset the index and then save it to SAS, then I can recreate it, but that's not consistent with the other to_* methods of pandas, where by default everything is kept.
I haven't tried recreating the columns from the multiindex example above, but I think it would be possible; just not enough time to try 😢
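The to_csv() round trip described above looks roughly like this (a sketch using an in-memory buffer instead of a file):

```python
import io
import pandas as pd

df = pd.DataFrame({"x": [1, 2]}, index=pd.Index(["a", "b"], name="name"))

# to_csv() writes the row index by default, so reading it back with
# index_col restores the original frame, index and all.
restored = pd.read_csv(io.StringIO(df.to_csv()), index_col="name")
assert restored.equals(df)
```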
ok, that's a lot of stuff. So, 3 posts; let me try to address them. Sorry again, I'm up to my eyes in other stuff at the moment.
post 1 - mostly about number 3. No, I don't think you have to look at the log after every saspy method you use. That would be ridiculous, which is part of the reason I don't do that internally either :). This particular method is one where the programmer should assess the outcome programmatically, or visually if you only ever run the code interactively. A note to stdout saying 'I saw the word ERROR in the log' isn't a solution, unless you really would never act on the outcome in code and only ever run the whole program one step at a time interactively. As to
ans = 1 / 0
print("thing")
try not to pigeonhole Python based upon observations from a higher-level wrapper; there are multiple ways people can run Python code, and again, one of my issues is that what may seem like a solution for one simple case doesn't solve all the cases, so a partial implementation isn't a good solution. When I run those two lines, the first gets an error and the second executes just fine:
>>> ans = 1 / 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
>>> print("thing")
thing
>>>
I'm guessing you're running in a Jupyter notebook? And you have both of those in one cell? Even there, if you put them in separate cells and run them both, you'd see the print succeed.
For the data creation methods, there are too many possibilities in SAS for me to make a definitive assessment of success or failure. I have to defer that to the user. It can be unsuccessful even if ERROR isn't in the log. It can be successful even with ERROR in the log. When using this, knowing others may be accessing the same data set you're trying to create/replace/update/delete, you really need to validate what happened, as it can vary any time you run it. There are lots of ways to do this programmatically, not just parsing the log.
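One way to validate programmatically, along the lines suggested above, is to compare the dataframe's columns against the 'Variable' column of a columnInfo() result, as seen in the session output earlier in the thread. The helper name and stand-in frames below are hypothetical; with live saspy you would pass the frame returned by SASdata.columnInfo().

```python
import pandas as pd

def verify_columns(df, col_info):
    """Raise if any dataframe column is absent from the SAS table's metadata."""
    sas_cols = {v.lower() for v in col_info["Variable"]}
    missing = [c for c in df.columns if c.lower() not in sas_cols]
    if missing:
        raise RuntimeError(f"columns missing after transfer: {missing}")

# Stand-in columnInfo() result where 'Sex' failed to come across:
df = pd.DataFrame({"Sex": ["F"], "Age": [13]})
info = pd.DataFrame({"Variable": ["Age"]})
```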
For posts 2/3, which are about indexes, I'm going to need to investigate a little further. Maybe even prototype something to really see if it's feasible. Again, one simple case may work, but I want to make sure it can be consistent and deterministic.
Tom
No worries; this isn't something that is stopping me, just trying to help improve things! Feel free to drop the conversation as well if it's just taking up time and not leading to a productive outcome for you; I don't have any skin in the game.
I'd be interested in your recommendation on how people should be using saspy and checking for errors. Manually checking for errors doesn't seem like an ideal solution; if I'm including some piece of code in my program, I have a reason for including it, and I would like to know if it errors out (even if it's some type of error that I end up suppressing because it's not a real error\exception).
When running Python, the methods which I use are:
- code separated across multiple cells in a notebook, where doing a "run all cells" stops execution at the exception
- running a script, where again, an exception stops execution
- importing a package, where again, execution stops upon an error
The only scenarios (and again, I'm decently new to Python, so there could easily be cases I'm missing\not aware of) where I can see you submitting those two lines and having both execute are if you submit them independently. I believe your example is that: you submit the first line, see it error out, and then submit the second. The Jupyter notebook example of different cells is the same in my mind; in both cases, you are submitting two separate batches of code, seeing one error out, and then proceeding to submit the second. And to me, that's fine; you explicitly see an error, decide you don't care, and continue submitting code anyway.
I agree that just writing something to stdout isn't a super solid solution; but I'd also argue that leaving all error checking up to the end user isn't a solid solution either. I like the pandas approach noted above: use warnings by default when possibly dangerous things are happening (and I think an error in the log is an example; it may not be an actual error, but it indicates that something with the potential for trouble is happening), and leave it up to users whether they want to disable them, treat them as exceptions, or ignore them entirely. It's at least giving people a breadcrumb\nudge that they can then act on.
Have other users not brought up any concerns about error reporting? I'd be interested to know what other users have done in regards to this.
Hey @aaronsmith1234, I got a handle on what I've been heads down on, so I have some free time today. I investigated pandas indexes (I've never really looked into them much before), and I think I can see a way to automatically process a single-column index, creating a variable out of it when transferring the dataframe to SAS. I expect I can simply set the index after moving a data set to pandas, if the data set had a single index defined. I'll have to prototype all of this to really see, but I think I can acquire all of the info I need and process the index like a column on output. This will be a fairly extensive enhancement; lots of code. But I think it should work. I don't see multi-indexes working; I don't see the info I need with those.
Currently, it really only takes 2 lines of user code to accomplish this: reset_index(), df2sd(..., outdsopts={'index':'variable'}), and set_index() back, and your df is as it was. And for the other direction: sd2df(), set_index(). So it's very simple on the Python side, which is why I've never looked at trying to implement all of this in my code before. It will easily be 100 lines of code and a lot of testing and validation.
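The pandas side of that round trip can be sketched like this; the df2sd()/sd2df() calls themselves need a live SAS session, so they appear only as comments, and 'tbl'/'name' are placeholder names:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "x": [1, 2]}).set_index("name")

out = df.reset_index()                    # index becomes a real column
# sd = sas.df2sd(out, 'tbl', outdsopts={'index': 'name'})   # over to SAS
back = out.set_index("name")              # restore the index afterwards
assert back.equals(df)                    # df is as it was
```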
As for adding an option to check the log for errors and print a message, I have some ideas on that too. I might be able to do it in a common place, such that it's not so pervasive. If you have some time today, I'm happy to get on a call and talk about all of this. This thread's getting awfully long and spreading out across many topics :) A conversation can get us farther quicker.
Thanks! Tom
Really enjoyed the conversation!
I've reworked my prototype of the check for ERROR: in the log (print a message and set an attribute, so you can programmatically check it) to use the warnings module, like you suggested. This does seem a better way to do it than printing, as you can control it in various ways. I'm not clear yet on how to programmatically check it, though. Do you have an example of that? I could leave my attribute in there if that's easier.
It seems the default is to only show a given warning the first time, though you can change that too. I've tried that, and also setting it to throw an error instead of printing the warning, and both worked. So I think I'll go this route for these new enhancements. I may go back and see if other messages would be better worked into this too. Going forward, new ones will work this way, like the index case I'll see about adding, as we talked about. Great suggestion on your part!
>>> sas.list_tables('x')
/opt/tom/github/saspy/saspy/sasioiom.py:976: UserWarning: Noticed 'ERROR:' in LOG, you ought to take a look and see if there was a problem
warnings.warn("Noticed 'ERROR:' in LOG, you ought to take a look and see if there was a problem")
[('3 BAD NAME', 'DATA'), ('A', 'DATA'), ('AF', 'DATA'), ('CAFEMENU', 'DATA'), ('CARS', 'DATA'), ('CARSFR', 'DATA'), ('CHARF', 'DATA'), ('PRDSALE', 'DATA'), ('PRDSALEFR', 'DATA'), ('PRDSALE_', 'DATA'), ('SCENSTATES', 'DATA'), ('SCOREDATA', 'DATA'), ('TOM', 'DATA'), ('TOM2', 'DATA'), ('TOM3', 'DATA'), ('V91', 'DATA'), ('_SUMMARY', 'DATA')]
>>> sas.list_tables('x')
[('3 BAD NAME', 'DATA'), ('A', 'DATA'), ('AF', 'DATA'), ('CAFEMENU', 'DATA'), ('CARS', 'DATA'), ('CARSFR', 'DATA'), ('CHARF', 'DATA'), ('PRDSALE', 'DATA'), ('PRDSALEFR', 'DATA'), ('PRDSALE_', 'DATA'), ('SCENSTATES', 'DATA'), ('SCOREDATA', 'DATA'), ('TOM', 'DATA'), ('TOM2', 'DATA'), ('TOM3', 'DATA'), ('V91', 'DATA'), ('_SUMMARY', 'DATA')]
>>> sas.list_tables('x')
[('3 BAD NAME', 'DATA'), ('A', 'DATA'), ('AF', 'DATA'), ('CAFEMENU', 'DATA'), ('CARS', 'DATA'), ('CARSFR', 'DATA'), ('CHARF', 'DATA'), ('PRDSALE', 'DATA'), ('PRDSALEFR', 'DATA'), ('PRDSALE_', 'DATA'), ('SCENSTATES', 'DATA'), ('SCOREDATA', 'DATA'), ('TOM', 'DATA'), ('TOM2', 'DATA'), ('TOM3', 'DATA'), ('V91', 'DATA'), ('_SUMMARY', 'DATA')]
>>>
>>>
>>>
>>> import warnings
>>> warnings.filterwarnings("always",module='saspy')
>>> sas.list_tables('x')
/opt/tom/github/saspy/saspy/sasioiom.py:976: UserWarning: Noticed 'ERROR:' in LOG, you ought to take a look and see if there was a problem
warnings.warn("Noticed 'ERROR:' in LOG, you ought to take a look and see if there was a problem")
[('3 BAD NAME', 'DATA'), ('A', 'DATA'), ('AF', 'DATA'), ('CAFEMENU', 'DATA'), ('CARS', 'DATA'), ('CARSFR', 'DATA'), ('CHARF', 'DATA'), ('PRDSALE', 'DATA'), ('PRDSALEFR', 'DATA'), ('PRDSALE_', 'DATA'), ('SCENSTATES', 'DATA'), ('SCOREDATA', 'DATA'), ('TOM', 'DATA'), ('TOM2', 'DATA'), ('TOM3', 'DATA'), ('V91', 'DATA'), ('_SUMMARY', 'DATA')]
>>> sas.list_tables('x')
/opt/tom/github/saspy/saspy/sasioiom.py:976: UserWarning: Noticed 'ERROR:' in LOG, you ought to take a look and see if there was a problem
warnings.warn("Noticed 'ERROR:' in LOG, you ought to take a look and see if there was a problem")
[('3 BAD NAME', 'DATA'), ('A', 'DATA'), ('AF', 'DATA'), ('CAFEMENU', 'DATA'), ('CARS', 'DATA'), ('CARSFR', 'DATA'), ('CHARF', 'DATA'), ('PRDSALE', 'DATA'), ('PRDSALEFR', 'DATA'), ('PRDSALE_', 'DATA'), ('SCENSTATES', 'DATA'), ('SCOREDATA', 'DATA'), ('TOM', 'DATA'), ('TOM2', 'DATA'), ('TOM3', 'DATA'), ('V91', 'DATA'), ('_SUMMARY', 'DATA')]
>>> sas.list_tables('x')
/opt/tom/github/saspy/saspy/sasioiom.py:976: UserWarning: Noticed 'ERROR:' in LOG, you ought to take a look and see if there was a problem
warnings.warn("Noticed 'ERROR:' in LOG, you ought to take a look and see if there was a problem")
[('3 BAD NAME', 'DATA'), ('A', 'DATA'), ('AF', 'DATA'), ('CAFEMENU', 'DATA'), ('CARS', 'DATA'), ('CARSFR', 'DATA'), ('CHARF', 'DATA'), ('PRDSALE', 'DATA'), ('PRDSALEFR', 'DATA'), ('PRDSALE_', 'DATA'), ('SCENSTATES', 'DATA'), ('SCOREDATA', 'DATA'), ('TOM', 'DATA'), ('TOM2', 'DATA'), ('TOM3', 'DATA'), ('V91', 'DATA'), ('_SUMMARY', 'DATA')]
>>>
>>>
>>>
>>> warnings.filterwarnings("error",module='saspy')
>>> sas.list_tables('x')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/tom/github/saspy/saspy/sasbase.py", line 2064, in list_tables
ll = self._io.submit(code, results='text')
File "/opt/tom/github/saspy/saspy/sasioiom.py", line 976, in submit
warnings.warn("Noticed 'ERROR:' in LOG, you ought to take a look and see if there was a problem")
UserWarning: Noticed 'ERROR:' in LOG, you ought to take a look and see if there was a problem
>>> sas.list_tables('x')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/tom/github/saspy/saspy/sasbase.py", line 2064, in list_tables
ll = self._io.submit(code, results='text')
File "/opt/tom/github/saspy/saspy/sasioiom.py", line 976, in submit
warnings.warn("Noticed 'ERROR:' in LOG, you ought to take a look and see if there was a problem")
UserWarning: Noticed 'ERROR:' in LOG, you ought to take a look and see if there was a problem
>>>
>>>
>>> warnings.resetwarnings()
I enjoyed it as well, thanks for the time. Here is an example of what I meant: users can decide, based on the warning type, how they want it handled. There may be more extensive handling you can do, but I haven't dug into it much more than that.
import warnings

warnings.warn("DeprecationWarningExample", DeprecationWarning)

# turn only RuntimeWarning into an exception; other categories stay warnings
warnings.simplefilter("error", RuntimeWarning)

warnings.warn(
    "Deprecation Warning is still just a warning, no error", DeprecationWarning
)
warnings.warn(
    "Runtime warning will kill the code based on filter above", RuntimeWarning
)
warnings.warn("This will never get executed due to error above", DeprecationWarning)
I also noticed that by using a context manager, you can suppress warnings but still get a count of how many were generated. I couldn't find a way to get a count without the suppression, though.
import warnings

with warnings.catch_warnings(record=True) as warn_suppress:
    warnings.warn("DeprecationWarningExample", DeprecationWarning)
    warnings.warn("DeprecationWarningExample", DeprecationWarning)
    warnings.warn("DeprecationWarningExample", DeprecationWarning)
print(len(warn_suppress))  # 3: each warning was recorded but not displayed
Well, I saw those statements in the doc, but can you use it in a sentence? :) Being able to set the filter and stuff is ok for whether you want to see these, or not, or just once, ...
But, for programmatically checking for it? How do you actually do this? Here's a simple case:
sas.check_error_log = False
df = sas.sd2df('PRDSALE', 'x')
if sas.check_error_log:
    if df is not None:
        print("Error in the log, but I got my dataframe")
    else:
        print("Didn't get a dataframe and there was an error in the log")
        print(sas.lastlog())
        raise
df.head()
# ...
# with warnings, how do you programmatically check?
df = sas.sd2df('PRDSALE', 'x')
if warnings.???:
    if df is not None:
        print("Error in the log, but I got my dataframe")
    else:
        print("Didn't get a dataframe and there was an error in the log")
        print(sas.lastlog())
        raise
df.head()
# ...
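For what it's worth, one way to answer that question programmatically is the catch_warnings(record=True) context manager from the earlier example: run the call inside it, then inspect the captured list. This is just a sketch; sd2df_stub is a stand-in I made up for sas.sd2df(), since there's no live SAS session here, and it only mimics the style of UserWarning saspy raises:

```python
import warnings

def sd2df_stub(table, libref):
    """Stand-in for sas.sd2df(): mimics the UserWarning saspy raises on log errors."""
    warnings.warn("Noticed 'ERROR:' in LOG, you ought to take a look", UserWarning)
    return "pretend-dataframe"

# record=True captures every warning into a list instead of printing it
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")   # make sure nothing is filtered out
    df = sd2df_stub('PRDSALE', 'x')

# now you can branch on whether the call warned, even though you got a result back
saw_log_error = any(issubclass(w.category, UserWarning) for w in caught)
print(saw_log_error)  # True
```

The filters and showwarning hook are restored when the `with` block exits, so this check is local to the one call and doesn't change warning behavior elsewhere.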
Hey @aaronsmith1234, I've been buried in other stuff here the past week or so, but I'm finally getting this out to github. I just pushed these changes for adding the check and message via warnings. So, they are at main now. You can pull and test out to see what you think. I also have some doc on this, which I will push out when I build a new release, but at the moment you can see it here https://github.com/sassoftware/saspy/blob/main/saspy/doc/source/advanced-topics.rst#automatic-checking-for-error-in-the-log-and-the-warnings-module It'll be in the real doc later when the new version is built.
If you have a chance, let me know what you think!
Thanks, Tom
Hey @tomweber-sas , I'm slammed with work, but if I get a chance I'll give it a shot and let you know!
No trouble. I know the feeling! :)
Hey, I added a warning for the case where the dataframe has other than a RangeIndex (the default):
>>> sd = sas.df2sd(df);sd.head()
Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 Acura MDX SUV Asia All 36945 33337 3.5 6 265 17 23 4451 106 189
1 Acura RSX Type S 2dr Sedan Asia Front 23820 21761 2.0 4 200 24 31 2778 101 172
2 Acura TSX 4dr Sedan Asia Front 26990 24647 2.4 4 200 22 29 3230 105 183
3 Acura TL 4dr Sedan Asia Front 33195 30299 3.2 6 270 20 28 3575 108 186
4 Acura 3.5 RL 4dr Sedan Asia Front 43755 39014 3.5 6 225 18 24 3880 115 197
>>>
>>> sd = sas.df2sd(df.set_index('Make'));sd.head()
/opt/tom/github/saspy/saspy/sasiohttp.py:1238: UserWarning: Note that Indexes are not transferred over as columns. Only actual coulmns are transferred
warnings.warn("Note that Indexes are not transferred over as columns. Only actual coulmns are transferred")
Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 MDX SUV Asia All 36945 33337 3.5 6 265 17 23 4451 106 189
1 RSX Type S 2dr Sedan Asia Front 23820 21761 2.0 4 200 24 31 2778 101 172
2 TSX 4dr Sedan Asia Front 26990 24647 2.4 4 200 22 29 3230 105 183
3 TL 4dr Sedan Asia Front 33195 30299 3.2 6 270 20 28 3575 108 186
4 3.5 RL 4dr Sedan Asia Front 43755 39014 3.5 6 225 18 24 3880 115 197
>>>
>>> sd = sas.df2sd(df.set_index('Make').reset_index());sd.head()
Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 Acura MDX SUV Asia All 36945 33337 3.5 6 265 17 23 4451 106 189
1 Acura RSX Type S 2dr Sedan Asia Front 23820 21761 2.0 4 200 24 31 2778 101 172
2 Acura TSX 4dr Sedan Asia Front 26990 24647 2.4 4 200 22 29 3230 105 183
3 Acura TL 4dr Sedan Asia Front 33195 30299 3.2 6 270 20 28 3575 108 186
4 Acura 3.5 RL 4dr Sedan Asia Front 43755 39014 3.5 6 225 18 24 3880 115 197
>>>
Just pushed that out to main also. I'll also add a note into the df2sd() doc about this too, so it's apparent there to start with.
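Since the new warning fires whenever the DataFrame's index is anything other than the default RangeIndex, callers can also test for that case themselves before transferring. A pandas-only sketch (the has_default_index helper is mine, not a saspy function):

```python
import pandas as pd

def has_default_index(frame):
    # only an untouched RangeIndex means no data is "hidden" in the index
    return isinstance(frame.index, pd.RangeIndex)

df = pd.DataFrame({'Make': ['Acura', 'Audi'], 'MSRP': [36945, 31840]})
indexed = df.set_index('Make')

print(has_default_index(df))       # True  -> all data is in real columns
print(has_default_index(indexed))  # False -> reset_index() first to keep 'Make'
```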
I've just built a new version, V3.6.7, with all of these enhancements, so it's the current production version now. If you're good with all of this, we can close out this ticket. Obviously, if you need anything else, just let me know!
Thanks! Tom
Hey @tomweber-sas , thanks! Yeah I'd say close it. Appreciate the good work!
Describe the bug This is possibly two issues, but I am including them in one bug report since there might be a connection. I wrote a dataframe with a string row index from Python to a Solaris SAS server, and the index was not written; i.e., the resulting SAS dataset didn't have the index column. I expected the index to be carried over as well.
After calling reset_index on my dataframe to put the row index into its own column, I called dataframe2sasdata again, expecting the dataset to now have the index column, but it did not. I also tried outputting to a different SAS dataset, and that worked fine, with the extra column present as expected. I even tried deleting the SAS dataset on the server and re-running dataframe2sasdata, but the index column was still missing.
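The index-to-column step described above is plain pandas; reset_index() is what makes the string index visible as a real column for dataframe2sasdata to transfer (a minimal sketch, since the transfer itself needs a live SAS session):

```python
import pandas as pd

# a dataframe with a named string row index, like the one in the report
df = pd.DataFrame({'val': [1.0, 2.0]}, index=pd.Index(['a', 'b'], name='key'))
print(list(df.columns))    # ['val'] - 'key' lives only in the index

flat = df.reset_index()    # the index becomes an ordinary column
print(list(flat.columns))  # ['key', 'val'] - now a column df2sd will transfer
```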