Closed biojerm closed 3 years ago
Hey Jeremy, good catch. This is a regression caused by the performance enhancement to the STDIO access method, which was part of the rework for the char column length enhancements in 3.6.0. Actually, it exposes a bug in the code gen, which just happens to not cause a failure (even though the error is there), and you still get an empty dataset. This still happens to work with the other access methods, but I have a fix to make this correct for all of them. I had a check for the column length coming out of the calculation as 0, which would then set it to 8, the default length for an empty char column in SAS. But I'm not getting 0, I'm getting nan, and that's where the syntax error is coming from. That syntax error doesn't stop the data step and make it fail; rather, the step succeeds with an empty table. This is still what happens in the other access methods, but with the change in STDIO it now causes an error that didn't used to happen. I've just changed my check for not getting a length to account for nan as well as 0, and then set it to the default 8. The data step now runs correctly with the correct signature, so you get the empty dataset with correctly defined, default char column lengths.
I've pushed this to main, can you give it a try and see that it works correctly for you now too?
Thanks! Tom
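A minimal sketch of the kind of check described above, for readers following along. This is not saspy's actual code: the function name `effective_char_len` and the constant name are hypothetical, and only the behavior (treat nan or 0 as "no length" and fall back to 8) comes from the comment.

```python
import math

# Default length for an empty char column in SAS, as stated in the comment above.
SAS_DEFAULT_CHAR_LEN = 8


def effective_char_len(computed_len):
    """Return a usable char column length (hypothetical helper, not saspy's API).

    An empty column can yield nan (or 0) when its maximum string length is
    computed, so both are treated as "no length" and replaced by the default.
    """
    if math.isnan(computed_len) or computed_len == 0:
        return SAS_DEFAULT_CHAR_LEN
    return int(computed_len)


print(effective_char_len(float("nan")))  # empty column: falls back to the default
print(effective_char_len(0))             # zero length: falls back to the default
print(effective_char_len(3.0))           # normal column: keeps its computed length
```

Checking only for 0 misses the nan case, which is the gap the comment describes.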
Hey Tom,
The update makes it so that I can write empty datasets and a saspy dataset is returned. However, when I try to open an empty dataset that is written to disk (not WORK), I am getting this message. If there is data, I don't have an issue opening the file.
(I am using the SAS Universal Viewer.)
At the risk of being redundant, here is the code I am running; I added some write-out steps at the end:
import saspy
import pandas as pd
import logging
logging.basicConfig()
df = pd.DataFrame({"col1": ["abc", "≥", "def"]})
output_path = "/scratch/jlabarge"
sas = saspy.SASsession(cfgname="sas_u8")
print(sas)
sas_ds = sas.df2sd(df, "should_be_ok")
print("sas_ds type: ", type(sas_ds))
print("###########################################\n\n")
# print("The issue I am seeing is when the DataFrame is empty like below")
empty_df = pd.DataFrame(columns=["col1"])
empty_sas_ds = sas.df2sd(empty_df, "returns_none_not_empty_ds")
print("empty_sas_ds type: ", type(empty_sas_ds))
print("###########################################\n\n")
# print(sas.lastlog())
print("writing datasets to disk")
sas.saslib("empty", path=output_path)
written_empty_ds = sas.df2sd(empty_df, libref="empty", table="i_am_empty")
print("written_empty_ds type: ", type(written_empty_ds))
sas.saslib("empty", path=output_path)
written_empty_ds = sas.df2sd(df, libref="empty", table="i_am_have_data")
# print(sas.saslog())
One more random thing I ran into was that the libref cannot have underscores. Has that always been true and I never noticed?
Hey Jeremy, I've tried out that code, and it seems to work as far as SAS is concerned (saspy and SAS itself). I don't have the Universal Viewer to test with; did it work on these datasets before?
>>> sas.saslib("empty", path=output_path)
280
281 libname empty '/tmp/x' ;
NOTE: Library EMPTY does not exist.
282
>>> sas.saslib("empty", path=output_path)
284
285 libname empty '/tmp/x' ;
NOTE: Libref EMPTY was successfully assigned as follows:
Engine: V9
Physical Name: /tmp/x
286
>>> written_empty_ds = sas.df2sd(empty_df, libref="empty", table="i_am_empty")
>>>
>>> written_empty_ds
Libref = empty
Table = i_am_empty
Dsopts = {}
Results = Pandas
>>> written_empty_ds.head()
Empty DataFrame
Columns: [col1]
Index: []
>>>
>>> written_empty_ds.columnInfo()
Member Num Variable Type Len Pos
0 EMPTY.I_AM_EMPTY 1 col1 Char 8 0
>>>
sas
[...]
NOTE: SAS initialization used:
real time 1.07 seconds
cpu time 0.17 seconds
1? libname empty '/tmp/x' ;
NOTE: Libref EMPTY was successfully assigned as follows:
Engine: V9
Physical Name: /tmp/x
2? proc print data=empty.i_am_empty; run;
NOTE: No observations in data set EMPTY.I_AM_EMPTY.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.69 seconds
cpu time 0.34 seconds
3?
And underscores are valid in a libref:
>>> sas.saslib("em_pty", path=output_path)
680
681 libname em_pty '/tmp/x' ;
NOTE: Libref EM_PTY refers to the same physical library as EMPTY.
NOTE: Libref EM_PTY was successfully assigned as follows:
Engine: V9
Physical Name: /tmp/x
682
>>> sas.saslib("_empty", path=output_path)
684
685 libname _empty '/tmp/x' ;
NOTE: Libref _EMPTY refers to the same physical library as EM_PTY.
NOTE: Libref _EMPTY was successfully assigned as follows:
Engine: V9
Physical Name: /tmp/x
686
>>>
Interesting.
I will keep digging into the unable-to-open problem.
With the underscores I am getting this:
7
88 libname who_is_empty '/scratch/jlabarge' ;
ERROR: who_is_empty is not a valid SAS name.
ERROR: Error in the LIBNAME statement.
89
The libref specified is not assigned in this SAS Session.
Might be something with our SAS install/config. I know we have other programs that behave oddly with underscores.
Oh, that's just because it's too long. A libref can only be 8 chars. But is the fix working OK for you? That error from the Universal Viewer seems to say it can't support a dataset with no rows. But the rest of saspy is working as expected with this empty dataset?
Ahhh, did not know there was an 8 char limit on librefs.
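Since the failure above came down to libref length rather than underscores, a small pre-check before calling `sas.saslib` can catch it early. The sketch below assumes the standard SAS naming rule for librefs (1 to 8 characters, starting with a letter or underscore, followed by letters, digits, or underscores); the helper name `is_valid_libref` is hypothetical and not part of saspy.

```python
import re

# Assumed SAS libref rule: 1-8 characters, first is a letter or underscore,
# the rest are letters, digits, or underscores.
_LIBREF_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")


def is_valid_libref(name):
    """Return True if name looks like a valid SAS libref (hypothetical helper)."""
    return _LIBREF_RE.fullmatch(name) is not None


# Names taken from the thread above.
for candidate in ["empty", "em_pty", "_empty", "who_is_empty"]:
    print(candidate, is_valid_libref(candidate))
```

Of the four names, only `who_is_empty` fails, and it fails on length (12 characters), not on its underscores.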
Yeah, the rest seems to be working as expected on the saspy/SAS front. I can also confirm that the same BOF/EOF message pops up with the previous version of saspy. So it looks more like a limitation of the Universal Viewer.
I am ok to close the issue.
Cool man! Yes, it all looks good to me. I have 2 other small fixes on main now too. I think I'll build a new version with these 3 fixes, so you've got a clean prod version to install and use. I'll close this, and I just built a new version: 3.6.1. Again, conda will take a day or so to be built, but the saspy repo and PyPI have 3.6.1 now.
Thanks, Tom
Describe the bug With saspy 3.6, if an empty data frame is passed to the df2sd method, None is returned instead of a dataset with no observations. In saspy 3.5.3 (the last version I had installed), a SAS dataset with no observations was returned in the same scenario. In my use case, I am turning CSVs into sas7bdat files (.csv > pd.DataFrame > .sas7bdat), and some of the CSV files have headers but no data rows. So I would think the 'correct' SAS dataset would be a dataset with zero ds.obs().
To Reproduce
Expected behavior If a data frame with no rows is passed to df2sd, I would expect a dataset to be created with the same columns and no rows.
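For a batch workflow like the CSV conversion described above, one defensive option is to wrap the upload so that a `None` result fails loudly instead of passing silently. This is only a sketch: `df2sd_or_raise` is a hypothetical helper, `session` is assumed to be a `saspy.SASsession`, and the only saspy call used is `df2sd` with the `table` and `libref` keywords shown earlier in the thread.

```python
def df2sd_or_raise(session, df, table, libref=""):
    """Upload df via session.df2sd and raise if no dataset object comes back.

    Hypothetical helper: session is assumed to be a saspy.SASsession, and only
    its df2sd method is called.
    """
    ds = session.df2sd(df, table=table, libref=libref)
    if ds is None:
        # The regression reported here: None returned for an empty DataFrame.
        raise RuntimeError(
            f"df2sd returned None for table {table!r}; check the SAS log"
        )
    return ds
```

With a live session this would be called as `df2sd_or_raise(sas, empty_df, "i_am_empty", libref="empty")`, using the names from the reproduction code.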
Screenshots