sassoftware / saspy

A Python interface module to the SAS System. It works with Linux, Windows, and Mainframe SAS as well as with SAS in Viya.
https://sassoftware.github.io/saspy

SASPY Stat TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe' #390

Closed bergen288 closed 3 years ago

bergen288 commented 3 years ago

Describe the bug

"saspy" is installed in the Anaconda base environment on Windows Server 2016. SAS9.4M6 is on AIX7.2. saspyConnection is my Python class for connecting to SAS9.4M6 on AIX. As you can see in the log at the bottom, the connection is successful. I am trying to use the "cars" dataset in the sashelp library to run an ANOVA analysis. See the code below:

with saspyConnection() as conn:
    cars = conn.sasdata('cars', libref='sashelp')
    stat = conn.sasstat()
    stat_results = stat.reg(model='horsepower = Cylinders EngineSize',data=cars)
    print('SAS Stat/ANOVA Analysis Against Cars Data in SAS Help Library')
    print(stat_results.ANOVA)

Unfortunately, it failed with a TypeError. What's wrong with it?

Additional context

Using SAS Config named: winiomlinux
SAS Connection established. Subprocess id is 3124

SASPY Connection established: Access Method         = IOM
SAS Config name       = winiomlinux
SAS Config file       = C:\Users\xzhang\sascfg_Windows.py
WORK Path             = /work/SAS_work11E201990110_jappsasapp02.onefiserv.net/SAS_work4D1101990110_jappsasapp02.onefiserv.net/
SAS Version           = 9.04.01M6P11072018
SASPy Version         = 3.7.2
Teach me SAS          = False
Batch                 = False
Results               = Pandas
SAS Session Encoding  = latin1
Python Encoding value = latin1
SAS process Pid value = 26804496

SAS Stat/ANOVA Analysis Against Cars Data in SAS Help Library
Traceback (most recent call last):
  File "pandas\_libs\parsers.pyx", line 1050, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "e:\Python_Projects\examples\saspy_stat.py", line 9, in <module>
    print(stat_results.ANOVA)
  File "C:\ProgramData\Anaconda3\lib\site-packages\saspy\sasresults.py", line 74, in __getattr__
    data = self._go_run_code(attr)
  File "C:\ProgramData\Anaconda3\lib\site-packages\saspy\sasresults.py", line 111, in _go_run_code
    df = self.sas.sasdata2dataframe(attr, libref=lref)
  File "C:\ProgramData\Anaconda3\lib\site-packages\saspy\sasbase.py", line 1685, in sasdata2dataframe
    df = self._io.sasdata2dataframe(table, libref, dsopts, method=method, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\saspy\sasioiom.py", line 1691, in sasdata2dataframe
    return self.sasdata2dataframeDISK(table, libref, dsopts, rowsep, colsep,
  File "C:\ProgramData\Anaconda3\lib\site-packages\saspy\sasioiom.py", line 2087, in sasdata2dataframeDISK
    df = pd.read_csv(sockout, index_col=idx_col, engine=eng, header=None, names=varlist,
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 610, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 468, in _read
    return parser.read(nrows)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1057, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 2061, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 756, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 771, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 850, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 982, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas\_libs\parsers.pyx", line 1056, in pandas._libs.parsers.TextReader._convert_tokens
ValueError: could not convert string to float: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "e:\Python_Projects\examples\saspy_stat.py", line 9, in <module>
    print(stat_results.ANOVA)
  File "E:\Python_Projects\Settings\xz_settings.py", line 621, in __exit__
    raise exc_type(excvalue)
ValueError: could not convert string to float: ''

tomweber-sas commented 3 years ago

Well, that's the first time I've seen that. So it's a good news / bad news thing. Good first: I can reproduce that and I'm seeing what you're seeing. Bad: that's a problem I'll have to fix. I believe this is a case where a special missing value is being formatted as '_' instead of '.' (underscore, not a dot). SAS has ~28ish missing values (all special floating point doubles that aren't real numbers). Unfortunately, that means I'm going to have to address this one way or another. I need to research this a little to find the best way to handle it (in SAS or Pandas) and provide a fix for you.

Sorry about that! Tom
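
The failure mode described above can be reproduced without a SAS session. The following is a minimal pandas-only sketch, assuming a numeric column arrives as text with `_` standing in for the special missing value `._`. The sample rows and the `special_missing` list are illustrative assumptions, and this is not necessarily how the saspy fix works internally.

```python
import io
import string

import pandas as pd

# Illustrative CSV text: the Error row carries "_" where SAS printed the
# special missing value ._ (hand-written sample, not real saspy transfer output).
csv_text = "Model,440.192,0.000\nError,_,_\n"
names = ["Source", "FValue", "ProbF"]
numeric = {"FValue": "float64", "ProbF": "float64"}

# Forcing float64 on a column that contains "_" makes the parser raise.
try:
    pd.read_csv(io.StringIO(csv_text), header=None, names=names, dtype=numeric)
    parse_failed = False
except (TypeError, ValueError) as exc:
    parse_failed = True
    print("parse failed:", exc)

# SAS has 28 numeric missing values: ., ._ and .A through .Z.
# They print as ".", "_" and "A".."Z", so treat those tokens as NaN,
# but only in the numeric columns so character data such as "A" survives.
special_missing = [".", "_"] + list(string.ascii_uppercase)
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=names,
    dtype=numeric,
    na_values={col: special_missing for col in numeric},
)
print(df)
```

Scoping `na_values` per column is a deliberate choice in this sketch: a character column may legitimately hold `A` or `_`, and those should not be turned into NaN.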

tomweber-sas commented 3 years ago

@bergen288, I pushed a fix to main for this. Can you give it a go and verify it fixes your case?

I ran the following to prove this out on my end:

import saspy
sas = saspy.SASsession(results='text')
sas

sas.submitLST("""data a; format x2 datetime26.6;
                         x1=.; x2=.a; x3=.z; x4=._; c='A'; c2='Z'; c3='_' ; output;
                         x1=1; x2=99; x3=3;  x4=4 ; c='' ; c2='' ; c3=''  ; output;
                         x1=1; x2=.a; x3=3;  x4=4 ; c=' '; c2='.'; c3='  '; output;
                 proc print;run;
              """,
              results='text', method='logandlist')

sd = sas.sasdata('a')
sd.to_df()

This gives the following results on all 3 access methods:

                                                           The SAS System                      Friday, July 16, 2021 09:06:00 PM   1

                                Obs                            x2    x1    x3    x4    c    c2    c3

                                 1                              A     .     Z     _    A    Z     _
                                 2      01JAN1960:00:01:39.000000     1     3     4
                                 3                              A     1     3     4         .

>>>
>>> sd = sas.sasdata('a')
>>> sd.to_df()
                   x2   x1   x3   x4    c   c2   c3
0                 NaT  NaN  NaN  NaN    A    Z    _
1 1960-01-01 00:01:39  1.0  3.0  4.0  NaN  NaN  NaN
2                 NaT  1.0  3.0  4.0  NaN    .  NaN
>>>
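
For readers wondering why `x2=99` prints as `01JAN1960:00:01:39.000000`: SAS datetime values count seconds from 1960-01-01 00:00:00, and in the output above every numeric missing value (`.`, `.a`, `._`, `.z`) ends up as NaN or NaT in pandas. Here is a small pandas sketch of that mapping. The input series is written by hand to mirror column x2 and is not pulled from a SAS session; `SAS_EPOCH` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# SAS datetimes are stored as seconds since this instant.
SAS_EPOCH = pd.Timestamp("1960-01-01")

# Column x2 from the example: .a, 99, .a (missing values written as NaN here).
x2_seconds = pd.Series([np.nan, 99.0, np.nan])

# NaN seconds become NaT timedeltas, so missing rows stay missing.
x2 = SAS_EPOCH + pd.to_timedelta(x2_seconds, unit="s")
print(x2)
```

Note that after conversion there is no way to tell `.a` from `._` or `.`; they all collapse into the same pandas missing marker.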

tomweber-sas commented 3 years ago

Also, here's your use case (though I'm doing it in line mode, so I changed the way I get the df a little):

>>> stat = sas.sasstat()
>>> res = stat.reg(model='horsepower = Cylinders EngineSize',data=cars)
>>> dir(res)
['ANOVA', 'COOKSDPLOT', 'DFBETASPANEL', 'DFFITSPLOT', 'DIAGNOSTICSPANEL', 'FITSTATISTICS', 'LOG', 'NOBS', 'OBSERVEDBYPREDICTED', 'PARAMETERESTIMATES', 'QQPLOT', 'RESIDUALBOXPLOT', 'RESIDUALBYPREDICTED', 'RESIDUALHISTOGRAM', 'RESIDUALPLOT', 'RFPLOT', 'RSTUDENTBYLEVERAGE', 'RSTUDENTBYPREDICTED']
>>> anova = sas.sasdata('anova','_reg0001')
>>> anova.head()

                                                           The SAS System                            17:18 Friday, July 16, 2021   3

               Obs    Source                       DF              SS              MS          FValue           ProbF

                1     Model                     2.000     1487803.732      743901.866         440.192           0.000
                2     Error                   423.000      714847.921        1689.948            _               _
                3     Corrected Total         425.000     2202651.653            _               _               _

>>>
>>> res.anova
<IPython.core.display.HTML object>
>>> anova.to_df()
            Source     DF            SS             MS      FValue          ProbF
0            Model    2.0  1.487804e+06  743901.866012  440.192215  4.296828e-104
1            Error  423.0  7.148479e+05    1689.947803         NaN            NaN
2  Corrected Total  425.0  2.202652e+06            NaN         NaN            NaN
>>>
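
The missing cells in this table are expected rather than a transfer problem. In a standard regression ANOVA table the mean square is SS/DF and the F statistic is the model mean square divided by the error mean square, so only the Model row carries FValue and ProbF, and the Corrected Total row has no mean square. A quick arithmetic check, using the rounded figures printed above as inputs:

```python
# Sums of squares and degrees of freedom copied from the table above.
ss_model, df_model = 1487803.732, 2
ss_error, df_error = 714847.921, 423

# Mean square = sum of squares / degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# F statistic = model mean square / error mean square.
f_value = ms_model / ms_error

# The Corrected Total row is just the sum of the two rows above it.
ss_total = ss_model + ss_error
df_total = df_model + df_error

print(f"MS model = {ms_model:.3f}")
print(f"MS error = {ms_error:.3f}")
print(f"F value  = {f_value:.3f}")
print(f"Total    = SS {ss_total:.3f}, DF {df_total}")
```

The computed values should agree with the MS, FValue, and Corrected Total cells in the table to within rounding of the printed inputs.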

bergen288 commented 3 years ago

I downloaded the newest saspy-main.zip file and reinstalled it. It looks like the fix is good. Below is my new output with the same Python code:

Using SAS Config named: winiomlinux
SAS Connection established. Subprocess id is 9880

SASPY Connection established: Access Method         = IOM
SAS Config name       = winiomlinux
SAS Config file       = C:\Users\xzhang\sascfg_Windows.py
WORK Path             = /work/SAS_workBBAB029001A8_jappsasapp02.onefiserv.net/SAS_work579E029001A8_jappsasapp02.onefiserv.net/
SAS Version           = 9.04.01M6P11072018
SASPy Version         = 3.7.2
Teach me SAS          = False
Batch                 = False
Results               = Pandas
SAS Session Encoding  = latin1
Python Encoding value = latin1
SAS process Pid value = 42992040

SAS Stat/ANOVA Analysis Against Cars Data in SAS Help Library
            Source     DF         SS        MS  FValue  ProbF
0            Model   2.00 1487803.73 743901.87  440.19   0.00
1            Error 423.00  714847.92   1689.95     NaN    NaN
2  Corrected Total 425.00 2202651.65       NaN     NaN    NaN
SAS Connection terminated. Subprocess id was 9880

Thank you very much for the quick fix, really appreciated.