sassoftware / saspy

A Python interface module to the SAS System. It works with Linux, Windows, and Mainframe SAS as well as with SAS in Viya.
https://sassoftware.github.io/saspy
Other
367 stars 149 forks source link

sasdata2dataframe time8. coverts to datetime64[ns] #490

Closed javier5109 closed 1 year ago

javier5109 commented 1 year ago

Describe the bug Pulling in Time8. column produces datetime96 with date in pandas, with current date as date

import saspy sas = saspy.SASsession(cfgname="httpsviya",user = racf, pw = sas_password) sas.saslib('reference1', path="path") sas.saslib('reference2', path = "path")

df = sas.sasdata2dataframe("table_name","reference1",dsopts = {'obs':500})

To Reproduce Steps to reproduce the behavior:

  1. connecting to sas using saspy
  2. downloading time8. column from table

Expected behavior I am unsure if we are suppose to get a string or numpy time dtype. I am just sure that the current date is not suppose to be added

Screenshots Here is a the time8 stamps to copy and paste.

16:01:24 11:35:11 11:37:25 16:07:33

Here is the time8 column that should be appearing image

Here is the output of the dates that I am getting image

Configuration information. Please provide the configuration you're trying to use (your sascfg_personal.py file) as well as what client system you are on and what kind of SAS deployment you're trying to connect to and where it's deployed (local to the client or remote). I am on an EC2. AWS instance, running 256G. Running python 3.8

Access Method = HTTP SAS Config name = httpsviya SAS Config file = /home/jovyan/envs/usage_modeling/lib/python3.8/site-packages/saspy/sascfg.py WORK Path = 'private' SAS Version = V.03.05M0P11112019 SASPy Version = 4.3.0 Teach me SAS = False Batch = False Results = Pandas SAS Session Encoding = utf-8 Python Encoding value = utf_8 SAS process Pid value = 1646474

Additional context I have attempted to use different encoding. SAS user system settings are utf-8.

tomweber-sas commented 1 year ago

Ok, I believe you're saying that having SAS variables with a format of time8. turn into datetime64 in the pandas dataframe. Yes, that's to be expected; pandas doesn't have date or time only data types to use, so time or date (and datetime of course) SAS formats all become pandas datetime64 datatype. Read through this doc: dates-times-and-datetimes-oh-my to see how these are handled each direction - if you want only dates or times in SAS, from pandas datetime64, there are options for that. Also a simple instruction or two to convert pandas datetime64 to Timestamp.date/time, which are Python types, not Pandas (numpy) types. So, you can manipulate those if you want.

Let me know if that addresses this or not. I think it will, but that depends on if I understand the problem correctly.

Thanks, Tom

javier5109 commented 1 year ago

I see so default behavior is for any sas date/time to be switched to datetime64. Got it. Is there any documentation on mappings for sas to python like the snowflake connector?

Link to snowflake connector mappings: https://docs.snowflake.com/en/user-guide/python-connector-pandas.html#snowflake-to-pandas-data-mapping

tomweber-sas commented 1 year ago

Well, SAS only has 2 types; fixed length char or double. That's it. SAS has 'formats' which tell SAS to interpret a double as the number of seconds or days since a point in time; but the variable is still just a double. So I convert any numeric from pandas to a double. datetime64 to double w/ datetime format assigned. If you specify formats for only date or only time (like in that section of doc), then the double will be correct value and the formats will be assigned. Anything else is the str(x) where x is the object in the dataframe; so those are all strings -CHAR in SAS.

Going the other way, char variables are strings and numerics are a numeric type determined by pandas (float likely, but it may choose int if all values in the column have no decimals). And, for numeric variables, I first see if there are formats for data/time/datetime and convert those to pandas datetime64 instead of just numeric types.

tomweber-sas commented 1 year ago

Hey @javier5109 did that make sense? Really is kinda simple since SAS only had 2 types, and the idea of data/time/timestamp. One line of python can change the datetime64 to whatever you want it to be, if it isn't that, after you get the dataframe back. Supplying a format for only time or date, and I'll do the right thing going df->sd, if you only want time/date in SAS.

Is there anything else I can do or can we close this? Are you able to get it how you want it to be?

Thanks, Tom

javier5109 commented 1 year ago

Got it. No you can mark it off as closed. It makes sense on how the transformations are going and like you said, its relative easy to change the datetime64 to time in pandas.

I am still thinking that adding a table, like the one sas has that maps it out to pandas dtypes, would help the documentation for saspy go well.

https://documentation.sas.com/doc/en/vdmmlcdc/8.1/ds2ref/n19bf2z7e9p646n0z224cokuj567.htm

tomweber-sas commented 1 year ago

Cool. Yeah, I was thinking the same thing. I will add that to the doc. I think that would be a good thing. Next week... Have a great weekend, Tom

tomweber-sas commented 1 year ago

Hey, I finally added the section of doc, in the advanced-topics section (where other sd2df and df2sd info was). https://sassoftware.github.io/saspy/advanced-topics.html#moving-data-between-python-and-sas-datatype-conversion Sorry that slipped my attention, but now it's there. I also rearranged the other sections in there so they follow this, which makes it all come together, hopefully. Any feedback is appreciated :) Thanks, Tom