Open JoannaNawalanySAS opened 3 years ago
There are some differences between the REST interface and the binary interface in CAS that prevented all of the extended data types from working, so some were normalized to strings. There are a couple of ways to work around it depending on what version of the server you are on and what protocol you are using.
If you are using the binary protocol, you can use the use_addtable=True
option on the read_excel
method. This uses a different technique to upload data which supports all data types.
If you have a more recent version of the CAS server which includes informats in the table.loadTable
importoptions={vars=...}
parameter, you can specify an importoptions=
parameter to read_excel
indicating the format of the date/time column. The informat values are described here https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001239776.htm.
Thank you for your answer. I've tried your tips, but still I'm struggling with the problem.
As the action conn.builtins.history() shows importOptions I assume I have enough new version of CAS to use that parameter. I've tried like this (as I'm not sure if it is an CAS option or read_excel option)
Both didn't work.
I've tried to use use_addtable=True option, same, wrong, results: storm_damage = conn.read_excel("D:\Workshop\VIYAPY\storm.xlsx", sheet_name="Storm_Damage", casout=dict(name="STORM_DAMAGE", replace=True, caslib="Public"), use_addtable=True)
I remembered working on a related issue recently (7eeac93332c5ed50c12004f1cc1f8fcb06f7b898). I forgot that you need to specify date_format
on read_excel
to match the informat specified in importoptions
. However, that change hasn't gone into an official release yet. You could try copying the one file into you installation if possible until another release goes out.
I've copied the file you've mentioned to my server, but I'm note sure where exactly put those options.
Is the code below correct? storm_damage = conn.read_excel("D:\Workshop\VIYAPY\storm.xlsx", sheet_name="Storm_Damage", casout=dict(name="STORM_DAMAGE", replace=True, caslib="Public"), importoptions={'Date':'datetime'}, date_format={'Date':'mmddyy10.'})
The date_format
is passed to the to_csv
method of the underlying pandas DataFrame
, so it uses Python's date formats (see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html). If you are going for MMDDYY, then the Python format string would be '%m%d%y'.
How do I use the importoptions option in read_excel? It seems not working at all, even for string types, not only for dates. Do I have it set correctly here:
storm_damage = conn.read_excel("D:\Workshop\VIYAPY\storm.xlsx", sheet_name="Storm_Damage", casout=dict(name="STORM_DAMAGE", replace=True, caslib="Public"), importoptions={'Date':'datetime'}, date_format={'Date':'%m/%d/%Y.'})
You are missing the vars=
level: importoptions={'vars': {'Date': 'datetime'}}
.
I've tried this (including specyfing cost as varchar to see if it is a problem with date values or maby with other types too):
storm_damage = conn.read_excel("D:\Workshop\VIYAPY\storm.xlsx", sheet_name="Storm_Damage", casout=dict(name="STORM_DAMAGE", replace=True, caslib="Public"), importoptions={'vars':{'Date': 'datetime', "Cost":"varchar"}, 'date_format':{'Date':'%m/%d/%Y.'}}) storm_damage.dtypes
These are the results: Event varchar Date varchar Summary varchar Cost double
There is still the same problem.
I believe this is what you want.
importoptions={'vars':{'Date':{'informat': 'datetime', 'type':'datetime'}, 'Cost':{'type':'varchar'}}}
This is not working eather (I've inserted you sugestion and andded "date_format") :(
storm_damage = conn.read_excel("D:\Workshop\VIYAPY\storm.xlsx", sheet_name="Storm_Damage",
casout=dict(name="STORM_DAMAGE", replace=True, caslib="Public"),
importoptions={'vars':{'Date':{'informat': 'datetime', 'type':'datetime', 'date_format':'%m/%d/%Y.'}, 'Cost':{'type':'varchar'}}})
storm_damage.dtypes
This is the result (even Cost is still a double): Event varchar Date varchar Summary varchar Cost double
@JoannaNawalanySAS It doesn't appear to be working the way that I remember. However, there is a workaround for now. You can do this in two steps. First, read the excel file into a DataFrame, then upload the DataFrame. That's all that would be happening behind the scenes anyway.
import pandas as pd
data = pd.read_excel("D:\Workshop\VIYAPY\storm.xlsx", sheet_name="Storm_Damage")
storm_damage = conn.upload_frame(data,
date_format='%d%b%Y:00:00:00', # <= this indicates how dates should be written to the intermediate CSV
importoptions={
'vars': {
'Date': {'informat': 'datetime', # <= this indicates how CAS should read the column in the CSV
'type': 'datetime'},
'Cost': {'type': 'varchar'}
}
}, casout=dict(...))
I'll have to look into the read_*
methods to see how to make this work with them as well.
Thank you, I didn't thought about it as almost equally resources consuming solution. Importing to DataFrame and then uploading to CAS works fine!
Just one think: ERROR: Parameter 'date_format' is not recognized.
storm_damage = conn.upload_frame(data, importoptions={ 'vars': { 'Date': {'informat': 'datetime', 'type': 'datetime'}, 'Cost': {'type': 'varchar'} }}, casout=dict(caslib="Public", name="Storm_Damage", replace=True))
read_excel from SWAT works different that read_excel from Pandas, wich makes me unable to read Date column as a Datetime value.
SWAT: storm_damage = conn.read_excel("D:\Workshop\VIYAPY\storm.xlsx", sheet_name="Storm_Damage", casout=dict(name="STORM_DAMAGE", replace=True, caslib="Public")) It reads Date column as Varchar
Pandas: sd = pd.read_excel("D:\Workshop\VIYAPY\storm.xlsx", sheet_name="Storm_Damage") It reads Date column as datetime64
I've tried explicity use converters and it is not helping:
SWAT: storm_damage = conn.read_excel("D:\Workshop\VIYAPY\storm.xlsx", sheet_name="Storm_Damage", casout=dict(name="STORM_DAMAGE", replace=True, caslib="Public"), converters={'Date': pd.to_datetime})
Pandas: sd = pd.read_excel("D:\Workshop\VIYAPY\storm.xlsx", sheet_name="Storm_Damage", converters={'Date': pd.to_datetime})
Anything I do I see the action run is: NOTE: 20: action table.loadTable / path='_f_c15dc565_7ff582b29b88.csv', readAhead=true, importOptions={fileType='csv', vars={Cost={type='double'}, Date={type='varchar'}, Event={type='varchar'}, Summary={type='varchar'}}, locale='EN-us'}, resident=true, promote=false, caslib='_LIB_C15DC565_7FF582B29B88', casOut={name='STORM_DAMAGE', caslib='Public', replace=true}, singlePass=false, noSource=true;