sassoftware / R-swat

The SAS Scripting Wrapper for Analytics Transfer (SWAT) package is the R client to SAS Cloud Analytic Services (CAS). It allows users to execute CAS actions and process the results all from R.
Other
50 stars 22 forks source link

rbind.casTable not binding 2 CAStables #21

Closed pinduzera closed 4 years ago

pinduzera commented 4 years ago

rbind.casTable not binding 2 CASTables, one is supposed to be updated that because I didn't want to drop the table to update it.

I'm trying to bind 2 CASTables but it keeps saying that 2 objects aren't CAStables even though I'm sure that they are.

class(nf_ctbl)
'CASTable'
class(ctbl)
'CASTable'

rbind.casTable(ctbl, nf_ctbl) 

Error in rbind.casTable(ctbl, nf_ctbl): This function must take two or more casTables
Traceback:

1. rbind.casTable(ctbl, nf_ctbl)
2. stop("This function must take two or more casTables")
are.
tomweber-sas commented 4 years ago

Hey, I have to look into this further and see why it happens to be like this. The rbind() function doesn't do anything but return this error. There is an rbind2() function though, which does appear to implement this functionality. Can you try using rbind2 with those tables and see if that works as expected?

Thanks, Tom

tomweber-sas commented 4 years ago

ok, I've looked into this some. I think that rbind and cbind are S3 methods, while rbind2 and cbind2 are S4 methods, and depending upon the objects you pass, the S3 methods are supposed to call the S4 methods, if the parms are S4 (which CAStables are), or if that rbind fails. In fact, when I do the following, it works as expected, driving the rbind2 method in swat under the covers, even though I issued the rbind function: BTW, 'it works for me' isn't an answer, but it's what I see, so we now need to figure out what's different in your environment than in mine, I think.

> t1 = as.casTable(caz, iris)
NOTE: Cloud Analytic Services made the uploaded file available as table IRIS in caslib CASUSERHDFS(sastpw).
> t1
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
[...]
149          6.2         3.4          5.4         2.3  virginica
150          5.9         3.0          5.1         1.8  virginica
> x = rbind(t1, t1)
> x
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
[...]
299          6.2         3.4          5.4         2.3  virginica
300          5.9         3.0          5.1         1.8  virginica
>

Also, I issued the class(t1) and what I get is a little different than what you show too. Still a CAStable, but the output is a little different:

> class(t1)
[1] "CASTable"
attr(,"package")
[1] "swat"
>

So, what I'm seeing is what I think should be happening, but what you're seeing isn't, clearly. Can you show what versions of R, swat, and CAS you are running with? Maybe that will provide some insight?

Thanks! Tom

tomweber-sas commented 4 years ago

The other thing would be to also show how/where you created these castables. Maybe that's useful too. Thanks again, Tom

pinduzera commented 4 years ago

They were all Promoted on Public library.

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS release 6.9 (Final)

Matrix products: default
BLAS/LAPACK: /opt/anaconda3/lib/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lubridate_1.7.4 readr_1.3.1     dplyr_0.8.1     swat_1.4.0     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1       magrittr_1.5     hms_0.5.2        tidyselect_0.2.5
 [5] uuid_0.1-2       R6_2.4.0         rlang_0.4.1      httr_1.4.0      
 [9] stringr_1.4.0    tools_3.5.1      htmltools_0.3.6  digest_0.6.19   
[13] assertthat_0.2.1 tibble_2.1.1     crayon_1.3.4     IRdisplay_0.7.0 
[17] purrr_0.3.2      repr_1.0.1       base64enc_0.1-3  vctrs_0.2.0     
[21] curl_3.3         IRkernel_1.0.1   zeallot_0.1.0    glue_1.3.1      
[25] evaluate_0.13    stringi_1.4.3    pbdZMQ_0.3-3     compiler_3.5.1  
[29] pillar_1.4.0     backports_1.1.4  jsonlite_1.6     pkgconfig_2.0.2 

But when I try to run rbind2 i get the following:

ERROR: The table _rbindxyVLY384I in caslib CASUSER(sasdemo) must have at least one variable.
ERROR: The table fraud_update in caslib CASUSER(sasdemo) does not exist.
ERROR: The table fraud_new in caslib CASUSER(sasdemo) does not exist.
NOTE: Duplicate messages output by DATA step:
ERROR: The table _rbindxyVLY384I in caslib CASUSER(sasdemo) must have at least one variable.  (occurred 10 times)
ERROR: The action stopped due to errors.
Error in swat::check_for_cas_errors(res): 
Error message(s) found in CAS action results:
runSasCode(x@conn, code = code)
ERROR: The table _rbindxyVLY384I in caslib CASUSER(sasdemo) must have at least one variable.
ERROR: The table fraud_update in caslib CASUSER(sasdemo) does not exist.
ERROR: The table fraud_new in caslib CASUSER(sasdemo) does not exist.
NOTE: Duplicate messages output by DATA step:
ERROR: The table _rbindxyVLY384I in caslib CASUSER(sasdemo) must have at least one variable.  (occurred 10 times)
ERROR: The action stopped due to errors.

Traceback:

1. rbind2(ctbl, nf_ctbl)
2. rbind2(ctbl, nf_ctbl)
3. runSasCode(x@conn, code = code)
4. swat::check_for_cas_errors(res)
5. stop(paste("\nError message(s) found in CAS action results:", 
 .     deparse(sys.call(-1)), msgs, sep = "\n"))
tomweber-sas commented 4 years ago

Oh, wait, you're explicitly issuing rbind.castable(). Sorry, I got looking at the code and that took me down this path. Can you just try using rbind() as in my example above. The fact that you are passing in CAStables, will cause R to call our implementation, and that is why what I did worked as expected. Sorry I didn't see that to start with.

pinduzera commented 4 years ago

But the weird thing I don't get about it is why it's looking on causer since the table is explicitly on Public.

rbind(ctbl, nf_ctbl) 

ERROR: The table fraud_update in caslib CASUSER(sasdemo) does not exist.
ERROR: The table fraud_new in caslib CASUSER(sasdemo) does not exist.
ERROR: The table _rbindO3qKY7RCe in caslib CASUSER(sasdemo) must have at least one variable.
NOTE: Duplicate messages output by DATA step:
ERROR: The table _rbindO3qKY7RCe in caslib CASUSER(sasdemo) must have at least one variable.  (occurred 10 times)
ERROR: The action stopped due to errors.
Error in swat::check_for_cas_errors(res): 
Error message(s) found in CAS action results:
runSasCode(x@conn, code = code)
ERROR: The table fraud_update in caslib CASUSER(sasdemo) does not exist.
ERROR: The table fraud_new in caslib CASUSER(sasdemo) does not exist.
ERROR: The table _rbindO3qKY7RCe in caslib CASUSER(sasdemo) must have at least one variable.
NOTE: Duplicate messages output by DATA step:
ERROR: The table _rbindO3qKY7RCe in caslib CASUSER(sasdemo) must have at least one variable.  (occurred 10 times)
ERROR: The action stopped due to errors.

Traceback:

1. rbind(ctbl, nf_ctbl)
2. rbind(ctbl, nf_ctbl)
3. rbind(deparse.level, ...)
4. rbind2(argl[[i]], r)
5. rbind2(argl[[i]], r)
6. runSasCode(x@conn, code = code)
7. swat::check_for_cas_errors(res)
8. stop(paste("\nError message(s) found in CAS action results:", 
 .     deparse(sys.call(-1)), msgs, sep = "\n"))

Table Info:


str(nf_ctbl)

Formal class 'CASTable' [package "swat"] with 14 slots
  ..@ conn                :Reference class 'CAS' [package "swat"] with 15 fields
  .. ..$ sw_connection:Formal class '_p_CASConnection' [package "swat"] with 1 slot
  .. .. .. ..@ ref:Formal class 'externalptr' [package ""] with 0 slots
 list()
  .. ..$ sw_error     :Formal class '_p_CASError' [package "swat"] with 1 slot
  .. .. .. ..@ ref:Formal class 'externalptr' [package ""] with 0 slots
 list()
  .. ..$ soptions     : chr ""
  .. ..$ hostname     : chr "localhost"
  .. ..$ port         : int 5570
  .. ..$ protocol     : chr "cas"
  .. ..$ username     : chr "sasdemo"
  .. ..$ session      : chr "2121802d-a5c7-294b-9c43-510757e2ee0c"
  .. ..$ performance  :List of 10
  .. .. ..$ elapsedTime      : num 0.00662
  .. .. ..$ cpuUserTime      : num 0.002
  .. .. ..$ cpuSystemTime    : num 0.01
  .. .. ..$ systemTotalMemory: int -2106642432
  .. .. ..$ systemNodes      : int 1
  .. .. ..$ systemCores      : int 10
  .. .. ..$ memory           : int 6145984
  .. .. ..$ memoryOS         : int 15298560
  .. .. ..$ memorySystem     : int 0
  .. .. ..$ memoryQuota      : int 31408128
  .. ..$ severity     : int 2
  .. ..$ statusCode   : int 2620049
  .. ..$ reason       : chr "abort"
  .. ..$ status       : chr "Action stopped due to client interrupt request."
  .. ..$ messages     :List of 6
  .. .. ..$ : chr "ERROR: The table fraud_update in caslib CASUSER(sasdemo) does not exist."
  .. .. ..$ : chr "ERROR: The table fraud_new in caslib CASUSER(sasdemo) does not exist."
  .. .. ..$ : chr "ERROR: The table _rbindO3qKY7RCe in caslib CASUSER(sasdemo) must have at least one variable."
  .. .. ..$ : chr "NOTE: Duplicate messages output by DATA step:"
  .. .. ..$ : chr "ERROR: The table _rbindO3qKY7RCe in caslib CASUSER(sasdemo) must have at least one variable.  (occurred 10 times)"
  .. .. ..$ : chr "ERROR: The action stopped due to errors."
  .. ..$ events       : list()
  .. ..and 25 methods, of which 11 are  possibly relevant:
  .. ..  close, copy#envRefClass, disableDataMessages, enableDataMessages,
  .. ..  fork, getone, initialize, invoke, retrieve, show#envRefClass, upload
  ..@ tname               : chr "fraud_new"
  ..@ caslib              : chr "Public"
  ..@ where               : chr ""
  ..@ orderby             : list()
  ..@ groupby             : list()
  ..@ gbmode              : chr ""
  ..@ computedOnDemand    : logi FALSE
  ..@ computedVars        : chr ""
  ..@ computedVarsProgram : chr ""
  ..@ XcomputedVarsProgram: chr ""
  ..@ XcomputedVars       : chr ""
  ..@ names               : chr [1:39] "V1" "V2" "V3" "V4" ...
  ..@ compcomp            : logi FALSE
tomweber-sas commented 4 years ago

ok, so one problem figured out; calling it with just rbind() 'works'. But, I see that in the code that is being generated to run in CAS, which is data step code to accomplish the task, it doesn't generate the caslib.tablename, only tablename. That then would be why (I expect) you end up with these errors. Those errors are from the datastep, so the function is being driven and running. Looks like the code gen in this method needs to be fixed so it used the caslib and the tablename.

Let me look into that and see if I can get that fixed.

Since you're right here on this, there is a quick hack you could try, just to see if this works. So your table is really public.fraud_new whereas our code is effectively ending up with casuser.fraud_new, by not specifying the caslib.

Can you, just to test this, change the table name attribute on your castables and see if it runs? Not to do this for real, but until I fix the code to do it right. Just change @tname = 'public.fraud_new' like in the following where I'm doing that, but I get the revers error, as my table was in casuser, so it isn't found in public (that I typed wrong)

> t1 = as.casTable(caz, iris)
NOTE: Cloud Analytic Services made the uploaded file available as table IRIS in caslib CASUSERHDFS(sastpw).
>
> x = rbind(t1, t1)
> t1@tname
[1] "IRIS"
> t1@tname = 'publis.IRIS'
> t1@tname
[1] "publis.IRIS"
> z = rbind(t1, t1)
ERROR: Table 'publis.IRIS' could not be loaded.
ERROR: Failure opening table 'publis.IRIS': A table could not be loaded.
ERROR: The action stopped due to errors.
ERROR: The caslib 'PUBLIS' does not exist in this session.
ERROR: The caslib 'PUBLIS' does not exist in this session.
ERROR: The table _rbind7PWmxlUWZ in caslib CASUSERHDFS(sastpw) must have at least one variable.
ERROR: The action stopped due to errors.
Error in swat::check_for_cas_errors(res) :
Error message(s) found in CAS action results:
runSasCode(x@conn, code = code)
ERROR: The caslib 'PUBLIS' does not exist in this session.
ERROR: The caslib 'PUBLIS' does not exist in this session.
ERROR: The table _rbind7PWmxlUWZ in caslib CASUSERHDFS(sastpw) must have at least one variable.
ERROR: The action stopped due to errors.
>

Assuming this then works for you, that means the fix I'll be working on will address this. Change the @tname back though, or none of the other methods that don't have this bug will work right!. Just a one time hack to see if this will solve this on your end.

The error you got were a little different, though I have to suspect it's the missing caslib, but I'd like to see that that's the problem you're hitting and not 'the next thing'.

Thanks! Tom

tomweber-sas commented 4 years ago

Well, you can hold off on that hack attempt. I don't believe data step in CAS supports that syntax. Let me dig into this and figure out the right way to make this work and I'll give you something real to try. Thanks! Tom

tomweber-sas commented 4 years ago

Ok, so syntax in CAS isn't the same as in SAS. The caslib is a dataset option so the syntax is different. The code you have, for my example, generates this:

   code          = "data _cbindE3FSmhI13; 
set IRIS IRIS;
 run;" (string)

and now, after fixing it to generate the caslib too, so you can actually get at your tables, it's:

   code          = "data _cbindE3FSmhI13; 
set IRIS(caslib='CASUSERHDFS(sastpw)') IRIS(caslib='CASUSERHDFS(sastpw)');
 run;" (string)

This does then beg the next question. Where do you expect the new table to go? The new CAStable object you have in R refers to the new table created from combining the other two. But, there's no way on that function to say where you want it to be. The code is leaving off any kind of caslib on the new table, so it goes in the default (casuser or whatever that is). The other alternative is to use the caslib of (the first?) table so it gets created there? Is that what you would expect? Or in the default place? Should it generate the following? And, of course what if the two table are in different caslibs? Just use the first? That seems reasonable. So I guess either no caslib (default like it's doing), or use the caslib of the first table would be the choices.

   code          = "data _cbindE3FSmhI13(caslib='CASUSERHDFS(sastpw)');
 set IRIS(caslib='CASUSERHDFS(sastpw)') IRIS(caslib='CASUSERHDFS(sastpw)');
 run;" (string)

What do you think?

Thanks, Tom

tomweber-sas commented 4 years ago

I'm thinking that using the caslib of the first table would probably be the expected behavior. It's kind of appending the second table to the first, even though it's really creating a new table, so it seems the new table should be in the same place as table1. With R dataframes, there's only one scope of the tables, so this isn't a question, which is why there's no way to pass that info into the function. With CAS though, it's a different ball game, so I think the right behavior is to create the new table where the first table lives.

That work for you? I do have this coded up and can push it out to master.

Thanks, Tom

pinduzera commented 4 years ago

I was expecting to be able to update the table1 without Dropping it (manually) or having to load everything to R, just inside the CAS server and replace the original table1 (or add data to it). The problem I see is that when I use the rbind() i cannot specify the output, as you pointed out. Now I don't know if it's supposed to be an issue.

tomweber-sas commented 4 years ago

Yeah, I gotcha. So, rbind is an R function and it does what it does, which is creat a new dataframe by appending the two frames you give it. So the original 2 are still the same and now there's a 3rd which is the combined rows. Also, the signature of the function, doesn't have an 'options' so you can tell it to behave differently. So, rbind for CAStables, can only do what it does for R tables (dataframes).

That doesn't mean, however, that we can/t have a function for you that does what you want. We can add any functions to swat, that CAS can do, that aren't already R defined functions. So, assuming you can actually append to an existing CAS table (I'll have to investigate, as I don't use CAS, or I would know off the top of my head), there's no reason we can define a swat function to do the append. There my already be an action you can call now that does that, again, I'll have to go look and let you know.

I'll push this fix however, since it's not right; not using the caslib to refer to the table. At leas rbind will now work as it's defined to do. And I'll see about how to do an append.

Oh, and just to be clear, this rbind does all happen in CAS. None of the data had to come to R for this. A CAStable, is just a reference object to data in CAS. A CASdataframe is data in R, which is an extended for of a regular R dataframe. So the things that operate on CAStables, do so in CAS, and ones that operate on CASdataframes happen in R.

Thanks, Tom

kesmit13 commented 4 years ago

You can use the datastep.runcode action with an append=yes option to do what you want. It's not an R function, but you can do this right now (assuming you are using SAS Viya 3.4).

data cas_original_table(append=yes);
  set cas_extra_table;
run;
kesmit13 commented 4 years ago

Of course, you'll want to add in your caslib= specifications as well.

data cas_master_table(caslib='xxx' append=yes);
  set cas_delta_table(caslib='yyy');
run;
tomweber-sas commented 4 years ago

Yep, thanks Kevin! I just tracked that down too. So, using my earlier rbind example adding iris to iris, you can see this append working with the following code. The 'doc' I found was here, BTW: https://communities.sas.com/t5/SAS-Communities-Library/CAS-answers-to-4-common-data-manipulation-tasks-Part-1-APPEND/ta-p/571979

> library('swat')
SWAT 1.4.0.9000
> caz <- CAS('rdcgrdc', 22005)
NOTE: Connecting to CAS and generating CAS action functions for loaded
      action sets...
NOTE: To generate the functions with signatures (for tab completion), set
      options(cas.gen.function.sig=TRUE).
> t1 = as.casTable(caz, iris)
NOTE: Cloud Analytic Services made the uploaded file available as table IRIS in caslib CASUSERHDFS(sastpw).
> t2 = as.casTable(caz, iris, casOut='a2')
NOTE: Cloud Analytic Services made the uploaded file available as table A2 in caslib CASUSERHDFS(sastpw).
>
> nrow(t1)
[1] 150
> nrow(t2)
[1] 150
>
> cas.dataStep.runCode(caz, code="data IRIS(caslib='CASUSERHDFS(sastpw)' append=yes) ; set A2(caslib='CASUSERHDFS(sastpw)'); run;")
$InputCasTables
               casLib Name Rows Columns
1 CASUSERHDFS(sastpw)   A2  150       5

$OutputCasTables
               casLib Name Rows Columns Append Promoted
1 CASUSERHDFS(sastpw) IRIS  300       5    150        N

>
> nrow(t1)
[1] 300
> nrow(t2)
[1] 150
>
pinduzera commented 4 years ago

I was about to say that I had found the answer and realized that I didn't check the github, it was exactly what I did and worked flawlessly for what I had in mind.

I think I may close the issue now, thank you!

tomweber-sas commented 4 years ago

That's good news! Great minds think alike, huh :)