kbuzard / labs

MIT License

Cluster program, moving code from SAS to R #3

Open JorgeValde opened 6 years ago

JorgeValde commented 6 years ago

This is the code using the data.table package.

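I begin by setting the working directory.

Then I install the packages "data.table" and "bit64". The latter avoids problems with very long numbers in the cells: with bit64 installed, fread can read them as 64-bit integers (integer64) instead of losing precision.

setwd(file.path("//hd.ad.syr.edu", "01", "e88665", "Documents", "Desktop", "Backup SAS", "from SAS to R"))
install.packages("data.table")  # installing is only needed once
install.packages("bit64")
library(data.table)
library(bit64)  # so that integer64 columns read by fread print correctly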
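A small illustration of what bit64 changes. This is a sketch that assumes data.table 1.11.0 or later (for fread's `text =` argument); the two 17-digit IDs are made-up values, not from our files, chosen only because they are larger than a double can store exactly (above 2^53).

```r
library(data.table)
library(bit64)  # provides the integer64 class

# Hypothetical IDs that differ only in the last digit
csv <- "id,value\n12345678901234567,1\n12345678901234568,2"

# With bit64 installed, fread's default (integer64 = "integer64") keeps large integers exact
dt <- fread(text = csv)
print(class(dt$id))           # "integer64"
print(as.character(dt$id))    # both IDs, digit for digit

# Converted to ordinary doubles, the two distinct IDs become the same number
print(as.double(dt$id[1]) == as.double(dt$id[2]))
```

This is the kind of silent collision that very long identifiers can suffer when they are stored as plain numeric columns.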

Read the data from the matching dataset

matching <- fread("SASmatching.csv")

Read the data from the citations dataset

citations <- fread("SAScitations.csv")

Read the data from the possiblenclass dataset

possiblenclass <- fread("SASpossiblenclass.csv")
View(possiblenclass)

First Iteration

I replace the control numbers in the variable x_control with random draws from a uniform distribution on (0, 1).

possiblenclass[, x_control := runif(.N)]  # one draw per row, assigned by reference (.N = number of rows)
View(possiblenclass$x_control)
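If we want these random draws to be reproducible between runs, one option (an assumption on my part: the code above does not set a seed) is to call set.seed() first. The toy table and the seed value 2018 below are hypothetical. R's random number generator is not the same as SAS's, so the draws will generally not match the SAS ones even with a fixed seed.

```r
library(data.table)

# Hypothetical stand-in for possiblenclass
possiblenclass <- data.table(nclass    = c(3L, 1L, 2L, 1L, 3L),
                             x_control = c(101, 102, 103, 104, 105))

set.seed(2018)  # hypothetical seed: the same seed gives the same draws on every run

# Replace the control numbers with one U(0, 1) draw per row
possiblenclass[, x_control := runif(.N)]

print(possiblenclass)
```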

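After assigning the random values, I sort possiblenclass by nclass and x_control.

setorder(possiblenclass, nclass, x_control)  # sorts in place; possiblenclass[order(nclass, x_control)] alone only returns a sorted copy
View(possiblenclass)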
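A toy example of why a bare `possiblenclass[order(nclass, x_control)]` line does not sort the table: `[` returns a new, sorted data.table, and the original stays unchanged unless the result is assigned back. The values are hypothetical.

```r
library(data.table)

# Hypothetical toy table
dt <- data.table(nclass    = c(2L, 1L, 2L, 1L),
                 x_control = c(0.9, 0.4, 0.1, 0.7))

# [order(...)] builds a sorted COPY; dt itself is untouched
sorted_copy <- dt[order(nclass, x_control)]
print(dt$x_control)           # still 0.9 0.4 0.1 0.7

# setorder() reorders dt by reference, so no assignment is needed
setorder(dt, nclass, x_control)
print(dt$x_control)           # now 0.4 0.7 0.1 0.9
```

The same applies to the later sorts of citations and matching2.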

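The next step is to create the dataset matching1, which keeps cited and patent from matching and adds the flag binvar = 1.

matching1 <- matching[, .(cited, patent)]
matching1[, binvar := 1]  # marks every (cited, patent) pair that appears in matching
View(matching1)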

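Line 233 of the SAS code calls the citations dataset and sorts it.

setorder(citations, cited, patent)  # sorts in place; citations[order(cited, patent)] alone would not change citations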

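I create the data.table matching2 with a left merge of citations and matching1 (the merge still needs double-checking, but it gives the same result as the SAS code).

# Left join on the two key columns; all.x = TRUE keeps citations with no match (their binvar is NA).
# by = .EACHI is only meaningful inside [ ] joins, so the key columns are named explicitly.
matching2 <- merge(citations, matching1, by = c("cited", "patent"), all.x = TRUE)
View(matching2)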
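A toy version of this left merge, with made-up rows, to check what `all.x = TRUE` does: every citation is kept, and binvar is 1 only where the (cited, patent) pair also appears in matching1.

```r
library(data.table)

# Hypothetical stand-ins for citations and matching1
citations <- data.table(cited  = c(1L, 1L, 2L, 3L),
                        patent = c(10L, 11L, 12L, 13L),
                        nclass = c(5L, 6L, 5L, 7L))
matching1 <- data.table(cited  = c(1L, 3L),
                        patent = c(10L, 13L))
matching1[, binvar := 1]

# Left join on the shared key columns
matching2 <- merge(citations, matching1, by = c("cited", "patent"), all.x = TRUE)

# binvar is 1 for (1, 10) and (3, 13), NA for the two unmatched citations
print(matching2)
```

About the original `by = .EACHI`: my best guess is that outside `[` it evaluates to NULL (data.table exports .EACHI as a placeholder symbol), so merge() fell back to its default of joining on the columns the two tables share, which here would be cited and patent. That would explain why the result matched SAS, but naming the columns makes it explicit.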

Delete the rows with binvar = 1.

I use na.omit in reverse: with invert = TRUE it returns only the rows that have a missing value in binvar, i.e. the citations with no match in matching1.

matching2 <- na.omit(matching2, cols = "binvar", invert = TRUE)
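Keeping only the unmatched rows is what data.table calls an anti-join. Below is a sketch, on hypothetical toy data, comparing the merge-plus-na.omit route with an anti-join written directly with `!` and `on =`; both keep the same citations.

```r
library(data.table)

# Hypothetical stand-ins with the same shape as citations and matching1
citations <- data.table(cited  = c(1L, 1L, 2L, 3L),
                        patent = c(10L, 11L, 12L, 13L))
matching1 <- data.table(cited = c(1L, 3L), patent = c(10L, 13L), binvar = 1)

# Route 1: left join, then keep only the rows where binvar is missing
m2 <- merge(citations, matching1, by = c("cited", "patent"), all.x = TRUE)
via_na_omit <- na.omit(m2, cols = "binvar", invert = TRUE)

# Route 2: anti-join, i.e. citations whose (cited, patent) pair is NOT in matching1
via_anti_join <- citations[!matching1, on = .(cited, patent)]

print(via_na_omit)    # (1, 11) and (2, 12), with binvar = NA
print(via_anti_join)  # the same two citations, without a binvar column
```

The anti-join never creates the binvar helper column, which may be simpler if binvar is not needed in later steps; whether it is depends on the rest of the SAS logic.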

Sort matching2 by nclass.

setorder(matching2, nclass)  # sorts in place; matching2[order(nclass)] alone would not change matching2

Now I have to merge matching2 with possiblenclass.

I hit a wall when trying to do this merge. If you don't mind, we can talk about it tomorrow.

kbuzard commented 6 years ago

I have a couple of questions about this code (stemming from my unfamiliarity with data.table), and I don't think we resolved the issue with the merge when we met yesterday. We knew when we ran out of time yesterday that there were many open questions, so just let me know when you're ready to meet again.

JorgeValde commented 6 years ago

Would you be OK meeting tomorrow? I haven't looked at the code since we talked; yesterday I worked on finding documentation for the invnum variable, and I think I found something. Today I'm planning to look at the gradualism project, so I can bring some questions if we have time. I'm free any day of the week.

kbuzard commented 6 years ago

How about 2pm tomorrow (Wednesday)?

In reply to JorgeValdebenito, Tue, Jun 26, 2018 at 10:46 AM.

JorgeValde commented 6 years ago

That's perfect, I'll be there.

Jorge



From: Kristy Buzard notifications@github.com Sent: Tuesday, June 26, 2018 1:15:45 PM To: kbuzard/labs Cc: Jorge Arturo Valdebenito; Author Subject: Re: [kbuzard/labs] Cluster program, moving code from SAS to R (#3)
