mirsinarm / STAT662

code for STAT 662, from fall of 2014

Adding and deleting columns in big.data.frame #4

Open mirsinarm opened 10 years ago

rosebrewin commented 10 years ago

Yep, I'm working on this. For deleting columns, I'm essentially writing a bit of code to do negative indexing with [], so it will eventually need to go into your (Miranda's) extraction code.

mirsinarm commented 10 years ago

The extraction code actually works now ... now that I'm not confused about which file is being synced to GitHub and which one I'm actually working on. So you should be able to just pull the code and then reload the package.

I'm starting to think about this too, because I'm trying to handle cases where someone tries to input data of a different class into a column of the big.data.frame. It seems like the backing files have their own class (big.matrix or big.char) and then the items inside those backing files can be of multiple classes (integer/numeric/etc. for big.matrix, character for big.char).
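
A minimal sketch of the kind of class check described above, in base R only. The helper name `can_assign` and the rule that integer columns accept whole-number values are assumptions for illustration, not part of the package.

```r
# Hypothetical helper: can `value` be written into a column whose declared
# class is `col_class` without changing the backing type?
can_assign <- function(col_class, value) {
  if (col_class == "character") {
    # big.char-style columns: only character data
    return(is.character(value))
  }
  if (col_class == "integer") {
    # Assumption: integer columns accept numbers with no fractional part
    return(is.numeric(value) && all(is.na(value) | value == round(value)))
  }
  if (col_class %in% c("double", "numeric")) {
    return(is.numeric(value))
  }
  FALSE  # unknown column class: refuse the assignment
}

can_assign("integer", c(1L, 2L))  # TRUE
can_assign("integer", c(1.5, 2))  # FALSE
can_assign("character", 3)        # FALSE
```

A replacement method could call a check like this first and stop with an informative error instead of silently coercing the column.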

rosebrewin commented 10 years ago

Yep that's what I've come across.

For extraction, are you returning a normal data frame? I think that's what Jay's code started with anyway.

For removing a column, you're presumably still going to be left with a big.data.frame, in which case maybe this should be a separate function?

For getting the classes, I recommend looking up the info.txt file. So far I don't have a clue how you'd know where that was without remembering the location. Part of me is wondering whether location should be an attribute stored in .desc so that it isn't lost?

mirsinarm commented 10 years ago

The $-extraction returns a vector whose type matches the column's contents, so an integer column comes back as a vector of integers, and so on. The same is true of x[, 2], which returns a vector of the contents of column 2.

Hm, so are you saying that we could have some function like drop(x, 2) which would remove the second column of the big.data.frame? I like that idea: it would make it more explicit that adding and dropping columns of a big.data.frame is a more significant operation than in a normal data.frame. But then big.data.frame would not behave like a normal data.frame ...

As for the location problem, there's the difference between explicitly giving the location and leaving it blank (in which case, I think it sets the location to NULL??? See line 93.). But you can always access the pointer to the object with x@data$columnName@address (where x and columnName are placeholders), so if you can somehow convert that pointer to a file path ...

rosebrewin commented 10 years ago

Yes, we don't want negative indexing to return a completely different class of object to positive indexing. I think it would be best if negative indexing returned a normal data frame, and then I'll write a new function which is used to drop columns.
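
As a rough illustration of that behaviour, here is a sketch in which positive and negative column indices both produce an ordinary data.frame. It uses a plain named list as a stand-in for the columns of a big.data.frame; `extract_cols` is a hypothetical name, not a function in the package.

```r
# Toy stand-in for the columns of a big.data.frame.
cols <- list(a = 1:3, b = c("x", "y", "z"), c = c(0.5, 1.5, 2.5))

# Hypothetical helper: resolve positive or negative column indices and
# always return an ordinary data.frame.
extract_cols <- function(cols, j) {
  if (any(j > 0) && any(j < 0)) {
    stop("cannot mix positive and negative indices")
  }
  keep <- seq_along(cols)[j]  # base R resolves negative indices here
  as.data.frame(cols[keep], stringsAsFactors = FALSE)
}

names(extract_cols(cols, -2))       # "a" "c"
class(extract_cols(cols, c(1, 3)))  # "data.frame"
```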

I think I've got round the info.txt document because I realised all the information is in the desc slot. I'll write a function, maybe with an option to 'physically' replace the old backing file, which would just delete the old file and put a new one in its place. Otherwise it would essentially create a new big.data.frame in a new location.
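
A sketch of what a separate drop function might look like when the metadata lives in a desc slot. The class `toy.bdf`, the function `drop_cols`, and the desc fields used here (dim, classes, names) are assumptions for illustration; the real big.data.frame class may hold more.

```r
library(methods)

# Toy S4 class with the two slots discussed in the thread (desc, data).
setClass("toy.bdf", slots = c(desc = "list", data = "list"))

# Hypothetical drop function: returns a new object without columns `j`,
# keeping the metadata in desc consistent with the remaining data.
drop_cols <- function(x, j) {
  keep <- setdiff(seq_along(x@data), j)
  new("toy.bdf",
      desc = list(dim = c(x@desc$dim[1], length(keep)),
                  classes = x@desc$classes[keep],
                  names = x@desc$names[keep]),
      data = x@data[keep])
}

x <- new("toy.bdf",
         desc = list(dim = c(3L, 3L),
                     classes = c("integer", "character", "double"),
                     names = c("a", "b", "c")),
         data = list(a = 1:3, b = c("x", "y", "z"), c = c(0.5, 1.5, 2.5)))

y <- drop_cols(x, 2)
y@desc$names  # "a" "c"
y@desc$dim    # 3 2
```

Because the result is still the same class as the input, dropping stays distinct from extraction, which returns a plain vector or data frame.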

mirsinarm commented 10 years ago

You've probably already gotten to this point, but this will probably work IF we can somehow get the location ... so I think you're right that having the file location stored somewhere would be very useful.

```r
newnames <- c(names(y), 'name.x')
newdim <- c(nrow(y), ncol(y) + ncol(x))
z <- new('big.data.frame',
         desc=list(dim=newdim,
                   classes=c(y@desc$classes, class(x[])),
                   maxchar=y@desc$maxchar,
                   names=newnames),
         data=vector(mode="list", length=length(newnames)))

z@data[[1]] <- bigmemory::attach.big.matrix(
  paste(newnames[1], ".desc", sep=""), path=location)
# repeat for each column
```

This would just create a new object in R that uses the same columns as the previous big.data.frame, plus the new column(s).

rosebrewin commented 10 years ago

Ok, I think it's now a question of whether we want this to work on the object in R, or on the backing files.

Either we pass in a big.data.frame object and get back a new big.data.frame object which exists just in R, or we pass in a location and edit the backing files through R.

Which one of these (or both) do you think is useful?
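
To make the two options concrete, here is a sketch with a single `physical` switch. It assumes a hypothetical layout of one `<column>.bin` and one `<column>.desc` file per column inside `location`; the function name and the `.bin` extension are illustrative only.

```r
# Hypothetical helper: drop a column either only from the R-side view
# (physical = FALSE) or by deleting its files on disk (physical = TRUE).
# Returns the names of the columns that remain.
drop_column_files <- function(location, column, physical = FALSE) {
  files <- file.path(location, paste0(column, c(".bin", ".desc")))
  if (physical) {
    file.remove(files[file.exists(files)])
  }
  descs <- list.files(location, pattern = "\\.desc$")
  setdiff(sub("\\.desc$", "", descs), column)
}

# Set up a throwaway directory with two fake columns.
location <- tempfile("bdf")
dir.create(location)
invisible(file.create(file.path(location,
                                c("a.bin", "a.desc", "b.bin", "b.desc"))))

drop_column_files(location, "b")                   # "a", files untouched
file.exists(file.path(location, "b.desc"))         # TRUE
drop_column_files(location, "b", physical = TRUE)  # "a", files deleted
file.exists(file.path(location, "b.desc"))         # FALSE
```

Defaulting to the non-destructive path keeps accidental data loss unlikely, at the cost of leaving unused files behind.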