twosigma / beakerx

Beaker Extensions for Jupyter Notebook
http://BeakerX.com
Apache License 2.0

Work with TableDisplay columns #2789

Closed altavir closed 7 years ago

altavir commented 9 years ago

I found (by experiment and by reading the source code) that TableDisplay is represented as a list of rows and has no means of working with columns. This is not very convenient, since one usually needs to perform operations on columns (for example, selecting a subset of columns from a table). Since TableDisplay already has column designations, I think it would be better to make it a set of Column objects with names and types. In that case it would also be easy to define arithmetic operations on columns. The resulting table workflow could look like this:

table << table["a column"] + 4 * table["b column"] * table[2]
table << new Column("new column name", <any Iterable here, e.g. a Column or List>)

def column = table["new column name"]
def value = column[4]

def newTable = table.select(["a column",1])
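The workflow above relies on Groovy operator dispatch: `a + b` resolves to `a.plus(b)`, `a * b` to `a.multiply(b)`, `a[i]` to `a.getAt(i)` and `a << b` to `a.leftShift(b)`. A minimal Java sketch of classes that Groovy could drive with those operators might look like the following. The `Column` and `Table` classes here are hypothetical illustrations, not part of TableDisplay, and the naming of derived columns is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical numeric column; not an existing Beaker class.
final class Column {
    final String name;
    final List<Double> values;

    Column(String name, List<Double> values) {
        this.name = name;
        this.values = new ArrayList<>(values);
    }

    // Groovy: column[i]
    double getAt(int index) {
        return values.get(index);
    }

    // Groovy: columnA + columnB (element-wise sum)
    Column plus(Column other) {
        requireSameLength(other);
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            out.add(values.get(i) + other.values.get(i));
        }
        return new Column("(" + name + " + " + other.name + ")", out);
    }

    // Groovy: column * 4 (scales every value)
    Column multiply(double factor) {
        List<Double> out = new ArrayList<>();
        for (double v : values) {
            out.add(v * factor);
        }
        return new Column("(" + name + " * " + factor + ")", out);
    }

    // Groovy: columnA * columnB (element-wise product)
    Column multiply(Column other) {
        requireSameLength(other);
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            out.add(values.get(i) * other.values.get(i));
        }
        return new Column("(" + name + " * " + other.name + ")", out);
    }

    private void requireSameLength(Column other) {
        if (other.values.size() != values.size()) {
            throw new IllegalArgumentException("column lengths differ");
        }
    }
}

// Hypothetical table stored as an ordered map of columns.
final class Table {
    private final Map<String, Column> columns = new LinkedHashMap<>();

    // Groovy: table << column
    Table leftShift(Column column) {
        columns.put(column.name, column);
        return this;
    }

    // Groovy: table["a column"]
    Column getAt(String name) {
        Column c = columns.get(name);
        if (c == null) {
            throw new IllegalArgumentException("no such column: " + name);
        }
        return c;
    }

    // Groovy: table[2]
    Column getAt(int index) {
        return new ArrayList<>(columns.values()).get(index);
    }

    // Keys may be column names or positions, as in select(["a column", 1]).
    Table select(List<?> keys) {
        Table out = new Table();
        for (Object key : keys) {
            if (key instanceof Integer) {
                out.leftShift(getAt(((Integer) key).intValue()));
            } else {
                out.leftShift(getAt(String.valueOf(key)));
            }
        }
        return out;
    }

    List<String> columnNames() {
        return new ArrayList<>(columns.keySet());
    }
}
```

One caveat for the expression `4 * table["b column"]`: Groovy dispatches it to a `multiply` method on the number, not on the column, so it would need an extension method or a category on `Number`. Writing `table["b column"] * 4` avoids that.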

One could also introduce a Row object, which stores the column parameters as well as the values.

Row row = table.row(5)

List<Row> rows = table.rows()

List<Row> filteredRows = table.rows().findAll{some Row condition closure}
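A row object of this kind could be sketched in Java as a lightweight view over column storage: it keeps the column names plus a row index and looks values up on demand. The `Row` and `RowTable` classes and their method names below are hypothetical, chosen only to mirror the calls in the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical row view: shares the column data and remembers only its index.
final class Row {
    private final Map<String, List<Object>> columns;
    private final int index;

    Row(Map<String, List<Object>> columns, int index) {
        this.columns = columns;
        this.index = index;
    }

    // Column names are available from every row.
    Set<String> columnNames() {
        return columns.keySet();
    }

    // Value of the named column in this row.
    Object get(String columnName) {
        List<Object> column = columns.get(columnName);
        if (column == null) {
            throw new IllegalArgumentException("no such column: " + columnName);
        }
        return column.get(index);
    }
}

// Hypothetical column-backed table that hands out Row views.
final class RowTable {
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();
    private int rowCount = 0;

    RowTable addColumn(String name, List<?> values) {
        if (!columns.isEmpty() && values.size() != rowCount) {
            throw new IllegalArgumentException("column length must be " + rowCount);
        }
        columns.put(name, new ArrayList<Object>(values));
        rowCount = values.size();
        return this;
    }

    Row row(int index) {
        if (index < 0 || index >= rowCount) {
            throw new IndexOutOfBoundsException("row " + index);
        }
        return new Row(columns, index);
    }

    List<Row> rows() {
        List<Row> out = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            out.add(row(i));
        }
        return out;
    }
}
```

From Groovy, `table.rows().findAll { it.get("x") > 4 }` would then filter the returned list; in plain Java the equivalent is `rows().stream().filter(...)`.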

scottdraves commented 9 years ago

Thanks. Indeed, in Groovy we don't really have a class for tables. We have https://github.com/ssadedin/graxxia in the jars, but I never got a good demo notebook for it, and it seemed more matrix-oriented than column-oriented. Are there other good Groovy dataframe classes we could use?

altavir commented 9 years ago

I haven't searched for one. In this case it is quite easy to write one myself. I will do it in my first free time window. The only question is your autotranslation mechanism, which I don't understand in great detail. Is a pandas table always translated to TableDisplay? Do I need to replace TableDisplay, or can I just create a new class to which we can add autotranslation later?

scottdraves commented 9 years ago

How about this? https://code.google.com/p/jlabgroovy/ https://github.com/twosigma/beaker-notebook/issues/347

altavir commented 9 years ago

I tried to work with this framework, but found that it is not maintained (no updates for 2 years) and definitely not ready for public use. As far as I could find, there are currently no good Groovy scientific frameworks, which gives Beaker an advantage. There are, though, some nice Java frameworks such as http://jas.freehep.org/jas3/ and http://jwork.org/dmelt/ which could possibly benefit from notebook integration. I am (quite slowly) working on my own project: http://www.inr.ru/~nozik/dataforge/. It currently has no GUI, and I am thinking about using Beaker as one of the possible solutions.

As for tables, it is definitely easier to build them from scratch. I have already started to work on it.

altavir commented 9 years ago

Hello again. I found an hour of free time and started to work on this table problem. Here is what I've done so far: https://bitbucket.org/Altavir/groovytable/overview . It is of course far from perfect, but I think it is much more convenient than the current TableDisplay. Since row values are not stored but calculated, there could be some performance issues with very large tables, but they could probably be avoided by using rowIterator instead of rows. Also, the whole system is currently not very thread-safe, but that could be fixed.
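To illustrate the trade-off between rows and rowIterator, here is a minimal Java sketch of a lazy row iterator over column-major storage. It assembles one row at a time instead of materializing the whole row list up front. The `LazyRows` class is hypothetical and is not taken from the groovytable repository.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical lazy row view over column-major data: columns[c][r].
final class LazyRows implements Iterable<double[]> {
    private final double[][] columns;
    private final int rowCount;

    LazyRows(double[][] columns) {
        this.columns = columns;
        this.rowCount = columns.length == 0 ? 0 : columns[0].length;
    }

    @Override
    public Iterator<double[]> iterator() {
        return new Iterator<double[]>() {
            private int cursor = 0;

            @Override
            public boolean hasNext() {
                return cursor < rowCount;
            }

            // Builds only the row being requested; earlier rows can be garbage collected.
            @Override
            public double[] next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                double[] row = new double[columns.length];
                for (int c = 0; c < columns.length; c++) {
                    row[c] = columns[c][cursor];
                }
                cursor++;
                return row;
            }
        };
    }
}
```

With this shape, a `for` loop over the table holds one row in memory at a time, whereas a `rows()` method returning a full list allocates one object per row before the loop starts.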

I am not familiar with your autotranslation system, so I don't know how to integrate with it, but it is probably not very hard.

Please let me know if you like what I've done and whether you want this work to continue here or in the project issue tracker.

scottdraves commented 9 years ago

I am definitely interested in this and will be happy to explain how to integrate it (add a serializer). Storing as columns in the JVM is fine, as long as the JSON representation does not change. Any chance you could sign a CLA and contribute this upstream?
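The constraint of keeping the JSON representation unchanged while storing columns in the JVM amounts to transposing at serialization time. A minimal Java sketch of that step follows. The `RowMajorExport` class is hypothetical, and the `columnNames` and `values` keys are assumed placeholders rather than a confirmed description of Beaker's wire format. A real serializer would hand the resulting structure to whatever JSON library the project uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: converts column-major storage into a row-major structure.
final class RowMajorExport {
    private RowMajorExport() {
    }

    static Map<String, Object> toRowMajor(Map<String, List<Object>> columns) {
        List<String> names = new ArrayList<>(columns.keySet());
        int rowCount = names.isEmpty() ? 0 : columns.get(names.get(0)).size();

        // Transpose: one inner list per row, with values in column order.
        List<List<Object>> values = new ArrayList<>();
        for (int r = 0; r < rowCount; r++) {
            List<Object> row = new ArrayList<>();
            for (String name : names) {
                row.add(columns.get(name).get(r));
            }
            values.add(row);
        }

        // Key names are assumptions for illustration only.
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("columnNames", names);
        out.put("values", values);
        return out;
    }
}
```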

altavir commented 9 years ago

If you want a handwritten signature on paper, then there could be a problem sending it from Russia. If a scan is sufficient, then of course I can do it. Your autotranslation feature is very similar to the metadata processing that I use in DataForge, so I do understand the principle, but it will take some time to understand how it is implemented, since Beaker definitely lacks documentation. Also, I am not sure I understand the repository structure. It seems that the build.gradle file in the main directory has nothing to do with either the plugins or the core module.

scottdraves commented 9 years ago

A scan is fine.

The main place to connect types to autotranslation is https://github.com/twosigma/beaker-notebook/tree/master/plugin/jvm/src/main/java/com/twosigma/beaker/jvm/serialization

The top-level gradle file passes requests to the gradle files in the core & plugin/* subdirectories.

altavir commented 9 years ago

OK, I will return to it in a few days after one very urgent job.