Closed numericOverflow closed 6 years ago
Hard to say without more information, but I would consider creating a second table with the same schema, adding your data to that table, and then appending the new table to the original.
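A rough sketch of that staging idea, written in plain Java so it runs without any library on the classpath. `StagedAppend` and its methods are hypothetical stand-ins for a column-oriented table; they are not Tablesaw's API, and the column names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-column table stored column by column (not the Tablesaw API). */
class StagedAppend {
    final List<String> account = new ArrayList<>();
    final List<Double> amount = new ArrayList<>();

    /** Adds one row by appending one value to the end of each column. */
    void addRow(String acct, double amt) {
        account.add(acct);
        amount.add(amt);
    }

    /** Returns an empty table with the same schema (same columns, zero rows). */
    StagedAppend emptyCopy() {
        return new StagedAppend();
    }

    /** Appends all rows of {@code other} to this table, one bulk copy per column. */
    void append(StagedAppend other) {
        account.addAll(other.account);
        amount.addAll(other.amount);
    }

    int rowCount() {
        return account.size();
    }

    public static void main(String[] args) {
        // The "original" table, loaded in one batch.
        StagedAppend ledger = new StagedAppend();
        ledger.addRow("cash", 100.0);
        ledger.addRow("rent", -40.0);

        // Stage new forecast rows in a second table with the same schema.
        StagedAppend staged = ledger.emptyCopy();
        staged.addRow("forecast", 12.5);
        staged.addRow("forecast", 7.5);

        // Append the staged rows to the original in one step.
        ledger.append(staged);
        System.out.println(ledger.rowCount() + " rows");
    }
}
```

The point of staging is that rows are only ever added at the end of each column, and the original table is touched once per batch rather than once per row.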
On Wed, Nov 15, 2017 at 2:08 PM numericOverflow notifications@github.com wrote:
I know TableSaw is intended for mass insertion of data, but I've got a situation where I have most of the data, but want to add a few rows during processing. My intention is to work records much like a financial ledger, and am converting code from a python proof-of-concept that used Pandas dataframes.
My rough algorithm:
- Load a batch of new data (hundreds to thousands of records)
- process data loop
-- insert forecast records (1 record at a time)
-- reanalyze w/new data to adjust size/amount/placement of next forecast record
-- continue processing data loop until all forecasting calculations complete and data is "balanced"
So my question is what's the most efficient way to add small sets of rows to an existing Table?
-- Oversize the original table & update existing rows as desired?
-- Copy table using emptyCopy(1), update values & append new single-row table to original table?
-- Something else?
The section of the userguide on the wordpress site https://jtablesaw.wordpress.com/user-guide/tables/ related to adding/removing rows is blank, and I haven't been able to find much in the way of examples showing what I'm looking to do. There's lots of good examples on columnar work, but not much row-wise that I've been able to find. It's like I need an appendRow() function that took in a string/array/list/etc row and appended it to the table.
Seems like TableSaw is geared for an "insert-once, analyze-many" approach whereas my use case is an "insert-many, analyze-many" situation, so I want to be efficient in my approach. I like the flexibility & built-in analytics TableSaw has, so I wouldn't need to start from scratch with a custom approach.
Any strategy suggestions would be greatly appreciated!
OK, so what I'm doing seems very cumbersome. I'm aware I'm kind of misusing TableSaw, but it seems like there's got to be a better way to set individual values (Cells?) in the table:
//Create 3 arrays of length=1, one for each Column to be created below
List<String> Col1Vals = Arrays.asList("Col1Row1");
List<String> Col2Vals = Arrays.asList("Col2Row1");
double[] Col3Vals = new double[1];
Col3Vals[0] = 5.0;
//Create 3 columns, from each single value array created above
CategoryColumn C1 = new CategoryColumn("COL1",Col1Vals);
CategoryColumn C2 = new CategoryColumn("COL2",Col2Vals);
DoubleColumn C3 = new DoubleColumn("COL3",Col3Vals);
//Now build the table from all 3 columns
Table t = Table.create("TBL1",C1,C2,C3);
//Output summary info about the table we just created to prove init data was loaded
traceln(t.summary());
I've been poking around the API, and I can't seem to find a good way to set or update an individual row/column value. That brings me to some questions:
Thanks!
Your first 6 lines could be condensed to three:
CategoryColumn C1 = new CategoryColumn("COL1", new String[] { "Col1Row1" });
CategoryColumn C2 = new CategoryColumn("COL2", new String[] { "Col2Row1" });
DoubleColumn C3 = new DoubleColumn("COL3", new double[] { 5.0 });
table.doubleColumn(0).set(2, 7.0);
Table.read().csv(exampleString, "tableName")
@numericOverflow I updated my answer for 4 to be simpler
@benmccann - Thanks for adding that CSV string function, it's much cleaner! I'll download the latest release and play with it.
I'm wondering if this can be closed. I think the behavior of Tablesaw is pretty good for these things:
Tablesaw is not good for inserting records in the middle of a table, or deleting them from the middle one-at-a-time. Deleting a batch of records is not too bad.
This is the nature of a column-oriented structure. Improving on that would be a bunch of work. We'd have to make a hybrid column/row-based structure like some of the more advanced big data stores.
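To make the cost difference concrete, here is a minimal sketch that assumes each column is backed by one contiguous array. `DoubleCol` is a hypothetical class written for illustration, not Tablesaw's implementation. It counts how many elements have to move for an append versus an insert at the front.

```java
import java.util.Arrays;

/** Hypothetical array-backed column of doubles (illustration only). */
class DoubleCol {
    double[] data = new double[4];
    int size = 0;
    long shifted = 0; // total elements moved by insertAt calls

    /** Appends at the end; no existing element moves (amortized growth aside). */
    void append(double v) {
        if (size == data.length) data = Arrays.copyOf(data, size * 2);
        data[size++] = v;
    }

    /** Inserts at index; every element from index onward shifts one slot right. */
    void insertAt(int index, double v) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        if (size == data.length) data = Arrays.copyOf(data, size * 2);
        System.arraycopy(data, index, data, index + 1, size - index);
        shifted += size - index;
        data[index] = v;
        size++;
    }

    /** Overwrites an existing value in place; nothing moves. */
    void set(int index, double v) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        data[index] = v;
    }

    double get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }

    public static void main(String[] args) {
        DoubleCol col = new DoubleCol();
        for (int i = 0; i < 1000; i++) col.append(i);
        System.out.println("moved after appends: " + col.shifted);

        col.insertAt(0, -1.0); // worst case: the whole column shifts
        System.out.println("moved after one front insert: " + col.shifted);
    }
}
```

In a table, that shift would be repeated for every column, which is why appending batches or overwriting existing cells in place fits this kind of layout better than one-at-a-time inserts or deletes in the middle.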
@lwhite1 - I'd agree, this ticket can be closed. Your examples show the best possible (mis)use of TableSaw to do what I wanted. I wouldn't expect a major re-write to improve at this time, given the original true intention of TableSaw.
Thanks for the help & adding support for direct CSV string import.