Open lwlwlwlw opened 1 year ago
Hi @lwlwlwlw! Can you be more specific (perhaps with an example of what API you are expecting) about what kind of appending you are looking for? Are you looking for appending rows, appending columns, or some other concept of appending?
I'll also comment that the process of appending is complicated by the inherent structure of SBDF files. They are laid out as a sequence of table slices (each consisting of a number of rows) that contain a sequence of column slices (each consisting of all the values in one column for the rows covered by the containing table slice). Appending a row involves rewriting (and growing) all the column slices in the last table slice; appending a column requires rewriting all table slices (and would probably require rebalancing rows between slices, since there is a target number of values (rows x columns) in each table slice for performance reasons).
In general, it's probably easier to import the data from the file, make any desired modifications to the data, and then export it again.
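To make the layout argument concrete, here is a toy in-memory model of table slices and column slices. It is only an illustration of the description above: the `TableSlice` class and both helper functions are hypothetical names, not part of the `spotfire` package, and nothing here reflects the actual binary encoding.

```python
from dataclasses import dataclass, field


@dataclass
class TableSlice:
    """One table slice: a run of rows, stored as one value list per column."""
    columns: dict = field(default_factory=dict)

    @property
    def row_count(self):
        # Every column slice in a table slice covers the same rows.
        return len(next(iter(self.columns.values()), []))


def append_row(slices, row):
    """Append one row; only the last table slice changes.

    Returns the number of column slices that had to be rewritten.
    """
    last = slices[-1]
    for name, value in row.items():
        # Each column slice in the last table slice is replaced by a longer one.
        last.columns[name] = last.columns[name] + [value]
    return len(row)


def append_column(slices, name, values):
    """Append one column; every table slice gains a new column slice.

    Returns the number of table slices that had to be rewritten.
    """
    start = 0
    for table_slice in slices:
        count = table_slice.row_count  # read before adding the new column
        table_slice.columns[name] = values[start:start + count]
        start += count
    return len(slices)
```

In this model a row append touches one column slice per column, all inside the final table slice, while a column append touches every table slice in the file. The rebalancing concern mentioned above is not modeled.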
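A minimal sketch of that import, modify, export round trip, assuming the `spotfire` and `pandas` packages are installed. The helper name `append_rows_by_rewrite` is made up for this example; only `sbdf.import_data` and `sbdf.export_data` come from the package. Note that the whole table is loaded into memory.

```python
def append_rows_by_rewrite(sbdf_path, new_rows):
    """Append rows to an SBDF file by importing, concatenating, and re-exporting.

    The entire existing table is held in memory, so this only suits files
    that fit in RAM. Returns the row count of the rewritten file.
    """
    # Imported inside the function so this sketch can be loaded even when
    # the optional dependencies are not installed.
    import pandas as pd
    from spotfire import sbdf

    existing = sbdf.import_data(sbdf_path)  # existing table as a DataFrame
    combined = pd.concat([existing, new_rows], ignore_index=True)
    sbdf.export_data(combined, sbdf_path)   # overwrites the original file
    return len(combined)
```

This assumes `new_rows` is a DataFrame with the same columns as the file; no validation is done here.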
@bbassett-tibco Thank you for your reply.
One of our customers wants to append data (rows) to an existing SBDF file because the data is incremental.
Preferably something like this (with an "append=True" option to indicate appending):
    import pandas as pd
    import spotfire.sbdf as sb

    df = pd.DataFrame(...)
    # append=True is the requested option; export_data does not currently accept it
    sb.export_data(df, "d:/tmp/file.sbdf", append=True)
Hello!
I'd like to second lwlwlwlw's request for an append mode to SBDF files and provide additional context for why this feature would be extremely valuable.
Many of my clients require processing and exporting of large amounts of data (often exceeding available RAM) from various file formats and SQL databases into SBDF files. Our typical workflow involves Python processes where we perform data cleaning and formatting before converting to SBDF. This approach ensures that the Spotfire project loads pre-processed, clean data, significantly improving load times and project performance.
However, I am facing challenges with the existing export_data function, which seems to be designed primarily for in-memory pandas DataFrames. This becomes problematic when dealing with datasets that exceed available RAM.
Currently, my workaround is to split larger datasets into multiple SBDF files and Spotfire "concatenates" them as the project loads, but this increases loading time. This is particularly inefficient given the explanation provided earlier about the complexity of appending: "To append a row involves rewriting (and growing) all the column slices in the last table slice; appending a column will have to rewrite all table slices (and would probably require rebalancing rows between slices since there is target number of values (rows x columns) in each table slice for performance reasons)."
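For reference, the split-into-multiple-files workaround might look roughly like the sketch below. Everything in it is an assumption for illustration: the function names, the `.partNNNN` naming scheme, and the `write_part` callback, which stands in for whatever actually writes one batch to disk.

```python
from itertools import islice
from pathlib import Path


def batched(rows, batch_size):
    """Yield lists of at most batch_size rows without loading all rows at once."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def part_path(base_path, index):
    """Derive a numbered part filename, e.g. data.sbdf -> data.part0000.sbdf."""
    base = Path(base_path)
    return base.with_name(f"{base.stem}.part{index:04d}{base.suffix}")


def export_in_parts(rows, base_path, batch_size, write_part):
    """Write rows to numbered part files, one batch in memory at a time.

    write_part(batch, path) is a caller-supplied function that writes one
    batch (a list of rows) to the given path. Returns the paths written.
    """
    paths = []
    for index, batch in enumerate(batched(rows, batch_size)):
        path = part_path(base_path, index)
        write_part(batch, path)
        paths.append(path)
    return paths
```

With the `spotfire` package, `write_part` could be as small as `lambda batch, path: sbdf.export_data(pd.DataFrame(batch), str(path))` when each row is a dict. Only one batch is in memory at a time, but Spotfire still has to concatenate the parts at load time, which is the cost described above.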
Given these challenges, I am wondering if you could provide guidance or consider implementing features that allow for more memory-efficient handling of large datasets during the export process. Specifically:
I understand that the SBDF file structure makes this complex, but any insights or potential solutions would be greatly appreciated. If full append functionality isn't feasible, are there alternative approaches or best practices you'd recommend for handling these large, pre-processed datasets more efficiently?
Thank you for your consideration of this feature request and any guidance you can provide.
OK, given @bschwartzjetrock's well-written problem description, I'm beginning to think that a potential solution to this request would look like:
export_data is probably not the right function to add this to. There is an impedance mismatch in this case:
1) export_data's argument list is not set up for two SBDF filenames
2) export_data allows for data of different shapes, while appending requires a specific shape for the data
We should add a new function (or pair), tentatively to be called append_rows (and append_columns, if we implement it).
We can definitely investigate the concept further in a future release.
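To illustrate the second point, the "specific shape" requirement could amount to a check like the one below, run before any rows are appended. This is purely a sketch: `check_appendable` and the `(name, type)` schema representation are hypothetical, and nothing like this exists in the package today.

```python
def check_appendable(existing_schema, new_schema):
    """Raise ValueError unless new rows match the existing table's shape.

    Both schemas are lists of (column_name, type_name) pairs, in column order.
    This representation is an assumption made for the sketch.
    """
    if len(existing_schema) != len(new_schema):
        raise ValueError(
            f"expected {len(existing_schema)} columns, got {len(new_schema)}"
        )
    for position, (old, new) in enumerate(zip(existing_schema, new_schema)):
        if old != new:
            raise ValueError(
                f"column {position}: expected {old[0]!r} of type {old[1]}, "
                f"got {new[0]!r} of type {new[1]}"
            )
```

A function such as the tentative append_rows would presumably run a check of this kind on the incoming data first, which is exactly the constraint that export_data does not have.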
Feature requests: Would it be possible to add append mode to sbdf export (append data to an existing sbdf file)? Thank you.