Open miranov25 opened 1 month ago
An array of 1, 2, 3, ... N-bit integers can be represented using a bit array.
Consider an array of M numbers, each with N bits. To represent them, you can use a Uint8Array of size ceil((N * M) / 8).
How efficient is random access in such an array in JavaScript?
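A minimal sketch of such random access (the helper names `packedGet`/`packedSet` are illustrative, and the bit-by-bit loop is the simplest approach, not the fastest):

```javascript
// Sketch: random access into a packed array of N-bit unsigned integers
// stored in a Uint8Array. Assumes N <= 32 and values fit in N bits.

function packedGet(buf, n, i) {
  // Read the n-bit value at logical index i, least-significant bit first.
  let value = 0;
  const start = n * i;
  for (let b = 0; b < n; b++) {
    const bit = start + b;
    value |= ((buf[bit >> 3] >> (bit & 7)) & 1) << b;
  }
  return value >>> 0;
}

function packedSet(buf, n, i, v) {
  // Write the n-bit value v at logical index i.
  const start = n * i;
  for (let b = 0; b < n; b++) {
    const bit = start + b;
    if ((v >> b) & 1) buf[bit >> 3] |= 1 << (bit & 7);
    else buf[bit >> 3] &= ~(1 << (bit & 7));
  }
}

// Example: M = 10 values of N = 5 bits each -> ceil(50 / 8) = 7 bytes.
const buf = new Uint8Array(Math.ceil((5 * 10) / 8));
packedSet(buf, 5, 3, 21);
console.log(packedGet(buf, 5, 3)); // 21
```

Each access touches at most a few bytes, but the per-bit loop and the unaligned addressing are exactly where the random-access cost question arises.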
What is not mentioned here but should help improve memory consumption: bitmasks take up too much space. While this is not a major issue when there are more columns than histograms, reducing their footprint by a factor of ~2 should be simple.
This section explains the bitmask representation used for widget selection, which is a good candidate for inclusion in READMEdeveloper.md or similar documentation.
We use a bitmask representation in which each (widget, point) pair is represented by one bit. Ideally, we would allocate an array of 32-bit words of size N_points * int(N_widgets/32), holding the widget bits (w0, w1, w2, ..., wN) per point: bitmask[ipoint] = bitmask_accept. With a logical AND selection, typically all bits should be set to 1 for a point to be accepted.

In Marian's existing implementation, a different layout is used: the columns are one bitmask per widget, each sized N_points/32. Each data column additionally occupies N_points x 64 bits (float) or N_points x 32 bits (int), where the int form needs to be converted to boolean before being added to the bitmasks.

The advantage of Marian's column-per-widget implementation lies in its update method: when a single widget changes, only the corresponding bitmask column is modified. In the row bitmask representation, the decision-making process is swift and does not require a special column. However, since this column is small, the memory savings are minimal (N_points x 32 bits).
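A minimal sketch of the two layouts discussed above (all names such as `rowMask`, `colMask`, and the sizes are illustrative, not the actual implementation):

```javascript
// Sketch of the two bitmask layouts: row-per-point vs. column-per-widget.
// Names and sizes are illustrative assumptions.

const N_POINTS = 1000;
const N_WIDGETS = 5;

// Row layout: one 32-bit word per point; bit w holds the state of widget w.
const rowMask = new Uint32Array(N_POINTS * Math.ceil(N_WIDGETS / 32));
const acceptAll = (1 << N_WIDGETS) - 1; // all widget bits set
function rowAccepted(ipoint) {
  // Logical AND selection: point passes only if every widget bit is 1.
  return (rowMask[ipoint] & acceptAll) === acceptAll;
}

// Column layout: one bit array of N_POINTS bits per widget.
const colMask = Array.from({ length: N_WIDGETS },
  () => new Uint32Array(Math.ceil(N_POINTS / 32)));
function colAccepted(ipoint) {
  const word = ipoint >> 5, bit = ipoint & 31;
  for (let w = 0; w < N_WIDGETS; w++) {
    if (((colMask[w][word] >> bit) & 1) === 0) return false;
  }
  return true;
}

// Updating a single widget touches only its own column:
function setWidgetBit(w, ipoint, on) {
  const word = ipoint >> 5, bit = ipoint & 31;
  if (on) colMask[w][word] |= 1 << bit;
  else colMask[w][word] &= ~(1 << bit);
}
```

The trade-off described above is visible directly: `setWidgetBit` rewrites only one column, while `rowAccepted` answers the accept/reject question with a single mask comparison.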
Currently, the selection is only a logical AND (&&) of the selections from different widgets.
In the old PAW system, the configuration was defined as a custom user-defined string, as shown above. For us, this means that widgets need IDs (as strings), and these IDs could then be used in a custom logic expression.
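One possible sketch of such PAW-style custom logic, where a user-supplied expression combines per-widget selections by their string IDs (the helper name `evalSelection` and the widget IDs are illustrative assumptions, not an existing API):

```javascript
// Sketch: evaluating a user-defined logic string over per-widget boolean
// selections, addressed by widget ID. Names are illustrative assumptions.

function evalSelection(logicString, widgetIds, widgetState) {
  // Build a function whose arguments are the widget IDs, then call it
  // with the per-widget boolean selection for one point.
  const fn = new Function(...widgetIds, `return (${logicString});`);
  return fn(...widgetIds.map(id => widgetState[id]));
}

const state = { cutPt: true, cutEta: false, cutDCA: true };
console.log(evalSelection("(cutPt && cutEta) || cutDCA",
  Object.keys(state), state)); // true
```

Note that `new Function` evaluates arbitrary user input, so in a real implementation the logic string would need validation.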
refer to #96
Title: Automatically Generate Function Header for JavaScript Function Based on Used Variables
Description:
We have a list of all variables present in a table and a JavaScript function (as a string without the header) that uses a subset of these variables. Our goal is to automatically generate the function header for this JavaScript function, including only the variables that are actually used in the function.
This is needed as we would like to keep columns compressed and use (expand to) float representation only if needed.
Use Case:
Input:
- columns (the list of all variable names in the table)
- functionString (the function body as a string, without the header)
Output:
- the generated function header, containing only the variables actually used
Proposed Solution:
We will implement an algorithm to achieve this by iterating through the list of variables and checking if each variable is present in the function string. If a variable is found in the function string, it will be added to the function header.
Pseudocode:
function generateFunctionHeader(columns, functionString):
Initialize an empty list `columnArgs`
for each column in columns:
if column is present in functionString:
add column to `columnArgs`
Create the function header string `header`
header = "function XXX(" + join(columnArgs, ", ") + ") {"
return header
JavaScript Code:
function generateFunctionHeader(columns, functionString) {
  // Match whole identifiers only, so that e.g. "v1" does not also match "v10".
  // Assumes column names are plain identifiers (no regex metacharacters).
  let columnArgs = columns.filter(column =>
    new RegExp("\\b" + column + "\\b").test(functionString));
  let header = "function XXX(" + columnArgs.join(", ") + ") {";
  return header;
}
// Example usage:
let columns = ["v0", "v1", "v2", "v3", "v4"];
let functionString = "let sum = v0 + v1; let product = v1 * v3; return sum + product;";
let functionHeader = generateFunctionHeader(columns, functionString);
console.log(functionHeader); // Output: "function XXX(v0, v1, v3) {"
Allow the data source to handle a large number of columns.
The current limitation is in memory capacity, not CPU processing power.
Transition to using smaller float representations to reduce memory usage significantly. Currently, floats are represented in 64 bits, but with compression, we can achieve 8-bit or 16-bit quantization, which represents a memory reduction factor of 8 or 4.
A potential drawback is that transformations (such as selections and custom functions) may be slower. We need to quantify how CPU time scales with the number of rows and the number of columns.
10^6-10^7 rows x 10^2 columns -> 0.1-1 GB with a 1-byte representation, or 0.8-8 GB with a Float64 representation
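The estimate above is simple arithmetic; a small helper (illustrative name `footprintGB`) makes it easy to recompute for other shapes:

```javascript
// Sketch: memory footprint in GB for rows x columns at a given element size.
function footprintGB(rows, cols, bytesPerValue) {
  return (rows * cols * bytesPerValue) / 1e9;
}

console.log(footprintGB(1e7, 100, 1)); // 1   (GB, 1-byte quantization)
console.log(footprintGB(1e7, 100, 8)); // 8   (GB, Float64)
```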
To evaluate the use of compressed float representations for data manipulation, we should compare CPU usage across different types of operations:
- direct dequantization (val_i * step)
- ordered lookup (ordered_lookup(i))
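A sketch of such a comparison (array sizes, the step value, and the helper `bench` are illustrative assumptions):

```javascript
// Sketch of a micro-benchmark comparing plain Float64 access with two
// decompression strategies: multiply-by-step dequantization and a
// 256-entry lookup table. Sizes and names are illustrative.

const N = 1_000_000;
const step = 0.01;
const f64 = new Float64Array(N);
const q8 = new Uint8Array(N);
const lut = new Float64Array(256);
for (let i = 0; i < 256; i++) lut[i] = i * step;
for (let i = 0; i < N; i++) {
  q8[i] = i & 0xff;
  f64[i] = q8[i] * step; // uncompressed reference column
}

function bench(label, fn) {
  const t0 = performance.now();
  const sum = fn();
  console.log(label, (performance.now() - t0).toFixed(2), "ms");
  return sum;
}

const sumF = bench("float64 ", () => { let s = 0; for (let i = 0; i < N; i++) s += f64[i]; return s; });
const sumQ = bench("val*step", () => { let s = 0; for (let i = 0; i < N; i++) s += q8[i] * step; return s; });
const sumL = bench("lookup  ", () => { let s = 0; for (let i = 0; i < N; i++) s += lut[q8[i]]; return s; });
```

Summing the column forces every element to be decompressed, so the timing differences reflect the per-access cost of each strategy.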
"When I look at performance, there is quite a big difference between the browser and Node.js; in some code pairs, one is faster in the browser and the other in Node.js. Therefore, I believe it is more realistic to perform benchmarks in the browser and restrict Node.js to just unit tests."
Marian, could you please provide some examples? Are the differences a matter of percentage points or orders of magnitude?
I suggest we maintain CPU benchmarks for the numerical parts of the code in Node.js as well, since it is easier to automate and compare. For regression testing, this will be very important. We should soon move more towards WebAssembly, where I expect the results will be more predictable.
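For regression testing in Node.js, a minimal sketch could look like the following (the kernel, the stored baseline, and the 2x threshold are illustrative assumptions; in CI the baseline would come from previous benchmark results):

```javascript
// Sketch: a minimal Node.js CPU regression check for a numerical kernel.
// Baseline and threshold are illustrative assumptions.

function kernel(n) {
  let s = 0;
  for (let i = 0; i < n; i++) s += Math.sqrt(i);
  return s;
}

function timeMs(fn) {
  const t0 = process.hrtime.bigint();
  fn();
  return Number(process.hrtime.bigint() - t0) / 1e6;
}

const elapsed = timeMs(() => kernel(1e6));
const baselineMs = 50;          // hypothetical stored baseline
if (elapsed > 2 * baselineMs) { // flag >2x slowdowns as possible regressions
  console.warn(`possible regression: ${elapsed.toFixed(1)} ms vs baseline ${baselineMs} ms`);
}
console.log(`kernel: ${elapsed.toFixed(1)} ms`);
```

Because absolute timings vary between machines, comparing against a per-machine stored baseline (rather than a fixed number) is what makes this useful for automated comparison.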
Based on the observation that performance varies significantly between the browser and Node.js environments for different code snippets, the suggestion to focus benchmarking in the browser and limit Node.js to unit testing appears reasonable. Here's a more detailed approach to implementing this strategy:
Isolate Performance-Sensitive Code: Identify parts of the codebase that are performance-sensitive and may behave differently in a browser compared to Node.js. This includes DOM manipulations, graphical operations, or operations that rely heavily on browser APIs.
Conduct Benchmarks in the Browser: Use tools like Google Chrome’s Lighthouse or Mozilla Firefox’s Developer Tools to run performance benchmarks directly in the browser. This method provides a more accurate measurement of how the code performs in its intended environment.
Set Up Automated Browser Tests: Implement automated browser testing frameworks such as Puppeteer or Selenium. These tools help automate benchmarks and integrate them into your continuous integration (CI) pipeline, regularly monitoring performance as part of your development process.
Utilize Node.js for Unit Testing: Since Node.js may not accurately replicate the browser environment for all types of code, it should primarily be used for unit testing backend functionalities or non-UI logic that does not heavily depend on browser capabilities. Frameworks like Mocha, Jest, or Jasmine are suitable for these tasks.
Performance Regression Testing: Regularly compare new performance measurements against previous benchmarks to identify any regressions or improvements. This can be integrated into your CI/CD pipeline to ensure continuous performance evaluation.
Documentation and Reporting: Maintain detailed records of performance tests and their results. Utilize this data to generate reports for further analysis and to guide optimization efforts.
Feedback Loop: Use insights from both browser benchmarks and Node.js tests to inform development decisions, focusing on optimizing areas where performance bottlenecks are identified.
To facilitate data transfer between the server and client, user-defined lossy and lossless compression is employed.
The techniques discussed below can reduce the data volume on the client. The impact on the speed of subsequent data manipulation should be evaluated:
It is unclear how quickly specific functionalities can be implemented in JavaScript, how much they might slow down the code, and how much complexity they would introduce. A subset of these features should be implemented promptly; the rest could likely be addressed using WebAssembly.
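As one example of the lossy techniques in question, a simple linear 8-bit quantization could be sketched as follows (the function names are illustrative; the real scheme may be user-defined per column, and values are assumed to lie in [min, max]):

```javascript
// Sketch: lossy 8-bit linear quantization for server-to-client transfer.
// Names are illustrative; assumes all values lie within [min, max].

function quantize8(values, min, max) {
  const step = (max - min) / 255;
  const out = new Uint8Array(values.length);
  for (let i = 0; i < values.length; i++) {
    out[i] = Math.round((values[i] - min) / step);
  }
  return { data: out, min, step };
}

function dequantize8({ data, min, step }) {
  const out = new Float64Array(data.length);
  for (let i = 0; i < data.length; i++) out[i] = min + data[i] * step;
  return out;
}

const original = Float64Array.from([0.0, 0.25, 0.5, 0.99]);
const packed = quantize8(original, 0, 1);
const restored = dequantize8(packed);
// Maximum round-trip error is bounded by step / 2 = (1 - 0) / 255 / 2 ≈ 0.002.
```

This yields the factor-of-8 reduction relative to Float64 mentioned above, at the cost of a bounded quantization error of step/2.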