svdhoog / FLAViz

FLAViz: Flexible Large-scale Agent Visualization Library
GNU General Public License v3.0

Performance #16

Open svdhoog opened 6 years ago

svdhoog commented 6 years ago

A couple of design decisions of FLAViz lead to performance issues.

1 Data conversion from db to h5. This works on a file-per-file basis and is highly parallelisable: launch multiple sub-processes that each run the same db_hdf5_v2.py script on a subset of the files. Currently only one core is used.
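A minimal sketch of how point 1 could look, assuming the converter can be called once per input file from the command line. The actual arguments of db_hdf5_v2.py are not stated in this issue, so the command is passed in as a parameter; `convert_one` and `convert_all` are hypothetical helper names, not part of FLAViz.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def convert_one(command_prefix, db_file):
    """Run the converter command on one file and return (file, exit code)."""
    result = subprocess.run([*command_prefix, str(db_file)])
    return db_file, result.returncode


def convert_all(command_prefix, db_files, max_workers=4):
    """Convert many .db files concurrently, one converter process per file.

    Returns the list of files whose converter process exited with an error.
    """
    # Threads suffice here: each thread only waits on its own child process,
    # so the conversions themselves run in parallel on separate cores.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda f: convert_one(command_prefix, f), db_files))
    return [db_file for db_file, code in results if code != 0]


# Hypothetical usage; the real command line of db_hdf5_v2.py may differ:
#   import sys
#   from pathlib import Path
#   failed = convert_all([sys.executable, "db_hdf5_v2.py"],
#                        sorted(Path("data").glob("*.db")))
```

This starts one process per file rather than one per batch of files, which keeps the sketch short; `max_workers` caps how many run at the same time.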

1b Processing of the per-set-and-run files set_*_run_*.h5. Since these files are generated at the intermediate stage (translated from the set_run.db files), the FLAViz routines could work on them instead of on the more monolithic Agent.h5 files. If only a subset of the data is required for a specific task, it is clearly more efficient to load only the files needed rather than the entire data set.
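A sketch of the file-selection step for point 1b, assuming the two wildcards in set_*_run_*.h5 are integer set and run numbers (an assumption; the issue does not spell out the naming scheme). `select_files` is a hypothetical helper; reading the selected files is left to whatever loader the FLAViz routines already use.

```python
import re
from pathlib import Path

# Assumed naming scheme: set_<integer>_run_<integer>.h5
FILE_PATTERN = re.compile(r"set_(\d+)_run_(\d+)\.h5$")


def select_files(directory, sets=None, runs=None):
    """Return the per-set-and-run files matching the requested sets and runs.

    `sets` and `runs` are collections of integers; None means "no restriction".
    """
    selected = []
    for path in sorted(Path(directory).glob("set_*_run_*.h5")):
        match = FILE_PATTERN.match(path.name)
        if match is None:
            continue  # name fits the glob but not the assumed integer scheme
        set_id, run_id = int(match.group(1)), int(match.group(2))
        if sets is not None and set_id not in sets:
            continue
        if runs is not None and run_id not in runs:
            continue
        selected.append(path)
    return selected
```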

2 Plotting. Multiple plots are currently processed one by one. Each plot could run in a separate sub-process that retrieves its data from the main data set once that has been read into main memory.

3 Transformations. If there are multiple tasks, each task could run in its own sub-process, allocated to a different core.
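Points 2 and 3 both amount to running independent tasks in a pool of worker processes. A rough sketch with the standard library, where `summarise` stands in for a real plot or transformation routine (both names are hypothetical, not FLAViz functions):

```python
from concurrent.futures import ProcessPoolExecutor


def summarise(task):
    """Placeholder worker: takes (name, values) and returns (name, mean)."""
    name, values = task
    return name, sum(values) / len(values)


def run_tasks(worker, tasks, max_workers=None, executor_cls=ProcessPoolExecutor):
    """Apply `worker` to every task in parallel and return results in task order.

    With the default ProcessPoolExecutor, `worker` must be a top-level function
    and each task must be picklable, because tasks are sent to other processes.
    """
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(worker, tasks))
```

Worker processes do not automatically share the parent's memory, so in this sketch the data a task needs travels with the task. For a large data set it is probably cheaper to pass each worker only the slice it needs, or a file name to read from, than to send the whole data set to every process.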

Testing performance

There have been some preliminary attempts to test the performance of the scripts. These are documented in the manual.

svaksha commented 6 years ago

Slicing and indexing the h5 file with pandas (as numpy arrays) to get a specific subset of the data would let the user plot several subsets instead of the whole file. Otherwise, try chunking.
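A sketch of both suggestions with pandas, under two assumptions that may not hold for the files FLAViz writes: the data is stored in PyTables "table" format, and the columns used for filtering were saved as data columns. The key `agents` and the columns `set_id` and `run_id` in the usage comment are made up for illustration.

```python
import pandas as pd


def load_subset(path, key, where, columns=None):
    """Read only the rows (and optionally columns) matching `where`.

    Needs a file written with format="table" and queryable data columns.
    """
    return pd.read_hdf(path, key=key, where=where, columns=columns)


def iter_chunks(path, key, chunksize):
    """Yield the stored table piece by piece instead of loading it whole."""
    with pd.HDFStore(path, mode="r") as store:
        for chunk in store.select(key, chunksize=chunksize):
            yield chunk


# Hypothetical usage:
#   subset = load_subset("Agent.h5", "agents", "(set_id == 1) & (run_id <= 2)")
#   for chunk in iter_chunks("Agent.h5", "agents", chunksize=100_000):
#       ...  # plot or aggregate one chunk at a time
```

If the files are stored in "fixed" format instead, `where` queries and chunked reads are not available, and the data would have to be rewritten in table format first.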