The current file handling classes need improvement, particularly with regards to memory usage for large files.
This issue is more of a notepad to draft out the rewritten version before any code is touched.
Problems
Mesh Reading
Lazy iterators such as .iter_nodes() or .iter_elements() are vulnerable to interference from other methods that iterate over the same file: any intervening read moves the shared seek position, so the generator skips some or all of the items it was meant to yield.
The easiest solution here would be to move to actual iterator classes rather than relying on generators to maintain state. The iterators could simply remember their own file seek position and return to it before continuing.
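A minimal sketch of such an iterator class (the name `SeekSafeIterator` and the line-based format are assumptions for illustration, not the actual API): it re-seeks to its own remembered position before every read, so other code moving the shared file handle cannot derail it.

```python
import io


class SeekSafeIterator:
    """Line iterator that survives other code seeking on the same file.

    Before every read it returns to its own remembered position, so an
    intervening seek by another method cannot make it skip items.
    Illustrative sketch only; names are hypothetical.
    """

    def __init__(self, fileobj):
        self._file = fileobj
        self._pos = fileobj.tell()

    def __iter__(self):
        return self

    def __next__(self):
        self._file.seek(self._pos)      # restore our own position
        line = self._file.readline()
        self._pos = self._file.tell()   # remember where we stopped
        if not line:
            raise StopIteration
        return line.rstrip("\n")


f = io.StringIO("node 1\nnode 2\nnode 3\n")
it = SeekSafeIterator(f)
first = next(it)   # "node 1"
f.seek(0)          # another method rewinds the shared handle...
second = next(it)  # ...but iteration still resumes at "node 2"
```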
Retrieving a given object by ID is currently done by creating a sorted list of all objects and accessing it via index. This is good for speed, but the memory impact for very large files is quite high (especially as this stores the instances, not just the raw lines).
Since IDs are already required to be sorted and consecutive (i.e. without holes), homing in on a given object via a binary search bracketed by the first and last IDs seems like a better option.
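A rough sketch of what such a lookup could look like, assuming a binary-mode file whose lines begin with their integer ID (the function names and line format are hypothetical): the search bisects byte offsets within the block, snapping each probe to the next line start before comparing IDs.

```python
import io


def _line_start_at_or_after(f, pos, block_start):
    """Offset of the first line that starts at or after byte `pos`."""
    if pos <= block_start:
        return block_start
    f.seek(pos - 1)
    f.readline()  # finish the line containing byte pos - 1
    return f.tell()


def find_by_id(f, target_id, block_start, block_end):
    """Binary-search a byte range of a binary-mode file whose lines
    begin with sorted integer IDs. Hypothetical sketch."""
    lo, hi = block_start, block_end
    while lo < hi:
        mid = (lo + hi) // 2
        start = _line_start_at_or_after(f, mid, block_start)
        if start >= hi:
            hi = mid  # no full line begins in [mid, hi)
            continue
        f.seek(start)
        line = f.readline()
        current = int(line.split()[0])
        if current == target_id:
            return line
        if current < target_id:
            lo = start + len(line)  # target lies after this line
        else:
            hi = mid                # target lies before byte mid
    return None


data = io.BytesIO(b"1 node-a\n2 node-b\n3 node-c\n4 node-d\n")
end = data.seek(0, io.SEEK_END)
line = find_by_id(data, 3, 0, end)  # b"3 node-c\n"
```

Only O(log n) lines are read per lookup, and nothing is kept in memory between calls.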
Mesh cards are currently parsed in file order.
Instead, it would be better to pass over the entire file once, collect any "simple" (i.e. non-repeating) cards, and evaluate them afterwards.
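A toy illustration of that single-pass idea, assuming a hypothetical keyword-per-line card format (the keywords, helper name, and layout are made up for the sketch): repeating cards only have their byte extents recorded, while simple cards are stashed for evaluation after the pass.

```python
import io

# Assumed toy card format: "KEYWORD payload" per line. NODE/ELEMENT
# cards repeat; every other keyword is a one-off "simple" card.
REPEATING = {"NODE", "ELEMENT"}


def collect_simple_cards(f):
    """One pass over the file: record the byte extents of repeating
    blocks and stash simple cards for evaluation after the pass."""
    simple = {}   # keyword -> payload string
    blocks = {}   # keyword -> (start offset, end offset)
    pos = f.tell()
    for line in f:
        keyword = line.split()[0]
        if keyword in REPEATING:
            start, _ = blocks.get(keyword, (pos, pos))
            blocks[keyword] = (start, pos + len(line))
        else:
            simple[keyword] = line[len(keyword):].strip()
        pos += len(line)
    return simple, blocks


f = io.StringIO(
    "TITLE demo mesh\n"
    "NODE 1 0.0 0.0\n"
    "NODE 2 1.0 0.0\n"
    "UNITS mm\n"
)
simple, blocks = collect_simple_cards(f)
# simple holds TITLE and UNITS; blocks holds the NODE extent
```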
Mesh Writing
Mesh writing is currently performed by effectively appending elements to a list, which is only written to disk when the corresponding .write_*() method is called. This keeps the full list in memory until then.
The memory impact can be reduced by clearing the list manually and calling the writers repeatedly, but that is an opaque, inelegant workaround.
A cleaner option would be to have an internal cache of the currently written elements, which is then written to file in batches of a few thousand lines.
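A sketch of such a batched writer (the class name, method names, and batch size are illustrative, not the real API): lines accumulate in an internal cache that is flushed whenever it reaches the batch size, with one final flush for the leftover partial batch.

```python
import io


class BatchedWriter:
    """Writer with an internal cache flushed in fixed-size batches.

    Only up to `batch_size` lines are held in memory at once, instead
    of the full element list. Illustrative sketch only.
    """

    def __init__(self, fileobj, batch_size=5000):
        self._file = fileobj
        self._batch_size = batch_size
        self._cache = []

    def add(self, line):
        self._cache.append(line)
        if len(self._cache) >= self._batch_size:
            self.flush()

    def flush(self):
        """Write any cached lines to the file and empty the cache."""
        if self._cache:
            self._file.write("\n".join(self._cache) + "\n")
            self._cache.clear()


out = io.StringIO()
writer = BatchedWriter(out, batch_size=2)
for i in range(5):
    writer.add(f"element {i}")
writer.flush()  # flush the final partial batch
```

A real implementation would likely flush automatically on close (e.g. via a context manager), so callers never see the batching at all.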
Considerations
The current implementation may be memory-intensive, but it is also reasonably fast. Care must be taken that saving memory does not come at a performance cost.
The new mesh reading system requires a single pass over the entire file to locate the beginning and end of blocks (elements, nodes, etc.).
This only works if the elements are homogeneous and sorted, which might not be possible for all meshes. Whatever optimisation is done must therefore be optional.