Hello, thank you for the report!
This is an unwanted side effect of my attempts to reduce the memory impact of opening large meshes. Both `Reader.iter_elements()` and `Reader.node(<index>)` iterate over the file as needed, and it seems like there is some crosstalk between the file iterators.
As a workaround, you can use the `Reader.elements` property instead of `Reader.iter_elements()`; it creates a list of elements internally, so the `Reader.node()` calls won't upset it:
```python
with py2dm.Reader(netz) as mesh:
    elements = []
    for i, element in enumerate(mesh.elements):
        # collect the coordinates of every node referenced by this element
        elements.append([mesh.node(n).pos for n in element.nodes])
    print(i)  # index of the last element processed
```
This of course defeats the purpose of not loading all the elements into memory at once, but that is more a sign of my current solution being suboptimal.
The `MESHNAME` card does not appear to be part of the 2DM specification, but some BASEMENT utilities, as well as a handful of MATLAB mesh plotting scripts, do use it. I have added support for this card in a way similar to your solution, with the name being available via the `Reader.name` attribute if the card was found.
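Usage looks like this (the file name here is made up):

```python
import py2dm

with py2dm.Reader('channel.2dm') as mesh:
    print(mesh.name)  # the mesh name, if a MESHNAME card was found
```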
Thanks again for the heads-up - I have created a hotfix for this issue, but I will likely take some time soon to clean up the card parsing strategy along with the file iterators.
Thanks! Yeah, my solution was to iterate over `Reader.elements` in the end.
It's still very efficient: NumPy took about 12 seconds to load the data (elements and nodes), compared to 8 seconds for reading the whole 2dm file with your library.
I am using py2dm to generate water level rasters using barycentric interpolation on triangular elements. Creating a 0.5 m spaced raster with 10k by 8k cells from the 2dm mesh currently takes 3.40 minutes; hopefully I can boost the speed soon :)
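The per-triangle step boils down to solving a 2x2 linear system for the barycentric weights; a rough NumPy sketch (the function name and array layout are illustrative, not part of py2dm):

```python
import numpy as np

def barycentric_interpolate(p, tri, values):
    """Interpolate a scalar (e.g. a water level) at point p inside a triangle.

    p:      (2,) query point
    tri:    (3, 2) triangle vertex coordinates
    values: (3,) scalar value at each vertex
    """
    a, b, c = tri
    # solve p = a + w1*(b - a) + w2*(c - a) for the weights w1, w2
    t = np.column_stack((b - a, c - a))
    w1, w2 = np.linalg.solve(t, p - a)
    w0 = 1.0 - w1 - w2
    return w0 * values[0] + w1 * values[1] + w2 * values[2]
```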
Good to hear - I'll start drafting up a proper fix for some of the file IO issues; there is still a fair bit of performance left on the table on the Py2DM side.
For your use-case, this means that Python must check (on average) half the list of nodes to find the correct one by ID. Okay, this is luckily not actually the case; I did use cached lists. Still, I am storing the finished node objects in memory, which is not ideal. I'll have to weigh speed against memory usage more carefully and clarify in the docs how to get either.
Needless to say, I'd rather it perform a binary search on the node list; it's sorted, after all.
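Something along these lines, assuming a node list sorted by ascending ID (a sketch only, not the current implementation):

```python
from bisect import bisect_left

def find_node(nodes, node_id):
    # `nodes` is assumed to be sorted by ascending node ID; in practice
    # the ID list would be built once and cached, not rebuilt per lookup
    ids = [n.id for n in nodes]
    i = bisect_left(ids, node_id)
    if i < len(ids) and ids[i] == node_id:
        return nodes[i]
    raise KeyError(f'node {node_id} not found')
```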
Let me know if you need help with running tests. I can test meshes containing 800k elements and about 80 million nodes.
Thank you for the offer - I gave the bottlenecks a bit more thought today and would love to get some more scaling numbers with your data. I was not aware that there were use-cases with large meshes containing that many nodes compared to the elements. My examples and testing data mostly consist of fairly uniform rivers and channels for which the number of elements is either identical to or higher than the number of nodes.
The nodes are ultimately what causes the headaches, since they are an unstructured list that must be accessed in random order as the elements are iterated over.
No promises on when that updated version will be ready, sadly - I have a lot of ideas I want to explore and not a lot of time.
Sorry, I wrote that incorrectly. You are right, it's about 400k nodes and 800k elements. And I am trying to interpolate the water level results onto 80 million raster points of a 0.5 m raster grid.
Maybe you can try running the Windows FINDSTR command via subprocess: https://docs.microsoft.com/de-de/windows-server/administration/windows-commands/findstr
In the command prompt:
```
findstr /c:"ND 1 " hydro_as-2d.2dm
```
or in Python:
```python
import subprocess
subprocess.call(['findstr', '/c:"ND {} "'.format(NodeNumber), 'hydro_as-2d.2dm'])
```
I am not exactly sure how fast this would be, but just an idea.
Ah, no worries - I fed a simple geometry into Triangle with slightly upsetting constraints, so I now have myself a very unreasonable 20 million cell mesh for testing purposes.
Using external tools always brings some OS-specific caveats; I would like to stick to pure Python if possible. Parts of the code that are limited by Python's lower speed can always be moved to a C extension later.
I am already requiring node and element IDs to be sorted and without gaps (I will add a script that performs this conversion, in case not all programs adhere to this), so moving to some form of binary search should be a good compromise between memory usage and access times.
There are a few extra steps involved since line lengths vary even within the same file, and there is also a point where file seek times mean that it is more efficient to read ~1'000 lines and search them sequentially, but I am confident that there is a good balance to be found there.
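Roughly the idea (a sketch of the general technique, not the actual implementation; it assumes, for simplicity, a file consisting only of `ND <id> <x> <y> <z>` lines sorted by ascending ID):

```python
def bisect_node_line(path, node_id):
    """Find the 'ND <id> ...' line for node_id by bisecting byte offsets."""
    with open(path, 'rb') as f:
        f.seek(0, 2)
        lo, hi = 0, f.tell()
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid)
            f.readline()  # discard the partial line the probe landed in
            line = f.readline()
            line_id = int(line.split()[1]) if line else None
            if line_id == node_id:
                return line.decode()
            if line and line_id < node_id:
                lo = f.tell()  # target lies further down the file
            else:
                hi = mid       # hit EOF or overshot: search the lower half
        # the window can pinch shut exactly at the target's line start,
        # so check the line beginning at `lo` before giving up
        f.seek(lo)
        line = f.readline()
        if line and int(line.split()[1]) == node_id:
            return line.decode()
        return None
```

The realignment step is exactly where the variable line lengths bite: every probe lands mid-line and must first skip to the next line boundary before it can compare IDs.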
I will keep an all-in-memory option available for small meshes that need to be accessed often and quickly, though - for your use-case, that might still be the best option if you have sufficient memory and overall interpolation speed is the end goal.
Closing, as the issues discussed are no longer applicable as of v0.2.
Hello, I recently installed your library, as it is very useful for loading elements and nodes from 2dm files into Python.
I have encountered 2 problems so far:
Problem 1:
`Reader.iter_elements()` only loops through the first element.
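A minimal sketch of the kind of loop that shows this (the file and variable names are illustrative):

```python
import py2dm

with py2dm.Reader('hydro_as-2d.2dm') as mesh:
    for i, element in enumerate(mesh.iter_elements()):
        # looking up nodes while iterating disturbs the element iterator
        coords = [mesh.node(n).pos for n in element.nodes]
    print(i)
```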
This prints 0, even though there are 800k elements in the file.
Problem 2:
Error reading the 2dm file: the second line of a 2dm file can start with "MESHNAME". However, the `_check_header` function of `Reader` in `read.py` looks for "NUM_MATERIALS_PER_ELEM" in the second line.
My Temporary Solution:
Lines 181-186 in read.py.
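In spirit, the change makes the header check tolerate an optional MESHNAME card (an illustrative sketch, not the actual read.py code):

```python
def check_header(lines):
    # tolerate an optional MESHNAME card on the second line
    idx, name = 1, None
    if lines[idx].startswith('MESHNAME'):
        name = lines[idx].split(maxsplit=1)[1].strip().strip('"')
        idx += 1
    if 'NUM_MATERIALS_PER_ELEM' not in lines[idx]:
        raise ValueError('unsupported 2DM header')
    return name
```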