leonhard-s / Py2DM

A Python module for reading and writing SMS 2DM mesh files.
MIT License

Elements not iterating #1

Closed sharon92 closed 2 years ago

sharon92 commented 4 years ago

Hello, I recently installed your library, as it's very useful for me to load elements and nodes from 2dm files into Python.

I have encountered 2 problems so far:

Problem 1:

Reader.iter_elements() loops only through the first element:

import time

import py2dm

netz = r"HQ1000_48h\hydro_as-2d.2dm"

t1 = time.time()
with py2dm.Reader(netz) as mesh:
    elements = []
    for n, element in enumerate(mesh.iter_elements()):
        # Collect the node positions of each element.
        elements += [[mesh.node(node_id).pos for node_id in element.nodes]]
    print(n)

This prints 0, even though there are 800k elements in the file.

Problem 2:

Error when reading the 2dm file: the second line of a 2dm file can start with "MESHNAME", but the Reader._check_header function in read.py expects "NUM_MATERIALS_PER_ELEM" on the second line.

My Temporary Solution:

Lines 181-186 in read.py:

        if next_line(iterator) != 'MESH2D':
            raise MissingCardError('Missing required card MESH2D')
        line = next_line(iterator)
        # Optional MESHNAME card: skip it if present
        mesh_name = 'MESHNAME'
        if line.startswith(mesh_name):
            line = next_line(iterator)
        nmpe_card = 'NUM_MATERIALS_PER_ELEM'
        if not line.startswith(nmpe_card):
            raise MissingCardError(f'Missing required card {nmpe_card}')

leonhard-s commented 4 years ago

Hello, thank you for the report!

Problem one

This is an unwanted side effect of my attempts to reduce the memory impact of opening large meshes. Both Reader.iter_elements() and Reader.node(<index>) iterate over the file lazily as needed, and it seems like there is some crosstalk between the two file iterators.
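
As a self-contained toy illustration of that kind of crosstalk (not Py2DM's internals, just two iterators sharing one underlying file position):

import io

f = io.StringIO('MESH2D\nND 1 0.0 0.0 0.0\nND 2 1.0 0.0 0.0\n')
it_a = iter(f)
it_b = iter(f)
print(next(it_a))  # 'MESH2D' - consumed by the first iterator
print(next(it_b))  # 'ND 1 ...' - the second iterator continues, it does not restart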

As a workaround, you can use the Reader.elements property instead of Reader.iter_elements(); it creates a list of elements internally, so the Reader.node() calls won't upset it:

with py2dm.Reader(netz) as mesh:
    elements = []
    for n, element in enumerate(mesh.elements):
        elements += [[mesh.node(node_id).pos for node_id in element.nodes]]
    print(n)

This of course defeats the purpose of not loading all the elements into memory at once, but that is more a sign of my current solution being suboptimal. πŸ˜…


Problem two

The MESHNAME card does not appear to be part of the 2DM specification, but some BASEMENT utilities, as well as a handful of MATLAB mesh plotting scripts, do use it.

I have added support for this card in a way similar to your solution, with the name being available via the Reader.name attribute if the card was found.
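
For example (a minimal usage sketch):

with py2dm.Reader(netz) as mesh:
    print(mesh.name)  # name from the MESHNAME card, if one was found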


Thanks again for the heads-up - I have created a hotfix for this issue, but I'll take some time soon to clean up the card parsing strategy along with the file iterators.

sharon92 commented 4 years ago

Thanks! Yeah, my solution was to iterate over Reader.elements in the end.

It's still very efficient: numpy took about 12 seconds to load the data (elements and nodes), compared to 8 seconds for reading the whole 2dm file with your library.

I am using py2dm to generate water level rasters using barycentric interpolation on triangular elements. Creating a 0.5 m spaced raster with 10k by 8k cells from the 2dm mesh currently takes 3.40 minutes; hopefully I can boost the speed soon :)
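
For reference, a minimal sketch of barycentric interpolation on a single triangle (my own illustration, not the actual script; p1/p2/p3 are the (x, y) node positions and z1/z2/z3 the water levels at those nodes):

def barycentric_interpolate(p, p1, p2, p3, z1, z2, z3):
    """Interpolate a value at point p inside the triangle (p1, p2, p3)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, p1, p2, p3
    # Barycentric weights of p with respect to the three corners.
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * z1 + w2 * z2 + w3 * z3

# Example: point (0.25, 0.25) in the unit triangle -> 10.75
print(barycentric_interpolate((0.25, 0.25), (0, 0), (1, 0), (0, 1), 10.0, 11.0, 12.0))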

leonhard-s commented 4 years ago

Good to hear - I'll start drafting a proper fix for some of the file IO issues; there is still a fair bit of performance left on the table here on the Py2DM side.

For your use-case, this would mean that Python must scan (on average) half the list of nodes to find the correct one by ID. Luckily, that is not actually the case, since I do use cached lists. Still, I am storing the finished node objects in memory, which is not ideal. I'll have to weigh speed against memory usage more carefully and clarify in the docs how to get either. Needless to say, I would rather it perform a binary search on the node list; it is sorted, after all.
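
To illustrate the binary search idea, here is a minimal sketch under the assumption that nodes are stored as (id, x, y, z) tuples pre-sorted by ID; NodeIndex is a hypothetical helper, not part of the Py2DM API:

from bisect import bisect_left

class NodeIndex:
    """Look up nodes by ID via binary search over an ID-sorted list."""

    def __init__(self, nodes):
        # nodes: list of (node_id, x, y, z) tuples, pre-sorted by node_id
        self._nodes = nodes
        self._ids = [n[0] for n in nodes]  # built once, reused for every lookup

    def find(self, node_id):
        index = bisect_left(self._ids, node_id)
        if index < len(self._ids) and self._ids[index] == node_id:
            return self._nodes[index]
        raise KeyError(f'Node {node_id} not found')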

sharon92 commented 4 years ago

Let me know if you need help with running tests. I can test meshes containing 800k elements and about 80 million nodes.

leonhard-s commented 4 years ago

Thank you for the offer - I gave the bottlenecks a bit more thought today and would love to get some more scaling numbers with your data. I was not aware that there were use-cases with large meshes containing that many nodes compared to the elements. My examples and testing data mostly consist of fairly uniform rivers and channels for which the number of elements is either identical to or higher than the number of nodes.

The nodes are ultimately what causes the headaches, since they are an unstructured list that must be accessed in random order as the elements are iterated over.

No promises on when that updated version will be ready, sadly - I have a lot of ideas I want to explore and not a lot of time. πŸ˜…

sharon92 commented 4 years ago

Sorry, I wrote that incorrectly. You are right, it's about 400k nodes and 800k elements. And I am trying to interpolate the water level results onto 80 million raster points of a 0.5 m raster grid.

Maybe you can try running the FINDSTR command via subprocess. https://docs.microsoft.com/de-de/windows-server/administration/windows-commands/findstr

In the command prompt:

findstr /c:"ND 1 " hydro_as-2d.2dm

or in Python (the quotes around the search string are only needed by the shell and must be dropped when passing the arguments as a list):

import subprocess
# NodeNumber is the ID of the node to look up.
subprocess.call(['findstr', '/c:ND {} '.format(NodeNumber), 'hydro_as-2d.2dm'])

I am not exactly sure how fast this would be, but just an idea.

leonhard-s commented 4 years ago

Ah, no worries - I fed a simple geometry into triangle with slightly upsetting constraints, so I have myself a very unreasonable 20 million cell mesh for testing purposes now. πŸ˜„

Using external tools always brings some OS-specific caveats; I would like to stick to pure Python if possible. Parts of the code that are limited by Python's lower speed can always be moved to a C extension later.

I am already requiring node and element IDs to be sorted and without gaps (I will add a script that performs this conversion in case not all programs adhere to this), so moving to some form of binary search should offer a good balance of memory usage and access times.

There are a few extra steps involved since line lengths vary even within the same file, and there is a point where file seek times mean it is more efficient to read ~1'000 lines and scan them sequentially, but I am confident that there is a good balance to be found there.
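
Roughly, the offset-based search could look like the sketch below. This assumes the searched file contains only ascending "ND <id> <x> <y> <z>" lines; it is an illustration of the idea, not the actual Py2DM implementation:

import os

def find_node_line(path, node_id):
    """Locate an 'ND <id> ...' line by ID via binary search on byte offsets."""
    with open(path, 'rb') as f:
        lo, hi = 0, os.path.getsize(path)
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid)
            if mid > 0:
                f.readline()  # skip the (possibly partial) line we landed in
            line = f.readline().decode()
            if not line:  # ran past the end of the file; search the left half
                hi = mid
                continue
            current = int(line.split()[1])
            if current < node_id:
                lo = f.tell()  # continue after the line we just read
            elif current > node_id:
                hi = mid
            else:
                return line
        # lo always ends up at a line start; verify the remaining candidate.
        f.seek(lo)
        line = f.readline().decode()
        if line and int(line.split()[1]) == node_id:
            return line
    raise KeyError(f'Node {node_id} not found')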

I will keep an all-in-memory option available for small meshes that need to be accessed often and quickly, though - for your use-case, that might still be the best option if you have sufficient memory and overall interpolation speed is the end goal.

leonhard-s commented 2 years ago

Closing as the issues discussed are no longer applicable since v0.2