maljovec / topopy

A library for computing topological data structures stemming from Morse Theory. Given a set of arbitrarily arranged points in any dimension, this library is able to construct approximate topological structures using a neighborhood graph to simulate manifold structures.
BSD 3-Clause "New" or "Revised" License

Can topopy be used on a 3D surface mesh using each point's Kmin/Kmax #37

Open Diramorius opened 4 years ago

Diramorius commented 4 years ago

I wanted to create a segmentation algorithm using the Morse-Smale complex code. Does the topopy code allow such an implementation?

maljovec commented 4 years ago

I think what you want to do would work, but it might take a little work to get it working with your predefined mesh. Right now, this library assumes that you want it to build the connectivity graph that connects your points. It does not actually use faces, because it was meant to approximate the Morse-Smale complex in high-dimensional spaces, where keeping track of anything more than a 1-skeleton quickly becomes intractable.

As of right now, using your existing mesh, while possible, will be more complicated than it needs to be. Let me see if I can simplify this API so that you can pass in your own edge list (which hopefully you can generate from your mesh).

I am not sure what you mean by kmin and kmax. Are these curvatures on your mesh? If so, you could use one or the other to define a function on your mesh, but not both at the same time. You could construct two different segmentations using these. Does this help?

maljovec commented 4 years ago

For what it is worth, I will try to put out a newer version that accepts a pre-defined graph on your data. This might be a change in a related library nglpy where it seems that I already have some C++ code to do this, but I don't think it is surfaced to the public python side. I should be able to do this over the weekend, but feel free to poke me if you don't see any updates by early next week.

maljovec commented 4 years ago

If it is at all possible to share some test data from your use case, I would love to make sure I could do this in a way that meets your needs.

maljovec commented 4 years ago

In case any of my comments give you any reservations about using this for your needs, below is a potential alternative:

I have not used it myself, but have heard good things about TTK that could be worth checking out as well.

Whatever you choose, I would be curious to know how I can improve this library to make it as simple as possible to get what you need out of it.

I would be remiss if I did not make sure you at least saw the documentation for my project: https://topopy.readthedocs.io/en/latest/?badge=latest

Thanks for your interest in this project!

Diramorius commented 4 years ago

I was exploring various Morse-Smale implementations, and I found yours very compact to use. TTK seems good too, but too intricate for my purposes.

Really glad to hear back from you so promptly. I appreciate it greatly.

On your question about the ease of use of the library: I am looking for an implementation where points, connectivity, and scalar values can be the input, and the Morse-Smale algorithm generates partitions based on the scalar values attached to the points. On that note, having looked through your documentation, I am wondering if the graph (nglpy.Graph) might be the key to providing the connectivity?

The problem I am trying to solve is segmentation based on salient edges (Kmax and Kmin curvatures), much like the paper "Separatrix Persistence: Extraction of Salient Edges on Surfaces Using Topological Methods". I'm using it for a dental project where I want to segment out individual teeth from a dental scan, as in "Multi-Region Segmentation Based on Compact Shape Prior".

But to achieve all this, I'll first need an implementation of the Morse-Smale complex algorithm. Thanks so much for replying. Stay safe!

maljovec commented 4 years ago

I started a PR, but I still need to add unit tests and try it out with topopy; in case you want to follow along, it is here: https://github.com/maljovec/nglpy/pull/17.

Though I haven't tested it yet, the idea is that you would construct your own graph and pass it into the MorseSmaleComplex constructor. Something like:

import topopy
import nglpy

...
# Your data defined here, with:
#   X     an n-by-3 array of your points
#   edges a list of tuples, each containing the two integer indices of
#         the rows in X that form an edge
#   Y     an n-by-1 array of your curvatures at each point in X

graph = nglpy.PrebuiltGraph(edges)
msc = topopy.MorseSmaleComplex(graph=graph)
msc.build(X, Y)

I hope to have a real example to try out soon, but if you are under a time crunch you could try building the specific nglpy branch in the PR.

Diramorius commented 4 years ago

Great! I'll be giving this a test. Thanks!

maljovec commented 4 years ago

I was hoping to test on a similar 3D model and post my results here (I may still try it), but for what it is worth, that PrebuiltGraph from nglpy should be available if you make sure to update to version 1.1.0, which should be the latest on PyPI. I also have a PR for this project that will make this version the default prerequisite for topopy moving forward.

If you beat me to it, please let me know if it works or if something breaks.

Diramorius commented 4 years ago

Sure, I'll work on it for a couple of days; I will need to prepare the data for testing. Thank you so much. Stay safe.


Diramorius commented 4 years ago

Hi Dan, I tried using your code to test on a simple ellipsoid object, where X is an n-by-3 array. I got the following error:

Traceback (most recent call last):
  File "D:\Code Test\PythonAppTest\MorseSmale.py", line 66, in <module>
    msc.build(X, k1)
  File "C:\ProgramData\Anaconda3\lib\site-packages\topopy\MorseSmaleComplex.py", line 109, in build
    super(MorseSmaleComplex, self).build(X, Y, w)
  File "C:\ProgramData\Anaconda3\lib\site-packages\topopy\TopologicalObject.py", line 232, in build
    self.graph.build(self.Xnorm)
  File "C:\ProgramData\Anaconda3\lib\site-packages\nglpy\PrebuiltGraph.py", line 56, in build
    for x in self.edges
  File "C:\ProgramData\Anaconda3\lib\site-packages\nglpy\PrebuiltGraph.py", line 57, in <listcomp>
    if not (x in seen or x[::-1] in seen or seen.add(x))
TypeError: unhashable type: 'numpy.ndarray'

I added some files; if you try it out, I'll be very grateful: sm.zip. Thanks!

Diramorius commented 4 years ago

To add on: in the *.txt file, the last two numbers on every line specify the K1 and K2 curvature at that point. I am using an ndarray of K1 for the Y values. sm.zip

I realised that the edges need to be in a list of tuples and changed that portion. Now there is a new error type.

Traceback (most recent call last):
  File "D:\Code Test\PythonAppTest\MorseSmale.py", line 76, in <module>
    msc.build(X, k1)
  File "C:\ProgramData\Anaconda3\lib\site-packages\topopy\MorseSmaleComplex.py", line 109, in build
    super(MorseSmaleComplex, self).build(X, Y, w)
  File "C:\ProgramData\Anaconda3\lib\site-packages\topopy\TopologicalObject.py", line 232, in build
    self.graph.build(self.Xnorm)
  File "C:\ProgramData\Anaconda3\lib\site-packages\nglpy\PrebuiltGraph.py", line 63, in build
    edges = vectorInt(edgeList)
  File "C:\ProgramData\Anaconda3\lib\site-packages\nglpy\ngl.py", line 391, in __init__
    this = _ngl.new_vectorInt(*args)
NotImplementedError: Wrong number or type of arguments for overloaded function 'new_vectorInt'.
Possible C/C++ prototypes are:
  std::vector< int >::vector()
  std::vector< int >::vector(std::vector< int > const &)
  std::vector< int >::vector(std::vector< int >::size_type)
  std::vector< int >::vector(std::vector< int >::size_type, std::vector< int >::value_type const &)

dmaljovec commented 4 years ago

I will try to take a closer look this weekend.

dmaljovec commented 4 years ago

Ugh, I think I spotted the problem. The Python/C interoperability layer does not like it when we use numpy.int64 values to create the C++ std::vector. I can fix things on my end, but it's going to be faster for you to just cast your edge tuples to the standard Python int type.

Specifically, here is the code I modified from your script to get past this issue:

edges_tuples = []
for i in range(n_edges.shape[0]):
    edges_tuples += [(int(n_edges[i][0]), int(n_edges[i][1]))]

I will fix nglpy to do this cast underneath the covers, and push a newer version from my end.

dmaljovec commented 4 years ago

Thanks for trying this out. I deployed a new version of nglpy, so you should be able to update to nglpy==1.1.1. That should hopefully obviate the need for you to change anything in your code snippet.

Let me know if you run into any more troubles, and also the results of your experiment. I would love to see what my code has helped enable you to do!

Diramorius commented 4 years ago

Thanks! It works. I'll give it a try on my dental dataset and post some visualizations once I get everything working. Thanks a lot! Appreciate it greatly. Take care and stay safe!

Diramorius commented 4 years ago

Hi there again, I was wondering if you could clarify the output of msc.get_partitions()?

For example, in the first key-value pair from a test case, the output is {(5, 31): [894, 1034]}.

Does it mean there is an edge path 5 -> 894 -> 1034 -> 31?

I realised that my interpretation is probably wrong in this case, since most of the key-value pairs do not correspond to a valid edge path.

If you can clarify, that would be very useful. Thanks! Take care and stay safe! :)

Diramorius commented 4 years ago

I took the example code and printed out msc.get_partitions(); the first key pair is (6, 4): [6, 20, 26, 78].

I visualized the dataset and plotted the sequence of points as connected lines.

morse

I think I really might have gotten the interpretation wrong.

maljovec commented 4 years ago

get_partitions returns a dictionary whose keys are tuples: the first integer is the index of a minimum and the second is the index of a maximum. The value for each key is a list of integers, the indices of the points belonging to the partition represented by that pair of extrema. The order of the list has no real meaning. Does that make sense? I hope this helps clarify some things. Feel free to ask more questions; I might be being too pedantic in my description here and would be more than happy to give a more intuitive answer.
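To make that structure concrete, here is the sample output quoted earlier in this thread, unpacked in plain Python (a literal dict purely for illustration, not a fresh topopy computation):

```python
# The sample get_partitions() output quoted earlier in this thread,
# written as a literal dict purely for illustration:
partitions = {(5, 31): [894, 1034]}

for (min_idx, max_idx), members in partitions.items():
    # min_idx/max_idx index the minimum and maximum of the extrema pair;
    # members are indices of the points in that partition (order-free).
    print(f"min {min_idx}, max {max_idx}: {len(members)} points")
```

So (5, 31): [894, 1034] means points 894 and 1034 belong to the partition between minimum 5 and maximum 31; it does not describe an edge path.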

On another note, given this partition seems both very small and spanning a disparate part of the domain, I am suspicious that this may be attributable to noise in the data or imposed by the connectivity structure. It is often helpful to use a persistence plot to choose the correct persistence level for evaluation. Once I get to my computer, I will try to write up an example of what this would look like both visually and hopefully with some code.

Diramorius commented 4 years ago

Hey hi, thanks for the reply! So the value of each key is a cluster of points that represents a partition (is my understanding correct?)

Does that therefore mean that each input point will be included in one of the partitions? Will there be input points that are not in any partition?

Yes, if there is a visual graphic of the code's output, that would be wonderful.

thank you so much again for helping me in my understanding of the morse-smale code.

maljovec commented 4 years ago

I dug up some code from another project that I think will accomplish plotting the persistence chart.

import matplotlib.pyplot as plt
import numpy as np

def show_persistence_chart(topological_complex,
                           screen=False,
                           filename="persistence_chart.png"):
    """ Plots the persistence chart to screen and/or file

    Parameters
    ----------
    topological_complex : topopy.MorseComplex or topopy.MorseSmaleComplex
        a "built" complex whose persistence chart we wish to visualize
    screen : bool
        Flag specifying whether the code should show the plot onscreen.
        Setting this to True will block until the window is closed.
    filename : str
        The output file to which this plot will be saved.

    Returns
    -------
    None

    """
    plt.figure()

    ps = [0]
    count = len(np.unique(list(topological_complex.get_partitions(0).keys())))
    counts = [count]
    eps = 1e-6
    for p in topological_complex.persistences:
        # Extend the current count up to this persistence value, then
        # recompute the count just past it to draw the step.
        ps.append(p)
        counts.append(count)
        count = len(np.unique(list(topological_complex.get_partitions(p + eps).keys())))
        ps.append(p)
        counts.append(count)

    plt.plot(ps, counts, c="#1f78b4")
    plt.savefig(filename, bbox_inches="tight")

    if screen:
        plt.show()
    plt.close()
maljovec commented 4 years ago

Hey hi, thanks for the reply! So the value of each key is a cluster of points that represents a partition (is my understanding correct?)

Yes.

Does that therefore mean that each input point will be included in one of the partitions? Will there be input points that are not in any partition?

Every point should be in one and only one partition.
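That invariant is easy to sanity-check. The helper below is an illustrative sketch run on toy data, not part of the topopy API:

```python
def check_partition_cover(partitions, n_points):
    """Assert that the partition member lists cover each point exactly once."""
    all_members = sorted(i for members in partitions.values() for i in members)
    assert all_members == list(range(n_points)), \
        "some points are missing from, or duplicated across, partitions"

# Toy input shaped like get_partitions() output: two extrema pairs
# splitting five points between them.
check_partition_cover({(0, 1): [0, 2], (3, 4): [1, 3, 4]}, 5)
print("every point is in exactly one partition")
```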

Yes, if there is a visual graphic of the code, it would be wonderful.

Hopefully, the code I posted above works for you. I will follow up with a post of what it should produce and how to interpret it.

thank you so much again for helping me in my understanding of the morse-smale code.

Thanks for your interest in the project! I hope it ends up working in the end.

maljovec commented 4 years ago

persistence_example

Here is an example persistence plot, where the horizontal axis represents persistence and the vertical axis is the count of extrema "alive" at a given persistence value. The heuristic we have often used in the past is to find a "stable" region where the same count of extrema "survives" for a large span of persistence. In this case, I would probably select the bold orange line, as it is the first significant span of persistence, and I would consider the features before it to be noise in the model. Again, this is just a heuristic, and there are no hard and fast rules as to what constitutes a "significant" span. Your knowledge of the underlying data can also inform the selection of the "correct" persistence level; we have seen cases where domain knowledge dictates something that is not readily apparent from this type of visualization.
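The stable-span heuristic described above can even be automated. This is a sketch that assumes you already have the chart's breakpoint data; it is not a topopy function:

```python
def pick_persistence(ps, counts):
    """Pick the start of the widest stable span of the persistence chart.

    ps     : sorted persistence values at which the extrema count changes
    counts : extrema count alive on the interval [ps[i], ps[i+1])
    """
    widths = [b - a for a, b in zip(ps, ps[1:])]
    i = max(range(len(widths)), key=widths.__getitem__)
    return ps[i], counts[i]

# Toy data: a count of 3 extrema survives from persistence 0.3 to 0.9,
# the longest span, so that is the level this heuristic selects.
level, n_extrema = pick_persistence([0.0, 0.1, 0.3, 0.9, 1.0],
                                    [10, 6, 3, 2, 1])
print(level, n_extrema)  # → 0.3 3
```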

Diramorius commented 4 years ago

Thanks for all your help, Dan! I managed to use your code to build a tooth segmentation pipeline. In the attached picture, the middle image is generated using your Morse-Smale algorithm; I used it to further cluster the patches into individual teeth.

tooth segmentation

Diramorius commented 4 years ago

However, I did encounter an issue when I tried to load a ply file of 4 MB or larger: the msc.get_partitions(persistence=0.8) call simply took so much time that the kernel had to restart. Would you have any idea what is causing it?

If I wanted a faster version, is the C++ code also in the GitHub repository? I would just need the nglpy and topopy code, right?

Many many thanks again!!

maljovec commented 4 years ago

Hmm. When you say the kernel restarted, I am assuming you are working in a Jupyter notebook or something? So it looks like the code is possibly crashing. I wonder if you can run it somewhere else and produce a stack trace? How many points/edges are in the ply file (a rough estimate is good enough)? I don't think it is the size of the complex being computed, but it would be good to rule that out.

FWIW, the code in topopy that computes the Morse-Smale complex is in C++, but it sounds like your problem is not in computing the Morse-Smale complex but in asking for a specific persistence level. That code is here: https://github.com/maljovec/topopy/blob/master/topopy/MorseSmaleComplex.py#L198. As I type this response, I am thinking that the code is possibly stuck in one of the while loops (https://github.com/maljovec/topopy/blob/master/topopy/MorseSmaleComplex.py#L231 and https://github.com/maljovec/topopy/blob/master/topopy/MorseSmaleComplex.py#L236). Possibly there is a problem in the merge_sequence data for this data set. I would be curious if we could try to debug this function.

Either I can try it with your data, or you could possibly put a break point or some prints inside one or both of these loops and we can try to figure out what is happening here.

maljovec commented 4 years ago

Also, knowing the scale of your output value could be useful information for diagnosing.

maljovec commented 4 years ago

What could be happening is somehow two minima (or maxima) see each other as their surviving parent upon cancellation, and we get caught in an infinite loop where A -> B -> A -> B, until the kernel crashes. So, while the symptom may be showing up in the get_partitions function, the underlying merge_sequence data structure is the real problem if this is the case. I would need to see this and understand how the merge_sequence gets populated this way to figure out how we can fix this.
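For reference, an A -> B -> A loop of this kind can be detected defensively by walking each extremum's chain of surviving parents. The parent mapping below is a hypothetical stand-in for the merge_sequence data, not topopy's actual internal structure:

```python
def find_merge_cycle(parent):
    """Return a node on a cycle in a surviving-parent mapping, else None.

    parent: dict mapping an extremum index to its surviving parent index.
    """
    for start in parent:
        seen = set()
        node = start
        # Follow parents until we reach a root (self-parent or absent key);
        # revisiting a node means the chain loops, e.g. A -> B -> A.
        while node in parent and parent[node] != node:
            if node in seen:
                return node
            seen.add(node)
            node = parent[node]
    return None

print(find_merge_cycle({1: 2, 2: 1}))  # → 1 (a 2-cycle is detected)
print(find_merge_cycle({1: 2, 2: 2}))  # → None (2 is its own parent, a root)
```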

Diramorius commented 4 years ago

For an edge tuple length of 150k, the code runs fine. Using a larger dataset of 230k edges, the kernel eventually restarts itself (after around 3-5 minutes). I tried running it directly from the Anaconda command prompt, but the same result occurs.

I'll post the code and data here; I hope you can give it a try. The dataset r01 works, but r08 fails. MorseSmaleCode.zip

Thanks! Take care and stay safe!

Diramorius commented 4 years ago

I think I figured out what the issue is. It seems that at an edge length of 250k, the Morse-Smale complex code is using an insanely high amount of memory, around 80 GB.

My home desktop has only 16 GB, and when it hits the limit, the kernel restarts. But everything seems fine on my office workstation, which managed to generate the output just fine.

Now the problem is figuring out how to use a more manageable amount of memory. Thanks, Dan!

maljovec commented 4 years ago

the morse-smale complex code is using an insanely high memory at 80gb.

Yikes! I will see if we can get that number down. That is a bit unmanageable. I am glad you got it to work, but that kind of memory consumption seems unnecessary for this.

Diramorius commented 4 years ago

Thanks, Dan! I really hope to scale it down to something more manageable on a standard PC.

dmaljovec commented 4 years ago

I hope you are not waiting on this fix. I just reread your earlier posts and am worried you could not get your larger dataset analyzed. Things have gotten busier on my end, and I am not sure what kind of timeline I can provide on a fix, as I have yet to find time to investigate this.

Diramorius commented 4 years ago

Hi Dan! Thanks for the reply. Right now I can still remotely use one of the workstations with larger memory to work through some of the bigger cases, but eventually I will need to resolve this issue for workstations with less memory. To be honest, I am very grateful for your help throughout our interaction so far. Don't stress over it; I'll probably have around 1-2 months of buffer before I need to work on resolving it by looking through the C++ source. Thanks again! Take care and stay safe!

dmaljovec commented 4 years ago

Sorry for the long delay. I found a few minutes to try this out, and I think it worked for me? I ran it through my debugger (VSCode), stepped through each line, and tried to inspect the memory of the relevant objects as I went, using this answer: https://stackoverflow.com/a/30316760/11580262. I didn't notice a huge spike in memory in anything I inspected, and I was able to run it on a MacBook (2018, 16 GB memory). I am attaching my output if you want to compare.

r08-partitions.txt

Let me know if I can do anything else to help diagnose, but at this time I am unaware of where the memory spike you are seeing is coming from. What version of Python are you running, and what does your environment look like?

Here is my conda environment that I ran it in:

$ conda list
# Name                    Version                   Build  Channel
astroid                   2.4.2                    py36_0  
bandit                    1.6.2                    py36_0    conda-forge
brotlipy                  0.7.0           py36haf1e3a3_1000  
ca-certificates           2020.6.24                     0  
cachetools                4.1.0                      py_1  
certifi                   2020.6.20                py36_0  
cffi                      1.14.0           py36hb5b8e2f_0  
chardet                   3.0.4                 py36_1003  
confluent-kafka           1.5.0                    pypi_0    pypi
cryptography              2.9.2            py36ha12b0ac_0  
cycler                    0.10.0                   pypi_0    pypi
decorator                 4.4.2                    pypi_0    pypi
entrypoints               0.3                      py36_0  
flake8                    3.7.9                    py36_0  
ghalton                   0.6.2                    pypi_0    pypi
gitdb                     4.0.2                      py_0  
gitpython                 3.1.1                      py_1  
google-api-core           1.17.0                   py36_0  
google-api-python-client  1.9.1              pyh9f0ad1d_1    conda-forge
google-auth               1.17.2                     py_0  
google-auth-httplib2      0.0.3                      py_3    conda-forge
googleapis-common-protos  1.51.0                   py36_2  
httplib2                  0.18.1             pyh9f0ad1d_0    conda-forge
idna                      2.10                       py_0  
isort                     4.3.21                   py36_0  
joblib                    0.16.0                   pypi_0    pypi
kiwisolver                1.2.0                    pypi_0    pypi
lazy-object-proxy         1.4.3            py36h1de35cc_0  
libcxx                    4.0.1                hcfea43d_1  
libcxxabi                 4.0.1                hcfea43d_1  
libedit                   3.1.20181209         hb402a30_0  
libffi                    3.2.1                h475c297_4  
libprotobuf               3.11.4               hd9629dc_0  
matplotlib                3.3.0                    pypi_0    pypi
mccabe                    0.6.1                    py36_1  
mypy                      0.770                      py_0  
mypy_extensions           0.4.3                    py36_0  
ncurses                   6.1                  h0a44026_1  
networkx                  2.5                      pypi_0    pypi
nglpy                     1.1.1                     dev_0    <develop>
numpy                     1.19.1                   pypi_0    pypi
openssl                   1.1.1g               h1de35cc_0  
pandas                    1.0.5                    pypi_0    pypi
pbr                       5.4.4                      py_0  
pillow                    7.2.0                    pypi_0    pypi
pip                       20.1.1                   py36_1  
plyfile                   0.7.2                    pypi_0    pypi
protobuf                  3.11.4           py36h0a44026_0  
psutil                    5.7.0            py36h1de35cc_0  
pyasn1                    0.4.8                      py_0  
pyasn1-modules            0.2.7                      py_0  
pycodestyle               2.5.0                    py36_0  
pycparser                 2.20                       py_2  
pydocstyle                4.0.1                      py_0  
pydoe                     0.3.8                    pypi_0    pypi
pyflakes                  2.1.1                    py36_0  
pylama                    7.7.1                      py_0    conda-forge
pylint                    2.5.3                    py36_0  
pyopenssl                 19.1.0                     py_1  
pyparsing                 2.4.7                    pypi_0    pypi
pysocks                   1.7.1                    py36_0  
python                    3.6.10               h359304d_0  
python-dateutil           2.8.1                    pypi_0    pypi
pytz                      2020.1                     py_0  
pyyaml                    3.13             py36h1de35cc_0  
readline                  7.0                  h1de35cc_5  
requests                  2.24.0                     py_0  
rsa                       4.0                        py_0  
samply                    0.0.21                   pypi_0    pypi
scikit-learn              0.23.2                   pypi_0    pypi
scipy                     1.5.2                    pypi_0    pypi
seaborn                   0.10.1                   pypi_0    pypi
setuptools                49.2.0                   py36_0  
simplejson                3.17.0           py36h1de35cc_0  
six                       1.15.0                     py_0  
sklearn                   0.0                      pypi_0    pypi
smmap                     3.0.2                      py_0  
snowballstemmer           2.0.0                      py_0  
sqlite                    3.30.1               ha441bb4_0  
stevedore                 1.30.1                     py_0    conda-forge
structlog                 18.2.0                     py_0    conda-forge
threadpoolctl             2.1.0                    pypi_0    pypi
tk                        8.6.8                ha441bb4_0  
toml                      0.10.0             pyh91ea838_0  
toolz                     0.10.0                     py_0  
topopy                    1.0.1                     dev_0    <develop>
typed-ast                 1.4.1            py36h1de35cc_0  
typing_extensions         3.7.4.2                    py_0  
uritemplate               3.0.1                      py_0    conda-forge
urllib3                   1.25.9                     py_0  
wheel                     0.34.2                   py36_0  
wrapt                     1.11.2           py36h1de35cc_0  
xz                        5.2.4                h1de35cc_4  
yaml                      0.1.7                hc338f04_2  
yapf                      0.30.0                   pypi_0    pypi
zlib                      1.2.11               h1de35cc_3

A lot of this is cruft from other things.

Diramorius commented 4 years ago

Hi Dan,

Thanks so much for looking into this. From what I experienced, it seems like Python manages with the amount of memory it has at its disposal: if the machine has only 16 GB, it will use up to the max. But at times the kernel can be overburdened, the program stops, and the kernel then force-restarts. I am not sure how it impacts Mac/Windows machines differently.

I tested the same code with the r08 input on both machines, one with 32 GB and the other with 386 GB of memory. Attached you can see the task manager output. Once it reaches the point where the Morse-Smale code starts partitioning, the memory starts to spike. I'll try to explore how to get the allocated memory size, based on the link you provided.

desktop2 desktop1 condalist.txt

regards Calvin

dmaljovec commented 4 years ago

Oh wait, I do see a memory spike if I watch Activity Monitor during a command-line run. I see Python jump to nearly 70 GB of memory. It seems to get released before completing, because it is now back down to <300 MB and still running. I might have just inspected the wrong items or at the wrong time.

I am going to try to run a memory profiler, and see if I can get more information as to why the memory spikes so drastically and if this is something that can be avoided.

dmaljovec commented 4 years ago

So, I installed memory-profiler and wrapped your script in a function so I could decorate it with @profile (just using memory-profiler without this didn't seem to give any output).

Here are the results:


$ python -m memory_profiler MorseSmale.py
Filename: MorseSmale.py

Line #    Mem usage    Increment   Line Contents
================================================
    18     93.8 MiB     93.8 MiB   @profile
    19                             def main():
    20     93.8 MiB      0.0 MiB       filename_base = "data/r08"
    21                             
    22                                 #just a Helper class to run some functions 
    23     93.8 MiB      0.0 MiB       si = SortIndex() 
    24                             
    25                                 # read in the ply object file 
    26    111.7 MiB     18.0 MiB       plydata = PlyData.read(filename_base + '.ply')
    27                             
    28                                 # create an empty edges list array 
    29    111.7 MiB      0.0 MiB       n_edges = np.empty([0, 2], dtype=int)
    30                             
    31                                 # populating the edges list 
    32    230.8 MiB      0.0 MiB       for i in range(len(plydata['face'].data['vertex_indices'])):
    33    230.8 MiB      0.1 MiB           n_edges = si.output_edges(n_edges, plydata['face'].data['vertex_indices'][i])
    34                                 
    35                                 # sort by column 0, then by column 1, easier to remove dulplicate
    36                                 # this only works for closed manifold, might not be useful if the ply mesh has boundaries 
    37    223.8 MiB      0.0 MiB       n_edges = n_edges[ np.lexsort( (n_edges[:, 1] , n_edges[:, 0]) ) ] 
    38                             
    39    174.3 MiB      0.0 MiB       n_edges = si.unique_edges(n_edges)
    40                             
    41                             
    42                                 # printing information relating to edges 
    43                                 # for i in range(n_edges.shape[0]):
    44                                 #     if n_edges[i][0] != 0 :
    45                                 #         break 
    46                                 #     print(n_edges[i])
    47                             
    48    174.3 MiB      0.0 MiB       print(n_edges.shape)
    49                             
    50                             
    51    174.3 MiB      0.0 MiB       edges_tuples = [] 
    52    174.4 MiB      0.0 MiB       for i in range(n_edges.shape[0]):
    53    174.4 MiB      0.0 MiB           edges_tuples += [(n_edges[i][0], n_edges[i][1])]
    54                                     
    55    152.6 MiB      0.0 MiB       print(len(edges_tuples))
    56                             
    57                                 # now we populate the X, the location of the points 
    58    152.6 MiB      0.0 MiB       X = np.empty([0, 3], dtype=float)
    59                             
    60                                 #print(len(plydata.elements[0].data))
    61    171.6 MiB      0.0 MiB       for i in range(len(plydata.elements[0].data)):
    62    171.6 MiB      0.0 MiB           pos = [] 
    63    171.6 MiB      0.0 MiB           for j in range(3):
    64    171.6 MiB      0.0 MiB               pos += [plydata.elements[0].data[i][j]]
    65                                         
    66    171.6 MiB      0.0 MiB           X = np.append(X, [pos] , axis = 0)
    67                                 
    68                                 #print(type(X[0][0]))    
    69                                 #print(X)
    70                             
    71                                 # now we read the file for the K1 and K2 
    72    151.6 MiB      0.0 MiB       k1, k2 = si.read_curvature(filename_base + ".txt")
    73    152.1 MiB      0.6 MiB       k1= np.array(k1)
    74                                 
    75                                     
    76                                 #now we try to get the morse smale function 
    77    152.1 MiB      0.0 MiB       graph = ngl.PrebuiltGraph(edges_tuples)
    78    152.1 MiB      0.0 MiB       msc = topopy.MorseSmaleComplex(graph=graph, gradient='steepest', normalization='feature')
    79    189.5 MiB     37.4 MiB       msc.build(X, k1)
    80    190.2 MiB      0.7 MiB       partitions = msc.get_partitions(persistence=0.8) 
    81                             
    82    190.2 MiB      0.0 MiB       f = open(filename_base + '-partitions.txt', 'w')
    83                             
    84                             
    85    190.3 MiB      0.0 MiB       for k, v in partitions.items():
    86    190.3 MiB      0.0 MiB           for j in k :
    87    190.3 MiB      0.0 MiB               f.write(str(j))
    88    190.3 MiB      0.0 MiB               f.write(" ")
    89    190.3 MiB      0.0 MiB           f.write("\n"); 
    90                             
    91    190.3 MiB      0.0 MiB           for j in v:
    92    190.3 MiB      0.0 MiB               f.write(str(j))
    93    190.3 MiB      0.0 MiB               f.write(" ")
    94    190.3 MiB      0.0 MiB           f.write("\n");
    95                                     
    96    190.3 MiB      0.0 MiB       f.close()
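For reference, the sort-and-deduplicate step at the top of this script can be sketched with plain NumPy. The edge list below is a made-up toy input, and `np.unique` is standing in for `si.unique_edges`, which I am assuming drops duplicate rows:

```python
import numpy as np

# hypothetical edge list; on a closed mesh each edge appears twice (once per face)
n_edges = np.array([[1, 0], [0, 1], [2, 1], [1, 2]])

n_edges = np.sort(n_edges, axis=1)  # orient each edge as (min, max)
# sort by column 0, then by column 1, so duplicate rows become adjacent
n_edges = n_edges[np.lexsort((n_edges[:, 1], n_edges[:, 0]))]
n_edges = np.unique(n_edges, axis=0)  # drop the duplicate rows
```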
dmaljovec commented 4 years ago

So, it could be something in the C layer that is not visible to this tool? I am using SWIG to generate a Python interface for the underlying C++ implementation, and I don't know whether that adds any overhead. Since the initial construction of this library, I have written a more native Python-compatible C layer elsewhere, so maybe I can excise the need for SWIG and see if that fixes anything 🤷. Before that, I should probably do some Fermi estimation to check how much memory we should actually expect the C layer to use given the data structures involved. I haven't looked at that code in a while, but it might be ripe for some optimizations. We should also try to rule out the nglpy library as a possible source of the problem. I could run your script with just the graph construction and see if it still spikes the memory.
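As a rough sketch of the kind of Fermi estimate I mean (the mesh size here is a made-up assumption, not your actual data):

```python
# assumed: ~500k triangles; on a closed triangle mesh E = 3F/2, so ~750k unique edges
n_unique_edges = 750_000
bytes_per_edge = 2 * 8  # two int64 endpoint indices per edge

edge_array_mb = n_unique_edges * bytes_per_edge / 1e6
print(edge_array_mb)  # 12.0 -> the raw edge array alone should only be ~12 MB
```

Anything orders of magnitude beyond an estimate like this would point at the data structures, not the data.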

dmaljovec commented 4 years ago

It looks like the memory spike happens when just using the nglpy library, without topopy. I'll start looking at that library first.

Diramorius commented 4 years ago

Thanks, Dan! I appreciate your time and effort looking into this issue!

dmaljovec commented 4 years ago

Calvin,

While debugging this code, I noticed something that looks like a severe bottleneck in the code you posted. You are using np.append to build your data structures; because np.append copies the entire array on every call, each of these loops is quadratic in the number of elements. If you instead populate a plain Python list and convert it to an array once it is filled, your script will run a lot faster.

For instance, I changed this code in SortIndex:

    def output_edges(self, current_triangle_list):
        a = current_triangle_list[0]
        b = current_triangle_list[1]
        c = current_triangle_list[2]

        full_edge_list = []

        if a < b:
            full_edge_list.append([a, b])
        else:
            full_edge_list.append([b, a])

        if b < c:
            full_edge_list.append([b, c])
        else:
            full_edge_list.append([c, b])

        if a < c:
            full_edge_list.append([a, c])
        else:
            full_edge_list.append([c, a])

        return full_edge_list

It could be even more succinct with inline if conditionals, but these are only the changes relevant to speeding things up.
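For illustration, the inline-conditional version might look like this (a sketch as a standalone function, not the exact `SortIndex` method):

```python
def output_edges(tri):
    a, b, c = tri[0], tri[1], tri[2]
    # each edge is oriented (min, max) via an inline conditional
    return [
        [a, b] if a < b else [b, a],
        [b, c] if b < c else [c, b],
        [a, c] if a < c else [c, a],
    ]
```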

The relevant call would then change to:

    n_edges = []
    # populating the edges list 
    for i in range(len(plydata['face'].data['vertex_indices'])):
        n_edges.extend(si.output_edges(plydata['face'].data['vertex_indices'][i]))
    n_edges = np.array(n_edges)

With some simple timing, I was able to get this code to go from ~370 s down to <4 s. It looks like X is built in a similar fashion.
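The same pattern applies to X: collect the rows in a plain list and convert once at the end. The records below are synthetic stand-ins for `plydata.elements[0].data`, where each record starts with x, y, z followed by other per-vertex fields:

```python
import numpy as np

# synthetic vertex records (x, y, z, plus an extra per-vertex field)
vertex_data = [(0.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 0.5), (0.0, 1.0, 0.0, 0.2)]

# one allocation at the end instead of an np.append copy per vertex
X = np.array([[rec[j] for j in range(3)] for rec in vertex_data], dtype=float)
```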

This does not solve any of our other problems, but could improve turnaround time if you are debugging any of this.

dmaljovec commented 4 years ago

I think I found the problem: the prebuilt-graph path is very wasteful of memory. I will try to update the nglpy library and publish a new patch version that you can hopefully use.

dmaljovec commented 4 years ago

I have started work on the fix: https://github.com/maljovec/nglpy/pull/19. I think there is more I can do to limit the data footprint, since I was a bit cavalier with the data structures needed. I'll try to get to this later this evening while it is fresh in my mind.

dmaljovec commented 4 years ago

Good news: I got our example down to 22 GB, and there is more clean-up work to be done.

Diramorius commented 4 years ago

Thanks, Dan! Thank you so much for the help!

dmaljovec commented 4 years ago

Okay, I just pushed nglpy 1.1.2. Try updating that dependency and see if it works any better for you. It appears to fix the issue for me, and it has made the code a lot faster too! Thanks for pointing this out; I really appreciate the opportunity to improve the performance of these libraries.

Diramorius commented 4 years ago

Thanks! I am currently testing it with the new dataset; we'll see if it is possible to work within the confines of a 32 GB Windows machine. Appreciate it greatly, Dan!