Diramorius opened this issue 4 years ago
I think what you want to do would work, but it might take a little work to get this running with your predefined mesh. Right now, this library assumes that you want it to build the connectivity graph that connects your points. It does not actually use faces, because it was meant to approximate the Morse-Smale complex in high-dimensional spaces, where keeping track of anything more than a 1-skeleton quickly becomes intractable.
As of right now, using your existing mesh, while possible, is more complicated than it needs to be. Let me see if I can simplify the API so that you can pass in your own edge list (which you could hopefully generate from your mesh).
I am not sure what you mean by kmin and kmax. Are these curvatures on your mesh? If so, you could use one or the other to define a function on your mesh, but not both at the same time; you could construct two different segmentations, one from each. Does this help?
For what it is worth, I will try to put out a newer version that accepts a pre-defined graph on your data. This might be a change in a related library, nglpy, where it seems I already have some C++ code to do this, but I don't think it is surfaced on the public Python side. I should be able to do this over the weekend, but feel free to poke me if you don't see any updates by early next week.
If it is at all possible to share some test data from your use case, I would love to make sure I could do this in a way that meets your needs.
In case any of my comments give you reservations about using this library for your needs, here is a potential alternative: I have not used it myself, but I have heard good things about TTK, which could be worth checking out as well.
Whatever you choose, I would be curious to know how I can improve this library to make it as simple as possible to get what you need out of it.
I would be remiss if I did not make sure you at least saw the documentation for my project: https://topopy.readthedocs.io/en/latest/?badge=latest
Thanks for your interest in this project!
I was exploring various Morse-Smale implementations, and I found yours very compact to use. TTK seems good too, but it is too intricate for my purposes.
Really glad to hear from you so promptly. I appreciate it greatly.
On your question about ease of use: I am looking for an implementation where points, connectivity, and scalar values are the input, and the Morse-Smale algorithm generates partitions based on the scalar values attached to the points. On that note, I have looked through your documentation, and I am wondering if the graph (nglpy.Graph) might be the key to providing the connectivity?
The problem I am trying to solve is segmentation based on salient edges (Kmax and Kmin curvatures), much like the paper "Separatrix Persistence: Extraction of Salient Edges on Surfaces Using Topological Methods". I'm using it for a dental project where I want to segment out individual teeth from a dental scan, as in "Multi-Region Segmentation Based on Compact Shape Prior".
But to achieve all this, I'll first need an implementation of the Morse-Smale complex algorithm. Thanks so much for replying. Stay safe!
I started a PR, but I still need to add unit tests and try it out with topopy; in case you want to follow along, it is here: https://github.com/maljovec/nglpy/pull/17.
Though I haven't tested it yet, the idea is that you would construct your own graph and pass it into the MorseSmaleComplex constructor. Something like:
import topopy
import nglpy
...
# Your data defined here with:
# X being an nX3 array of your points
# edges being a list of tuples consisting of two integers each that
# represent the indices of the rows in X
# Y would be an nX1 array of your curvatures at each point in X
graph = nglpy.PrebuiltGraph(edges)
msc = topopy.MorseSmaleComplex(graph=graph)
msc.build(X, Y)
I hope to have a real example to try out soon, but if you are under a time crunch you could try building the specific nglpy branch in the PR.
Great! I'll give this a test. Thanks!
I was hoping to test on a similar 3D model and post my results here (I may still try it), but for what it is worth, that PrebuiltGraph from nglpy should be available if you make sure to update to version 1.1.0, which should be the latest on PyPI. I also have a PR for this project that will make this version the default prerequisite for the latest and newer versions of topopy moving forward.
If you beat me to it, please let me know if it works or if something breaks.
Sure, I'll work on it for a couple of days; I will need to prepare the data for testing. Thank you so much. Stay safe.
Hi Dan, I tried using your code on a simple ellipsoid object, with X as an n x 3 array. I got this error:

Traceback (most recent call last):
  File "D:\Code Test\PythonAppTest\MorseSmale.py", line 66, in <module>
  File "C:\ProgramData\Anaconda3\lib\site-packages\topopy\MorseSmaleComplex.py", line 109, in build
    super(MorseSmaleComplex, self).build(X, Y, w)
  File "C:\ProgramData\Anaconda3\lib\site-packages\topopy\TopologicalObject.py", line 232, in build
    self.graph.build(self.Xnorm)
  File "C:\ProgramData\Anaconda3\lib\site-packages\nglpy\PrebuiltGraph.py", line 56, in build
    for x in self.edges
  File "C:\ProgramData\Anaconda3\lib\site-packages\nglpy\PrebuiltGraph.py", line 57, in <genexpr>
TypeError: unhashable type: 'numpy.ndarray'
I added some files; if you try it out I'll be very grateful: sm.zip. Thanks!
To add on: in the *.txt, the last two numbers on every line specify the K1 and K2 curvatures at that point. I am using an ndarray of K1 for the Y values. sm.zip
I realised that the edges need to be in a list of tuples and changed that portion. Now I get a new error type:
Traceback (most recent call last):
  File "D:\Code Test\PythonAppTest\MorseSmale.py", line 76, in <module>
  File "C:\ProgramData\Anaconda3\lib\site-packages\topopy\MorseSmaleComplex.py", line 109, in build
    super(MorseSmaleComplex, self).build(X, Y, w)
  File "C:\ProgramData\Anaconda3\lib\site-packages\topopy\TopologicalObject.py", line 232, in build
    self.graph.build(self.Xnorm)
  File "C:\ProgramData\Anaconda3\lib\site-packages\nglpy\PrebuiltGraph.py", line 63, in build
    edges = vectorInt(edgeList)
  File "C:\ProgramData\Anaconda3\lib\site-packages\nglpy\ngl.py", line 391, in __init__
    this = _ngl.new_vectorInt(*args)
NotImplementedError: Wrong number or type of arguments for overloaded function 'new_vectorInt'.
Possible C/C++ prototypes are:
    std::vector< int >::vector()
    std::vector< int >::vector(std::vector< int > const &)
    std::vector< int >::vector(std::vector< int >::size_type)
    std::vector< int >::vector(std::vector< int >::size_type, std::vector< int >::value_type const &)
I will try to take a closer look this weekend.
Ugh, I think I spotted the problem. The Python/C interoperability does not like when we use numpy.int64s to create the C++ std::vector. I can fix things on my end, but it's going to be faster for you to just cast your edge tuples into standard Python ints.
Specifically, here is the code I modified from your script to get past this issue:
edges_tuples = []
for i in range(n_edges.shape[0]):
    edges_tuples += [(int(n_edges[i][0]), int(n_edges[i][1]))]
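For what it is worth, the same cast can also be written as a one-line comprehension; this sketch is equivalent to the loop above, assuming n_edges is an n x 2 integer array:

# equivalent one-liner: cast each row of n_edges to a tuple of Python ints
edges_tuples = [(int(a), int(b)) for a, b in n_edges]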
I will fix nglpy to do this cast underneath the covers, and push a newer version from my end.
Thanks for trying this out. I deployed a new version of nglpy, so you should be able to update to version nglpy 1.1.1. That should hopefully obviate the need for you to change anything in your code snippet.
Let me know if you run into any more troubles, and also the results of your experiment. I would love to see what my code has helped enable you to do!
Thanks! It works. I'll go give it a try on my dental dataset and post some visualizations once I get everything working. Thanks a lot! Appreciate it greatly. Take care and stay safe!
Hi there again, I was wondering if you can clarify the output of msc.get_partitions()?
For example, in the first key-value pairing from a test case, the output is {(5, 31): [894, 1034]}.
Does it mean there is an edge path 5 -> 894 -> 1034 -> 31?
I realised that my interpretation might be wrong in this case, since most of the key-value pairings do not lead to a valid edge path.
If you can clarify, that would be very useful. Thanks! Take care and stay safe! :)
I took the example code and printed out msc.get_partitions(); the first key pairing is (6, 4): [6, 20, 26, 78].
I visualized the dataset and plotted the sequence of points as lines. I think I might really have gotten the interpretation wrong.
get_partitions returns a dictionary where each key is a tuple: the first integer is the index of the minimum and the second is the index of the maximum. The value for each key is a list of integers representing the indices of the points belonging to the partition represented by that pair of extrema. The order of the list has no real meaning. Does that make sense? I hope this helps clarify some things. Feel free to ask more questions; I might be being too pedantic in my description here and would be more than happy to give a more intuitive answer.
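In case a concrete example helps, here is a quick sketch (not from the library, just illustrative) that flattens that dictionary into a per-point partition label; n_points stands in for the number of rows in your X array:

import numpy as np

partitions = msc.get_partitions()
labels = np.full(n_points, -1, dtype=int)  # n_points: your point count (hypothetical name)
for label, (extrema_pair, members) in enumerate(partitions.items()):
    # extrema_pair is (min_index, max_index); members are the point indices
    # belonging to that partition
    labels[members] = label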
On another note, given that this partition is both very small and spans a disparate part of the domain, I suspect it may be attributable to noise in the data or imposed by the connectivity structure. It is often helpful to use a persistence plot to choose the right persistence level for evaluation. Once I get to my computer, I will try to write up an example of what this looks like, both visually and hopefully with some code.
Hey hi, thanks for the reply! So the value of each key is a cluster of points that represents a partition (is my understanding correct?)
Does that therefore mean that each input point will be included in one of the partitions? Will there be input points that are not in any partition?
Yes, if there is a visual graphic of the code, that would be wonderful.
Thank you so much again for helping me understand the Morse-Smale code.
I dug up some code from another project that should plot the persistence chart.
import matplotlib.pyplot as plt
import numpy as np


def show_persistence_chart(topological_complex,
                           screen=False,
                           filename="persistence_chart.png"):
    """ Plots the persistence chart to screen/file.

    Parameters
    ----------
    topological_complex : topopy.MorseComplex or topopy.MorseSmaleComplex
        a "built" complex whose persistence chart we wish to visualize
    screen : bool
        Flag specifying whether the code should show the plot onscreen.
        Setting this to True will block until the window is closed.
    filename : str
        The output file to which this plot will be saved.

    Returns
    -------
    None
    """
    plt.figure()
    ps = [0]
    count = len(np.unique(list(topological_complex.get_partitions(0).keys())))
    counts = [count]
    eps = 1e-6
    for p in topological_complex.persistences:
        # record the count just before and just after each persistence value
        # so the curve renders as a step function
        ps.append(p)
        counts.append(count)
        count = len(np.unique(list(topological_complex.get_partitions(p + eps).keys())))
        ps.append(p)
        counts.append(count)
    plt.plot(ps, counts, c="#1f78b4")
    plt.savefig(filename, bbox_inches="tight")
    if screen:
        plt.show()
    plt.close()
    # return ps, counts
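Assuming msc is the complex you built earlier, calling it should just be (the filename here is only an example):

show_persistence_chart(msc, screen=True, filename="r01_persistence.png")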
So the value of each key is a cluster of points that represents a partition (is my understanding correct?)

Yes.

Does that therefore mean that each input point will be included in one of the partitions? Will there be input points that are not in any partition?

Every point should be in one and only one partition.

Yes, if there is a visual graphic of the code, that would be wonderful.

Hopefully, the code I posted above works for you. I will follow up with a post of what it should produce and how to interpret it.

Thank you so much again for helping me understand the Morse-Smale code.

Thanks for your interest in the project! I hope it ends up working for you in the end.
Here is an example persistence plot: the horizontal axis represents persistence, and the vertical axis is the count of extrema "alive" at a given persistence value. The heuristic we have often used in the past is to find a "stable" region where the same count of extrema survives over a large span of persistence. In this case, I would probably select the bold orange line, as it is the first significant span of persistence, and I would consider the features before it to be noise in the model. Again, this is just a heuristic; there are no hard and fast rules as to what constitutes a "significant" span. Your knowledge of the underlying data can also inform the selection of the "correct" persistence level; we have seen cases where domain knowledge dictates something that is not readily apparent from this type of visualization.
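Concretely, once you read a persistence value off the chart at the start of a stable span, you would re-query the complex at that level; the 0.5 below is a made-up placeholder:

chosen_p = 0.5  # placeholder: read this value off your persistence chart
stable_partitions = msc.get_partitions(persistence=chosen_p)
print(len(stable_partitions), "partitions survive at persistence", chosen_p)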
Thanks for all your help, Dan! I managed to use your code to build a tooth segmentation pipeline. In the attached picture, the middle image is generated using your Morse-Smale algorithm; I used it to further cluster the patches into individual teeth.
However, I encountered an issue when I tried to load a ply file of 4 MB or larger: the msc.get_partitions(persistence=0.8) call simply took so long that the kernel had to restart. Would you have any idea what is causing it?
If I wanted a faster version, is the C++ code also in the GitHub repository? I would just need the nglpy and topopy code, right?
Many, many thanks again!!
Hmm. When you say the kernel restarted, I am assuming you are working in a Jupyter notebook or something? It looks like the code is possibly crashing. I wonder if you can run it somewhere else and produce a stack trace? How many points/edges are in the ply file (a rough estimate is good enough)? I don't think it is the size of the complex being computed, but it would be good to rule that out.
FWIW, the code in topopy that computes the Morse-Smale complex is in C++, but it sounds like your problem is not in computing the Morse-Smale complex but in asking for a specific persistence level. That code is here: https://github.com/maljovec/topopy/blob/master/topopy/MorseSmaleComplex.py#L198. As I type this response, I am thinking that possibly the code is stuck in one of the while loops (https://github.com/maljovec/topopy/blob/master/topopy/MorseSmaleComplex.py#L231 and https://github.com/maljovec/topopy/blob/master/topopy/MorseSmaleComplex.py#L236). Possibly there is a problem in the merge_sequence data for this data set. I would be curious if we could try to debug this function. Either I can try it with your data, or you could put a break point or some prints inside one or both of these loops, and we can try to figure out what is happening here.
Also, knowing the scale of your output value could be useful information for diagnosing.
What could be happening is that two minima (or maxima) somehow see each other as their surviving parent upon cancellation, and we get caught in an infinite loop (A -> B -> A -> B, ...) until the kernel crashes. So, while the symptom may be showing up in the get_partitions function, the underlying merge_sequence data structure is the real problem if this is the case. I would need to see this and understand how the merge_sequence gets populated this way to figure out how we can fix it.
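If you want to poke at it yourself in the meantime, here is a sketch of the kind of check I have in mind; it assumes merge_sequence maps each extremum index to a (persistence, parent_index) pair, which I would have to double-check against the actual code:

def find_merge_cycle(merge_sequence, start):
    """Walk parent pointers from start and report any A -> B -> A style loop."""
    seen = [start]
    current = start
    while True:
        _, parent = merge_sequence[current]
        if parent == current:
            return None  # reached a surviving root; no cycle on this chain
        if parent in seen:
            return seen + [parent]  # revisited an index: this chain loops forever
        seen.append(parent)
        current = parent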
For an edge list of 150k tuples, the code runs fine. Using a larger dataset of 230k edges, the kernel eventually restarts itself (after around 3-5 mins). I tried running it directly from the Anaconda command prompt, but the same thing happens.
I'll post the code and data here; I hope you can give it a try. The dataset r01 works, but r08 fails. MorseSmaleCode.zip
Thanks! Take care and stay safe!
I think I figured out what the issue is. It seems that at an edge list length of 250k, the Morse-Smale complex code uses an insanely high amount of memory, around 80 GB.
My home desktop has only 16 GB, and when it hits the limit, the kernel restarts. But it seems all good when I'm using my office workstation, which managed to generate the output just fine.
Now the problem is figuring out how to use a more manageable amount of memory. Thanks, Dan!
the Morse-Smale complex code uses an insanely high amount of memory, around 80 GB.
Yikes! I will see if we can get that number down. That is a bit unmanageable. I am glad you got it to work, but that kind of memory consumption seems unnecessary for this.
Thanks, Dan! I really hope to scale it down to something more manageable on a standard PC.
I hope you are not waiting on this fix. I just reread your earlier posts and am worried you could not get your larger dataset analyzed. Things have gotten busier on my end, and I am not sure what kind of timeline I can provide on a fix, as I have yet to find time to investigate this.
Hi Dan! Thanks for the reply. Right now I can still remotely use one of the workstations with larger memory to work through some of the bigger cases, but eventually I will need to resolve this issue for workstations with less memory. To be honest, I am very grateful for your help throughout our interaction so far. Don't stress over it; I'll probably have around 1-2 months of buffer before I need to work on resolving it by looking through the C++ source. Thanks again! Take care and stay safe!
Sorry for the long delay. I found a few minutes to try this out, and I think it worked for me? I ran it through my debugger (VSCode), stepped through each line, and tried to inspect the memory of the relevant objects as I went, using this answer: https://stackoverflow.com/a/30316760/11580262. I didn't notice a huge spike in memory in anything I inspected, and I was able to run it on a MacBook (2018, 16 GB memory). I am attaching my output if you want to compare.
Let me know if I can do anything else to help diagnose, but at this time I am unaware of where the memory spike you are seeing is coming from. What version of Python are you running, and what does your environment look like?
Here is my conda environment that I ran it in:
$ conda list
# Name Version Build Channel
astroid 2.4.2 py36_0
bandit 1.6.2 py36_0 conda-forge
brotlipy 0.7.0 py36haf1e3a3_1000
ca-certificates 2020.6.24 0
cachetools 4.1.0 py_1
certifi 2020.6.20 py36_0
cffi 1.14.0 py36hb5b8e2f_0
chardet 3.0.4 py36_1003
confluent-kafka 1.5.0 pypi_0 pypi
cryptography 2.9.2 py36ha12b0ac_0
cycler 0.10.0 pypi_0 pypi
decorator 4.4.2 pypi_0 pypi
entrypoints 0.3 py36_0
flake8 3.7.9 py36_0
ghalton 0.6.2 pypi_0 pypi
gitdb 4.0.2 py_0
gitpython 3.1.1 py_1
google-api-core 1.17.0 py36_0
google-api-python-client 1.9.1 pyh9f0ad1d_1 conda-forge
google-auth 1.17.2 py_0
google-auth-httplib2 0.0.3 py_3 conda-forge
googleapis-common-protos 1.51.0 py36_2
httplib2 0.18.1 pyh9f0ad1d_0 conda-forge
idna 2.10 py_0
isort 4.3.21 py36_0
joblib 0.16.0 pypi_0 pypi
kiwisolver 1.2.0 pypi_0 pypi
lazy-object-proxy 1.4.3 py36h1de35cc_0
libcxx 4.0.1 hcfea43d_1
libcxxabi 4.0.1 hcfea43d_1
libedit 3.1.20181209 hb402a30_0
libffi 3.2.1 h475c297_4
libprotobuf 3.11.4 hd9629dc_0
matplotlib 3.3.0 pypi_0 pypi
mccabe 0.6.1 py36_1
mypy 0.770 py_0
mypy_extensions 0.4.3 py36_0
ncurses 6.1 h0a44026_1
networkx 2.5 pypi_0 pypi
nglpy 1.1.1 dev_0 <develop>
numpy 1.19.1 pypi_0 pypi
openssl 1.1.1g h1de35cc_0
pandas 1.0.5 pypi_0 pypi
pbr 5.4.4 py_0
pillow 7.2.0 pypi_0 pypi
pip 20.1.1 py36_1
plyfile 0.7.2 pypi_0 pypi
protobuf 3.11.4 py36h0a44026_0
psutil 5.7.0 py36h1de35cc_0
pyasn1 0.4.8 py_0
pyasn1-modules 0.2.7 py_0
pycodestyle 2.5.0 py36_0
pycparser 2.20 py_2
pydocstyle 4.0.1 py_0
pydoe 0.3.8 pypi_0 pypi
pyflakes 2.1.1 py36_0
pylama 7.7.1 py_0 conda-forge
pylint 2.5.3 py36_0
pyopenssl 19.1.0 py_1
pyparsing 2.4.7 pypi_0 pypi
pysocks 1.7.1 py36_0
python 3.6.10 h359304d_0
python-dateutil 2.8.1 pypi_0 pypi
pytz 2020.1 py_0
pyyaml 3.13 py36h1de35cc_0
readline 7.0 h1de35cc_5
requests 2.24.0 py_0
rsa 4.0 py_0
samply 0.0.21 pypi_0 pypi
scikit-learn 0.23.2 pypi_0 pypi
scipy 1.5.2 pypi_0 pypi
seaborn 0.10.1 pypi_0 pypi
setuptools 49.2.0 py36_0
simplejson 3.17.0 py36h1de35cc_0
six 1.15.0 py_0
sklearn 0.0 pypi_0 pypi
smmap 3.0.2 py_0
snowballstemmer 2.0.0 py_0
sqlite 3.30.1 ha441bb4_0
stevedore 1.30.1 py_0 conda-forge
structlog 18.2.0 py_0 conda-forge
threadpoolctl 2.1.0 pypi_0 pypi
tk 8.6.8 ha441bb4_0
toml 0.10.0 pyh91ea838_0
toolz 0.10.0 py_0
topopy 1.0.1 dev_0 <develop>
typed-ast 1.4.1 py36h1de35cc_0
typing_extensions 3.7.4.2 py_0
uritemplate 3.0.1 py_0 conda-forge
urllib3 1.25.9 py_0
wheel 0.34.2 py36_0
wrapt 1.11.2 py36h1de35cc_0
xz 5.2.4 h1de35cc_4
yaml 0.1.7 hc338f04_2
yapf 0.30.0 pypi_0 pypi
zlib 1.2.11 h1de35cc_3
A lot of this is cruft from other things.
Hi Dan,
Thanks so much for looking into this. From what I experienced, it seems like Python manages with whatever memory it has at its disposal: if the machine has only 16 GB, it will use up to the max, but at times the kernel gets overburdened, the program stops, and the kernel force-restarts. I am not sure how this impacts Mac and Windows machines differently.
I tested the same code with the r08 input on two machines, one with 32 GB and the other with 386 GB of memory; attached you can see the Task Manager output. Once it reaches the point where the Morse-Smale code starts partitioning, the memory starts to spike. I'll try to explore how to get the allocated memory size based on the link you provided.
Regards, Calvin
Oh wait, I do see a memory spike if I watch Activity Monitor during a command-line run. I see Python jump to near 70 GB of memory. It seems to get released before completing, because now it is back down to <300 MB and still running. I might have just inspected the wrong items, or at the wrong time.
I am going to try to run a memory profiler and see if I can get more information as to why the memory spikes so drastically and whether this is something that can be avoided.
So, I installed memory-profiler and wrapped your script in a function so I could decorate it with @profile (just using memory-profiler without this didn't seem to give any output).
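For reference, the wrapper is just the standard memory-profiler pattern, roughly like this sketch (the function name is mine; the body is your script):

from memory_profiler import profile

@profile
def main():
    ...  # the original body of MorseSmale.py goes here

if __name__ == "__main__":
    main()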
Here are the results:
$ python -m memory_profiler MorseSmale.py
Filename: MorseSmale.py
Line # Mem usage Increment Line Contents
================================================
18 93.8 MiB 93.8 MiB @profile
19 def main():
20 93.8 MiB 0.0 MiB filename_base = "data/r08"
21
22 #just a Helper class to run some functions
23 93.8 MiB 0.0 MiB si = SortIndex()
24
25 # read in the ply object file
26 111.7 MiB 18.0 MiB plydata = PlyData.read(filename_base + '.ply')
27
28 # create an empty edges list array
29 111.7 MiB 0.0 MiB n_edges = np.empty([0, 2], dtype=int)
30
31 # populating the edges list
32 230.8 MiB 0.0 MiB for i in range(len(plydata['face'].data['vertex_indices'])):
33 230.8 MiB 0.1 MiB n_edges = si.output_edges(n_edges, plydata['face'].data['vertex_indices'][i])
34
35 # sort by column 0, then by column 1, easier to remove dulplicate
36 # this only works for closed manifold, might not be useful if the ply mesh has boundaries
37 223.8 MiB 0.0 MiB n_edges = n_edges[ np.lexsort( (n_edges[:, 1] , n_edges[:, 0]) ) ]
38
39 174.3 MiB 0.0 MiB n_edges = si.unique_edges(n_edges)
40
41
42 # printing information relating to edges
43 # for i in range(n_edges.shape[0]):
44 # if n_edges[i][0] != 0 :
45 # break
46 # print(n_edges[i])
47
48 174.3 MiB 0.0 MiB print(n_edges.shape)
49
50
51 174.3 MiB 0.0 MiB edges_tuples = []
52 174.4 MiB 0.0 MiB for i in range(n_edges.shape[0]):
53 174.4 MiB 0.0 MiB edges_tuples += [(n_edges[i][0], n_edges[i][1])]
54
55 152.6 MiB 0.0 MiB print(len(edges_tuples))
56
57 # now we populate the X, the location of the points
58 152.6 MiB 0.0 MiB X = np.empty([0, 3], dtype=float)
59
60 #print(len(plydata.elements[0].data))
61 171.6 MiB 0.0 MiB for i in range(len(plydata.elements[0].data)):
62 171.6 MiB 0.0 MiB pos = []
63 171.6 MiB 0.0 MiB for j in range(3):
64 171.6 MiB 0.0 MiB pos += [plydata.elements[0].data[i][j]]
65
66 171.6 MiB 0.0 MiB X = np.append(X, [pos] , axis = 0)
67
68 #print(type(X[0][0]))
69 #print(X)
70
71 # now we read the file for the K1 and K2
72 151.6 MiB 0.0 MiB k1, k2 = si.read_curvature(filename_base + ".txt")
73 152.1 MiB 0.6 MiB k1= np.array(k1)
74
75
76 #now we try to get the morse smale function
77 152.1 MiB 0.0 MiB graph = ngl.PrebuiltGraph(edges_tuples)
78 152.1 MiB 0.0 MiB msc = topopy.MorseSmaleComplex(graph=graph, gradient='steepest', normalization='feature')
79 189.5 MiB 37.4 MiB msc.build(X, k1)
80 190.2 MiB 0.7 MiB partitions = msc.get_partitions(persistence=0.8)
81
82 190.2 MiB 0.0 MiB f = open(filename_base + '-partitions.txt', 'w')
83
84
85 190.3 MiB 0.0 MiB for k, v in partitions.items():
86 190.3 MiB 0.0 MiB for j in k :
87 190.3 MiB 0.0 MiB f.write(str(j))
88 190.3 MiB 0.0 MiB f.write(" ")
89 190.3 MiB 0.0 MiB f.write("\n");
90
91 190.3 MiB 0.0 MiB for j in v:
92 190.3 MiB 0.0 MiB f.write(str(j))
93 190.3 MiB 0.0 MiB f.write(" ")
94 190.3 MiB 0.0 MiB f.write("\n");
95
96 190.3 MiB 0.0 MiB f.close()
So, it could be something in the C layer that is maybe not visible to this tool? I am using a tool called SWIG to generate a Python interface for the underlying C++ implementation, and I don't know if that would be adding anything. Since the initial construction of this library, I have done some other work where I wrote a more native Python-compatible C layer, so maybe I can excise the need for SWIG and see if that fixes anything 🤷. Before that, I should probably do some Fermi estimation to ensure that we should not expect that much memory usage in the C layer, given the data structures I used. I haven't looked at that code in a while, but it might be ripe for some optimizations. We should also try to rule out the nglpy library as a possible source of problems; I could try running your script with just the graph construction and see if it still uses that much memory.
It looks like the memory spike happens when just using the nglpy library without topopy. I'll start looking at that library first.
Thanks, Dan! I appreciate your time and effort looking into this issue!
Calvin,
While debugging this code, I noticed what looks like a possibly severe bottleneck in some of the code you posted. You are using np.append to build your data structures. If you instead populate a plain Python list and then convert it to an array once it is filled, you can make your script a lot faster.
For instance, I changed this code in SortIndex:
def output_edges(self, current_triangle_list):
    a = current_triangle_list[0]
    b = current_triangle_list[1]
    c = current_triangle_list[2]
    full_edge_list = []
    if a < b:
        full_edge_list.append([a, b])
    else:
        full_edge_list.append([b, a])
    if b < c:
        full_edge_list.append([b, c])
    else:
        full_edge_list.append([c, b])
    if a < c:
        full_edge_list.append([a, c])
    else:
        full_edge_list.append([c, a])
    return full_edge_list
It could be even more succinct if we used inline if conditionals, but these are only the changes relevant to speeding things up.
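For example, with conditional expressions the same logic might collapse to something like this sketch:

def output_edges(self, current_triangle_list):
    a, b, c = current_triangle_list[:3]
    # store each edge with its smaller index first
    return [
        [a, b] if a < b else [b, a],
        [b, c] if b < c else [c, b],
        [a, c] if a < c else [c, a],
    ]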
The relevant call would then change to:
n_edges = []
# populating the edges list
for i in range(len(plydata['face'].data['vertex_indices'])):
    n_edges.extend(si.output_edges(plydata['face'].data['vertex_indices'][i]))
n_edges = np.array(n_edges)
With some simplistic timing, I was able to get this code to go from ~370 s down to <4 s. It looks like X is built in similar fashion.
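Applying the same pattern to X from the profiled script would look roughly like:

# sketch: accumulate rows in a plain list, convert to an array once at the end
positions = []
for i in range(len(plydata.elements[0].data)):
    positions.append([plydata.elements[0].data[i][j] for j in range(3)])
X = np.array(positions, dtype=float)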
This does not solve any of our other problems, but could improve turnaround time if you are debugging any of this.
I think I found the problem. It has to do with the prebuilt graph implementation being very wasteful of memory. I will try to update the nglpy library and publish a new patch version that you can hopefully use.
I have started work on the fix: https://github.com/maljovec/nglpy/pull/19. I think there is more I can do to limit the data footprint since I was being a bit cavalier with the data structures needed. I'll try to get to this later this evening since it is fresh in my mind.
Good news: I got our example down to 22 GB, and there is more cleanup work to be done.
Thanks, Dan! Thank you so much for the help!
Okay, I just pushed nglpy 1.1.2. Try updating that dependency and see if it works any better for you. It appears to fix the issue for me, and it has also made the code a lot faster! Thanks for pointing this out; I really appreciate the opportunity to improve the performance of these libraries.
Thanks! I am currently testing it with the new dataset and will see if it is possible to work within the confines of a 32 GB Windows machine. Appreciate it greatly, Dan!
I wanted to create a segmentation algorithm using the Morse-Smale complex code. Does the topopy code allow such an implementation?