trelau / pyOCCT

Python bindings for OpenCASCADE via pybind11.
GNU Lesser General Public License v2.1

CI/CD support discussion #6

Closed trelau closed 6 years ago

trelau commented 6 years ago

@shrdluk @looooo

Starting an issue to open up the discussion for how to proceed with CI/CD services for pyOCCT.

Some thoughts off the top of my head:

- Only the official releases from OpenCASCADE will be used
- My latest build uses VTK 8.1.0 w/ freetype 2.8.1
- Keeping dependencies free from a specific version of Python would be nice (e.g., VTK and Netgen)
- Support for Python 3.5 and beyond should be easy, and I would really like Python 2.7 if possible. I'm not concerned about Python 3.0-3.4.
- Using third-party packages like conda-forge would be nice, but in case we need our own builds for whatever reason, I think we can set up our own conda channel. I think @looooo has already done this; we just need to think about what dependencies we need if we can't use conda-forge.

Any feedback and thoughts are welcome. I have some experience with Appveyor and none with Travis-CI, so if anyone wants to help out, feel free to jump in.

looooo commented 6 years ago

> Only the official releases from OpenCASCADE will be used

+1. I would also concentrate on as simple a build matrix as possible. For FreeCAD I will try to get everything working with occt7.2/py3.6/qt5, so at first I'll ignore py3.5 for the conda packages of pyocct as well. But I guess it shouldn't be a big problem to add py3.5 once the py3.6 packages are working.

> My latest build uses VTK 8.1.0 w/ freetype 2.8.1

There was a problem with the conda-forge VTK Windows package. But a new package was uploaded recently, so everything should work again.

> Keeping dependencies free from a specific version of Python would be nice (e.g., VTK and Netgen)

I will not follow this goal with the conda packages. netgen on conda-forge will have the Python bindings enabled, but there will be separate packages for py35 and py36.

> Support for Python 3.5 and beyond should be easy, and would really like Python 2.7 if possible. I'm not concerned about Python 3.0-3.4.

netgen doesn't support py2.7. Also, I think it makes the most sense to stop supporting py27 in new libraries. py2/py3 incompatibilities are something nobody wants to deal with.

> Using third-party packages like conda-forge would be nice, but in case we need our own builds for whatever reason, I think we can set up our own conda channel. I think @looooo has already done this, just need to think about what dependencies we need if we can't use conda-forge

In the long run I would like to get as many packages included into conda-forge as possible.

> For now I have my own "netgen4smesh" repo, but if we can get the main Netgen project to incorporate our changes and then just have a repo to build it, that would be ideal.

I think there is not much difference from the main netgen repo. The salome patch was not yet included and is not added to the conda-forge package. The conda-forge netgen package will use releases/tags from the netgen repo. If we need any diff included in this package, we can simply upload a local build with modified sources to the cad channel.

Regarding the use of CIs for pyocct: Travis seems to have some build-time limitations, so I guess it won't be possible to use a typical conda-forge setup. But it should be possible to build Linux and OSX packages on circle-ci. How to set up the CIs with tokens and so on, I don't know. But it would be nice to do this independently from conda-forge. Until everything is working, though, I think it's best to upload local builds, as this is far more flexible.

trelau commented 6 years ago

Sounds good. I'm going to experiment with conda builds locally to better understand the process.

So in the near term we can use our own channel for dependencies, but perhaps eventually migrate to conda-forge versions if they pick them up, which I think is what you're saying. I suppose it makes sense to keep the FreeCAD/pyOCCT dependencies in sync too.

shrdluk commented 6 years ago

I've been experimenting with Travis-CI, and have managed to generate a conda-build package for part of the project. One issue, as @looooo points out, is the build time limitation. I can build around 30-40% of pyOCCT before the timeout hammer descends.

On a local computer, it's usual to have a build directory which stores compiled object files. When a change is made to a source file, the build system (cmake/make) ensures that only the affected files are recompiled, so the project can often be rebuilt in a matter of seconds. On a remote system like Travis-CI, the situation is very different since no state is stored. First the virtual machine has to be initialised and have its operating system, compiler, dependencies, etc installed, which I've found takes around 15 minutes of the 50 minutes allowed for the job. Then the entire project, possibly consisting of thousands of files, has to be rebuilt from scratch.

I've been looking into the possibilities of caching a build directory in Travis-CI to speed up the build process and hopefully allow pyOCCT to be built in its entirety. This is still very much work in progress. The actual caching works well, and file timestamps are preserved, which is essential for make to discover changed files. Unfortunately the timestamps of files retrieved by git clone are set to the current time, which would cause make to rebuild everything. Various scripts have been published to correct the timestamps, and this is the next thing I intend to try.
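A minimal sketch of such a timestamp-restoration script (the function names are mine, not from any published script, and it assumes a git checkout with `git` on the PATH):

```python
import os
import subprocess

def parse_commit_time(git_output):
    """Parse the Unix timestamp printed by `git log -1 --format=%ct`."""
    return int(git_output.decode().strip())

def restore_timestamps(repo_root="."):
    """Set each tracked file's mtime to its last commit time, so make
    treats files untouched since the cached build as up to date."""
    files = subprocess.check_output(
        ["git", "ls-files"], cwd=repo_root).decode().splitlines()
    for name in files:
        out = subprocess.check_output(
            ["git", "log", "-1", "--format=%ct", "--", name],
            cwd=repo_root)
        ts = parse_commit_time(out)
        os.utime(os.path.join(repo_root, name), (ts, ts))
```

One `git log` call per file is slow on a large repo; the published scripts batch this by walking `git log --name-only` once, but the simple version shows the idea.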

Assuming I can persuade make to build only changed files, there is still an issue. When a build ends normally, Travis-CI automatically stores the changed cached build directory. However, in the case of a timeout, everything stops abruptly and the cache is not saved. So all the hard work that had been done in partly building the project would be wasted. What is needed is a "time-aware make", which only starts to compile and link a file if there's enough time to complete it. I'll see if I can come up with a simple Python script to check the time and invoke make on individual subdirectories.
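One way to sketch that "time-aware make" driver (the budget and margin values, and the injectable `build` callable, are my assumptions rather than an existing tool):

```python
import subprocess
import time

def timed_make(subdirs, budget, margin=300,
               build=lambda d: subprocess.check_call(["make", "-C", d])):
    """Invoke make on each subdirectory in turn, stopping early once
    fewer than `margin` seconds of the `budget` remain, so the job ends
    cleanly and the CI service saves the build cache."""
    start = time.time()
    built = []
    for d in subdirs:
        if time.time() - start > budget - margin:
            break  # not enough time left; pick up here in the next run
        build(d)
        built.append(d)
    return built
```

The margin has to cover the longest single subdirectory build, since the check only happens between invocations.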

If all of this works, it should be possible to build the entire project over the course of several jobs. Of course most commits would change only a few files and the project would be rebuilt in a single run, faster than with a non-cached approach.

The build caching technique might be useful on a range of projects, so I'll follow it through and see if I can get it to work.

Whether we would actually want to use it for pyOCCT is another question. Any comments would be welcome.

looooo commented 6 years ago

Just a short side note: netgen is now included in conda-forge. I also found some docs for the output section of the meta file. Maybe this way we could separate out the Python-dependent part of the package. But I still don't know if it's worth the work...

trelau commented 6 years ago

@looooo great work! I don't mind making Netgen Python-dependent as long as we have the Python versions needed to support pyOCCT. Others may find the Python interface useful as a standalone tool. I wouldn't spend time on the output section. Perhaps you could make an announcement on the Netgen message board to make others aware of your results?

Is it using the Netgen master or a specific tag? What does "6.2.1802" refer to? Is there any value in adding win32 builds (assuming it is a trivial request)?

Should we add another patch for Salome/SMESH? Long term it'd be nice to get that merged into Netgen itself but until then a patch might suffice. I know the patch fixed some issues I was experiencing during use. I think these two commits are the relevant ones:

I guess getting SMESH available is the next step towards a full build of pyOCCT? Long term, what do you think about having pyOCCT on conda-forge?

Besides SMESH, are there any dependencies that we are missing, out of sync, or not on conda-forge (OCCT, VTK, etc.)?

@shrdluk Would circle-ci provide an alternative approach? I don't have any experience with either, but I guess it comes down to the effort for what you suggested vs implementing circle-ci. Sorry for your troubles, but thanks for your efforts.

looooo commented 6 years ago

> Is it using the Netgen master or a specific tag? What does "6.2.1802" refer to? Is there any value in adding win32 builds (assuming it is a trivial request)?

They made this release: https://github.com/NGSolve/netgen/releases

win32: https://github.com/NGSolve/netgen/issues/12#issuecomment-375614659

I don't care much about win 32-bit. I would exclude it from the build matrix at first, as it is already difficult enough to handle 3 platforms / 2-3 Python versions.

> Should we add another patch for Salome/SMESH? Long term it'd be nice to get that merged into Netgen itself but until then a patch might suffice. I know the patch fixed some issues I was experiencing during use. I think these two commits are the relevant ones:

I hope it gets included: https://github.com/NGSolve/netgen/pull/10 (I guess it includes everything needed. Maybe you can have a look.) But I don't think it's best practice to add the patch to conda-forge. If we really need it, simply pull the conda-forge netgen-feedstock, choose other sources, and push to the cad channel.

> Would circle-ci provide an alternative approach? I don't have any experience with either, but I guess it comes down to the effort for what you suggested vs implementing circle-ci. Sorry for your troubles, but thanks for your efforts.

I guess circle-ci is limited to 2 hours of build time. There were some attempts to move the occt-feedstock to circle-ci, because currently manual builds (osx) are necessary. But no success yet.

trelau commented 6 years ago

Ah, so you've already made an attempt at win32. Seems we'll leave Netgen out for win32, no problem. The pull request looks good.

looooo commented 6 years ago

libmed is next on my list: https://github.com/conda-forge/staged-recipes/pull/5588#issuecomment-379263371 Are there any official sources for salome-smesh? Can you explain the differences between smesh4netgen and the Salome sources? I guess it could be difficult to get smesh into conda-forge, because we are not using the Salome sources.

trelau commented 6 years ago

I started by grabbing the official Salome sources from www.salome-platform.org/downloads/current-version (at the time was 8.3.0)

It is quite a process to get the stand-alone version of SMESH going as you can see by the commits: https://github.com/LaughlinResearch/SMESH/commits/master

I doubt we'll be able to pull the official sources, patch, and build; rather, we'd point conda-forge to my SMESH repo? Or perhaps that is a package better suited for a designated channel? Most of the changes fall into two categories: 1) changes to get stand-alone SMESH to work, and 2) changes to get Netgen 6.2 to work, since the official project currently supports Netgen 5.3 (maybe I can make them aware of the stand-alone effort and they'll get caught up?)

looooo commented 6 years ago

SMESH as a stand-alone is definitely the best option. Also, FreeCAD would benefit a lot if there were an official smesh repo. So I guess it's best to see salome4netgen as an official fork. Maybe we can rename the library so it's clear that it is a fork of the Salome project.

> maybe I can make them aware of stand-alone effort and they'll get caught up?

I guess the whole open-source CAD world would like this to happen. ; ) This would be a great achievement, and Salome can benefit a lot too (more testing of their software, more people involved, ...).

Once libmed is included in conda-forge, I will try to add smesh4netgen. But before that I would like to make FreeCAD work with smesh4netgen, and there I still have big problems. (Somehow I need debug builds of all the libraries, but making a debug build of netgen doesn't work with gcc 4.8...)

trelau commented 6 years ago

Perhaps I missed something, but what is the salome4netgen you are referring to? Do you mean my LaughlinResearch/SMESH repo? Did you have a new name in mind?

We'll see if this gets any attention: http://www.salome-platform.org/forum/forum_12/821372925

looooo commented 6 years ago

Ah yes, I meant LaughlinResearch/SMESH. It's the other way round: netgen4smesh.

SMESH_WITHOUT_SALOME -> SMESH_minus_S -> SMESH-S or SIMPLE_SMESH

shrdluk commented 6 years ago

I've been experimenting with CircleCI and can now automatically build part of pyOCCT and upload the resulting conda package to Anaconda.

There's a snag though. Some of the modules (AIS and Graphic3d so far, though there may be others) cause the build to fail due to an internal compiler error, which is caused by insufficient RAM. I had previously noticed that Graphic3d needs over 3.5GB during compilation, and the default CircleCI containers have only 4GB.

My previous experiments with Travis-CI didn't run into a memory problem, since I was using their "sudo enabled" configuration which has 7.5GB.

Any ideas on reducing memory use?

There are GCC flags to limit how much memory it uses. I'll look into this.

CircleCI have configurable resources which allow more RAM, but they want an additional payment. But since pyOCCT is Open Source, perhaps if someone spoke to them nicely...

Apart from the RAM issue, building on CircleCI seems to have potential. I've built a custom Docker image which is stored on Docker Hub. Despite its size (750MB), CircleCI loads it and spins up the environment in less than 1 minute (I guess they aren't using dialup :smiley:). This is a lot faster than installing everything as part of the build, which I previously found took 15 minutes on Travis-CI.

I've never used Docker before, but I'm impressed. Docker images also allow debugging on the local computer, which is faster and more convenient.

I haven't built the full project yet, but I estimate it would take around 4 hours. Since CircleCI allow Open Source projects to use 4 containers, that would be around one hour if the build could be fully parallelised. CircleCI allow 1500 minutes per month, which doesn't seem to be relaxed for Open Source projects. Assuming these are "container-hours", this would allow about one full build per week. Not exactly Continuous Integration...

I've yet to look into build directory caching on CircleCI.

shrdluk commented 6 years ago

Travis CI can also use Docker images, which it runs in its sudo-enabled Trusty environment with 7.5GB of RAM.

The arrangement is a bit different from CircleCI. Travis launches a full VM running Ubuntu Trusty 14.04, which in turn runs the Docker program which manages Docker containers. A box within a box.

As a trial, I used a Docker image (the same one I used for CircleCI) in Travis. Start up time is a little longer than with CircleCI, but initialisation of the VM, loading the Docker image and spinning up the container still took less than 3 minutes, which is a lot quicker than the 15 minutes it took in my earlier experiments where I installed everything directly into the VM.

With this configuration, a subset of pyOCCT files, including AIS and Graphic3d which broke the build on CircleCI, compiled without errors. The summary for the run is interesting:

```
Total time: 0:19:10.6
CPU usage: sys=0:00:34.3, user=0:23:11.3
Maximum memory usage observed: 5.6G
Total disk usage observed (not including envs): 730.7K
```

So it looks as if CircleCI, at least in its default 4GB configuration, is ruled out unless memory use during compilation can be reduced considerably.

trelau commented 6 years ago

@shrdluk wow interesting stuff and good work. so this seems like a feasible path forward for pyOCCT Travis-CI based builds, yes?

trelau commented 6 years ago

I added a few conda builds for Windows on my Anaconda channel. I tried to set up Appveyor but hit the 60 minute limit at only a third of the way through...not sure how to handle that...or if it will even be possible given the build times...

shrdluk commented 6 years ago

Sounds like Appveyor timing is similar to Travis CI. I found I could build about 30-40% of pyOCCT in the 50 minutes allowed.

I've made some progress with build directory caching on Travis CI. As I mentioned above, this would allow the project to be built from scratch over a few runs, and rebuilds due to small changes would be very quick. My timestamp restoration script works nicely, and make seems to respect the times and only rebuild changed files. Unfortunately CMake isn't playing ball at the moment and wants to rebuild everything.

I've also thought of a simple way to implement the "time-aware make" I mentioned above - just use timeout to kill the build after 45 minutes if it hasn't finished. Travis CI will then save the cached build directory so a future run can pick up where this one left off.
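The shell `timeout` utility is enough for this (something like `timeout 45m ninja`), but the same idea as a Python wrapper looks roughly like this (the exit-code convention is borrowed from coreutils `timeout`; the deadline value is illustrative):

```python
import subprocess

def run_with_deadline(cmd, seconds):
    """Run the build command, killing it when time runs short.  The
    deliberate non-zero exit makes the CI job fail gracefully, which
    (unlike the hard 50-minute kill) still saves the build cache."""
    try:
        return subprocess.call(cmd, timeout=seconds)
    except subprocess.TimeoutExpired:
        return 124  # coreutils `timeout` returns 124 for a timed-out command
```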

I've been busy with other things, but I hope to be able to get back to this in a day or two and see if I can get CMake to play nicely.

I don't know anything about Appveyor, but if build caching can be made to work with Travis CI, it might be possible to implement something similar on Appveyor.

trelau commented 6 years ago

@shrdluk I know Appveyor has some kind of caching capability so a similar approach may be possible. I emailed Appveyor support to see if they ever allow increases in time. I probably won't even hear back, but if nothing else maybe they give us some hints on how to use the cache.

trelau commented 6 years ago

@shrdluk to my surprise Appveyor responded and increased the build time for the project to 3 hours! They also offered to maybe even provide some help with caching.

Edit: Python 3.5 was successful but 3.6 failed with "compiler out of heap space" https://ci.appveyor.com/project/LaughlinResearch/pyocct/build/0.0.1.18/job/sscmu4vfywk242eu#L1336

This seems to be an issue with not enough memory on the VM. I added the error in an email to Appveyor, perhaps they will have some ideas.

shrdluk commented 6 years ago

Excellent news about the increased build time!

Looks as if Graphic3d caused the memory overflow. I found that was the most memory-hungry of the modules.

How much RAM does the Appveyor VM give you? I found the 4GB of CircleCI wasn't enough, but the 7.5GB of Travis CI was OK.

It's surprising that a change from Python 3.5 to 3.6 uses enough extra memory to trigger the overflow. The build must be right on the memory limit.

It looks like Ninja is using two threads, which will share RAM. The timings on each thread may not be entirely deterministic, so on one build you could get two memory-intensive activities running together and causing an overflow, while on another they may not coincide so the build succeeds. This may be an alternative explanation to Python 3.5 to 3.6 causing the memory problem.

If Appveyor can't increase the VM RAM, it might be possible to split the build so that the memory-hungry modules such as Graphic3d are run in a single thread, while the other modules are run in two threads as at present. This will increase the build time slightly and will be a bit fiddly to set up.
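A sketch of that split as a small build driver (the module names, job counts, and the injectable `run` hook are illustrative assumptions, not part of the actual build):

```python
import subprocess

def staged_build(targets, heavy=("Graphic3d",),
                 run=lambda cmd: subprocess.check_call(cmd)):
    """Build memory-hungry targets one at a time with -j1, then the
    rest with the usual two parallel jobs."""
    light = [t for t in targets if t not in heavy]
    for t in targets:
        if t in heavy:
            run(["ninja", "-j1", t])
    if light:
        run(["ninja", "-j2"] + light)
```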

Good progress though!

trelau commented 6 years ago

Here are the Appveyor build configurations: https://www.appveyor.com/docs/build-environment/#build-vm-configurations

Maybe I need to figure out how to use the GCE with 7.5 GB. (but I see now it isn't free).

I'll wait and see if Appveyor responds again with any ideas regarding the memory and build cache. If they can't do anything on their end, then I guess I'll think about how to split things up...

shrdluk commented 6 years ago

The Appveyor configurations look similar to Travis CI, who offer 4GB RAM on AWS or 7.5GB RAM on GCE. Probably the same machines :smirk: . Travis allow you to choose which configuration to run on with an entry in the .yml file, and don't make any charge for using the one with more RAM.

I've found that pyOCCT builds happily on the Travis GCE 7.5GB VM, even with two threads/cores running, so if you can get Appveyor to let you use their GCE VM that would be the simplest solution. Since they have been so accommodating over the build time limit, they may consider a request favourably.

trelau commented 6 years ago

@shrdluk ya i'll see if they can do anything. if you have a branch with travis ci working (even if it hits the time limit), we can work towards getting it pulled into the master branch i suppose. then I/we can reach out to Travis and see if they can do anything about the time limit. At least get it all set up to show we have a working configuration, just need some relaxation on the timeout or in Appveyor's case the memory. And then maybe some help with the cache if they have some ideas and are really nice.

Note that I removed the "PYTHON" section of CMakeLists. Apparently when you add or find pybind11 it takes care of finding the Python libraries and include directories. There is a ci/conda folder too where you could add the build.sh for Travis-CI. In the next day or two I'll add to the conda build process to run the unit tests also.

shrdluk commented 6 years ago

I've set up an experimental branch ctest where I'm investigating clang as an alternative compiler to gcc on Travis CI. Some reports say clang uses less memory than gcc; others say they are comparable. I guess it depends on what you're compiling.

I had some memory overflows on stbuild (the subject of the current pull request), so I set the -j2 option on ninja to prevent it being quite so optimistic. It prefers to run 3 edges even on a 2-core machine, which usually makes sense since I/O will block some of the time. On very memory-intensive builds such as pyOCCT, this may not be a good idea though.

In ctest, I'm letting ninja have its head to see if either compiler runs out of memory. For a fair comparison I'm using gcc 4.9, since, like clang, it supports C++14. (stbuild uses the default gcc 4.8, which is C++11 only).

Results of the first runs are in: both compilers completed successfully. Clang was quite a bit faster, taking 16 minutes, while gcc took 20 minutes. In each case, setup took nearly 6 of those minutes before compilation actually started.

More data is required, but the initial result shows clang is a promising contender.

shrdluk commented 6 years ago

I've realised I'm not comparing like with like. Clang isn't doing LTO, which apparently makes a significant difference to compile time and memory use.

So I need to work out how to enable that...

trelau commented 6 years ago

@looooo @shrdluk so i think we're in decent shape for CI services (excluding the build times). Thank you both for your efforts. Appveyor was nice enough to increase the build time for this project, so we can build Windows and I'll hopefully have some time in the near future to experiment with some "cached" solutions. I reached out to Travis-CI to see if they could extend the build time as well. If that happens we'll try and build the full project for both services. If they build, then we can start playing around with cached solutions, hopefully.

For Appveyor, I'll probably make a small test project, or a branch that builds just a few test modules, and see if caching the build directory gets me anywhere.

shrdluk commented 6 years ago

I've now got my build caching solution working on Travis CI. Further checks and tweaks are still needed, but it's looking good. If no dependencies have changed, nothing is done. A change to a single .cpp file causes just that file to be compiled and linked. I've just tried pushing a change to one file and the whole build took under 5 minutes, most of which was setup. So our average load on Travis CI should be low. Those sort of times also fit the idea of Continuous Integration.

Once I've got a few more things sorted out, I'll try building the whole project on Travis CI over a number of runs. That will give us an idea of how much time a full build will require.

Changes sufficient to require a full build from scratch will hopefully be rare, so we could treat that event as an overnight build. Instead of needing a single run of several hours, we could schedule (using cron?) a number of 50 minute runs spaced throughout the night to avoid hogging resources.

trelau commented 6 years ago

@shrdluk Travis-CI support did respond but it wasn't clear if they were going to provide 180 min of build time. I responded asking for that so hopefully we can get the initial builds easier.

Update: Travis-CI approved 180 min for LaughlinResearch/pyOCCT!

shrdluk commented 6 years ago

Congratulations on the 180 minute allowance. You clearly have good negotiating skills :+1:

Now that build caching is working, I've added automatic scheduling of continuation builds using the Travis CI API. The Cunning Plan is that the Ninja build command is wrapped in a timeout call which kills it if time is running short. If this happens, a request for a further (continuation) build is scheduled using the API. The build then fails, but all the compiled and linked files it generated are safely stored in the cache by Travis. When the next scheduled build runs, it picks up where the last one left off, so the project is built incrementally over a number of runs. Eventually all the modules are compiled and linked, so the build doesn't time out and the installation and deployment steps are carried out.
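For reference, scheduling such a continuation build boils down to one authenticated POST against the Travis CI v3 API (sketch using only the stdlib; the endpoint and header names are as documented for API v3, and the token is assumed to come from the build environment):

```python
import json
import urllib.request

def continuation_request(repo_slug, branch, token,
                         api="https://api.travis-ci.org"):
    """Build the POST that asks Travis CI (API v3) for another build
    of `branch`, so the next run can pick up the cached state."""
    url = "%s/repo/%s/requests" % (api, repo_slug.replace("/", "%2F"))
    body = json.dumps({"request": {"branch": branch}}).encode()
    return urllib.request.Request(url, data=body, headers={
        "Content-Type": "application/json",
        "Travis-API-Version": "3",
        "Authorization": "token " + token,
    })

# Actually sending it (only done when the build hit the timeout):
# urllib.request.urlopen(continuation_request("LaughlinResearch/pyOCCT",
#                                             "master", TOKEN))
```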

There is a build sequence in progress at the moment.

The reason for using timeout to generate a deliberate fail is to ensure that Travis saves the cache. If a build hits the 50 minute mark it is terminated ungraciously and changes to the cache are lost.

If the build fails for any reason other than a timeout, no further requests are scheduled. It won't therefore keep redoing failed builds.

The timeout period is currently set to 40 minutes to allow some leeway, but this can be adjusted to make use of the 180-minute allowance. The overhead of several short runs compared with one long one isn't too high though; setup usually takes under 3 minutes.

The API looks useful and there may be other things we can do with it later.

trelau commented 6 years ago

Wow, that's impressive. If this approach works well, then we may want to plan around only 50 minutes, since I'm not sure the 180-minute bump will last forever. Although, would these CI services rather have one long run or multiple short ones haha.

shrdluk commented 6 years ago

Running several 50 minute builds will be less efficient than fewer 180 minute ones due to the overhead of spinning up and taking down the VMs. On the other hand, load management is probably easier with shorter processes. Swings and roundabouts.

I don't see us needing full builds very often, at least once the process settles down. Most changes will be small, and in those cases the project will be rebuilt very quickly.

It may be possible to run builds in parallel, though this would need some modifications to avoid one run clobbering the output from another and possible cache corruption. Probably not worth considering.

trelau commented 6 years ago

Agreed, hopefully full builds will be few and far between.

shrdluk commented 6 years ago

The full build has just completed and automatically invoked conda-build to build and upload the pyOCCT package to my Anaconda channel. There are a few bugs to iron out, but the system is basically working.

trelau commented 6 years ago

I wonder if more explicit linking of the OpenCASCADE/SMESH libraries would reduce memory use and decrease build time? Right now every module is linked to every OCC library for convenience...

shrdluk commented 6 years ago

I don't have any hard figures on this, but just from watching the build I'd say the times for building (compilation) and linking are broadly similar, so reducing the link time would be beneficial. Peak memory use seems to be much higher during compilation though, so it may not help here.

It would certainly be interesting to try a comparison between full and selective linking on one file, possibly Graphic3d since it's the Big Beast. On Linux, the time command gives information about cpu time, memory use etc and would be useful for gathering statistics. I'm sure there's something similar on Windows.

I wonder if there are other tricks which could reduce build time and memory use. Although the approaches are very different, building pythonocc-core must involve processing a similar amount of information and can apparently be done in 5 to 15 minutes. Perhaps the difference is inherent in the way pybind11 works.

trelau commented 6 years ago

Interesting, I may try some experiments locally on Windows and see if anything jumps out at me. I think pybind11 uses Link Time Code Generation, which probably has an impact. I asked the pybind11 chat if anyone had any tips for speeding up builds and decreasing memory use, even if that means trading off binary size or (a little) performance. I wonder how pythonocc/SWIG would do with all the new templates in OCC 7+ (I don't think OCC < 7 used many templates)? All these OCC templates plus pybind11 templates perhaps put quite the strain on the compilers.

shrdluk commented 6 years ago

That's a very good point. I can imagine templates expanding templates which expand further templates...

That's not a can of worms I want to open though. I'll leave that to people who know something about C++ :smile:

trelau commented 6 years ago

I tried building Graphic3d, and for that package you end up linking most of the libs because it depends on so much. I noticed the compiler eats up most of the resources (~2.5GB RAM) while the linking takes ~500MB. Linking takes more time but doesn't eat as much memory, it seems.

If only we could get more memory on the VM's...

It seems Appveyor is having a higher success rate than Travis, but I thought the VMs on Travis had more RAM? I wonder if there is a setting we're missing?

trelau commented 6 years ago

btw I tried playing around with the Appveyor cache and got nowhere 😒 https://github.com/trelau/cache_builds I think the conda build process is making it harder than it should be with its creation of fresh environments for the build...

shrdluk commented 6 years ago

> If only we could get more memory on the VM's...

That's a tricky one, because it's a hard limit. The CI folk can increase time limits for us, but the memory is determined by AWS, GCE or whoever owns the VM hardware they use.

Since it's just a few files which cause problems, I wonder if it's possible to split them into smaller pieces, compile each piece separately to produce .o files, then link all the .o files together. Messy, but it might be one way round the limit.

> It seems Appveyor is having a higher success rate than Travis, but I thought the VM's on Travis had more RAM? I wonder if there is a setting we're missing?

How many edges/threads/cores does Ninja use on Appveyor? On Travis, I found that Ninja was very optimistic about how many things it could do at once, and tended to run out of memory. It also depends on which files happen to be processed at the same time. If a couple of memory-hungry ones coincide, it can hit the memory limit and kill the build. Of course, it's possible to limit the number of edges Ninja uses, but this will increase build time.

The compiler used also makes a difference. I tried using Anaconda's compiler and build system (gcc 7.2) and the builds always ran out of memory. This may partly be due to the compiler using more memory (to do cleverer things :smirk:) and partly because other parts of the build system are loaded into memory and take up space.

> i think the conda build process is making it harder than it should be with it's creation of fresh environments for the build...

I agree. I found it impossible to use conda-build directly due to the changing environments. My build caching works by using conda-build in a very, erm, unconventional way. I build everything using fixed source and build directories, and finally call conda-build to copy the resulting OCCT directory into its own workspace. It complains like crazy when it tries to build the package, because everything is in the wrong place. Despite this, it does the right thing and produces a perfectly good package :smiley:

If you're interested in the gory details, see my trbuild branch.

The fresh environments are good if you're building lots of packages on the same machine at the same time, since it isolates them from each other. If you're building on a VM, it's all rather unnecessary though.

Conda-build appears to have an environment variable which sets the working directory, so it may be possible to hack this so it uses fixed locations. That's another can of worms I don't want to go near though :smile:

trelau commented 6 years ago

cool i'll check out your branch. i thought i was close by setting the --croot and --dirty options and then try and cache the work\build directory...but i wasn't sure it was going anywhere...i'll take another crack at it when my patience is back haha. maybe i just need to cache the whole conda-build directory and then do a git pull wherever it stores the repo and then just a conda build...but i couldn't figure out how to tell conda to just build given a directory...it seems hardcoded to create environments/download sources/build/test and then on top of all that wipe out everything afterwards...

trelau commented 6 years ago

Would this help or do other compilers have something similar? http://clang.llvm.org/docs/ThinLTO.html

There is an option when adding a module to specify thin LTO.

shrdluk commented 6 years ago

ThinLTO looks interesting, and should reduce time and memory use during the link phase. I don't think it makes a significant difference to the compile phase though, and that seems to be the part that's causing us most problems.

ThinLTO may introduce other issues such as increased program size. From that thread:

> ... ThinLTO doesn't optimize for program size as much as full-program LTO does.

ThinLTO is still fairly new, and these issues should be sorted out in time.

I experimented briefly with clang, as I mentioned above, but I had problems with getting it to use the right linker and put it on hold. If you can get clang to work on Windows without too much effort, it would be interesting to try a comparison.

trelau commented 6 years ago

@shrdluk ah yes, I keep forgetting that compilation is the main issue and not linking. Perhaps we table ThinLTO for now. I experimented with cached builds on Appveyor but didn't have much luck. I'll circle back when I regain my patience...

It seems our basic Appveyor and Travis-CI processes are solid, it's just the time and memory limits that we are constrained by, which is unfortunate....

trelau commented 6 years ago

@shrdluk @looooo the Graphic3d module has been broken into two files, and all CI services seem to build in about two hours per configuration. Good enough for now, but for OCCT 7.3.0 I'm actually going to break things out into much smaller files. I think this will speed up compilation and make navigating the source files easier.

trelau commented 6 years ago

@shrdluk @looooo Linux builds should be available if you want to try them out: https://anaconda.org/trelau/pyocct/files

shrdluk commented 6 years ago

> the module Graphic3d has been broken into two files and all CI services seem to build in about two hours for each configuration

That's really good. It should also allow the project to be built on CircleCI which uses VMs with 4GB memory. Conda-forge use CircleCI for Linux builds, so this will make it easier for them to pick up the project in the future if desired.

looooo commented 6 years ago

> Conda-forge use CircleCI for Linux builds

We are now using circle-ci for OSX as well (e.g. vtk, occt), as this allows 2 hours of build time. I guess for pyocct we would do the same.

@trelau Do you think it's already time to get pyocct to conda-forge? Are you building already with the smesh package I uploaded to conda-forge? Have you tested the netgen-patch which is applied to conda-forge netgen package?

trelau commented 6 years ago

@looooo I think it's a good time to try conda-forge. I forked the staged-recipes project to get started and will probably create a beta release of pyOCCT. I am using the SMESH/Netgen from conda-forge so I'll test those along the way.