sumiya11 / Groebner.jl

Groebner bases in (almost) pure Julia
https://sumiya11.github.io/Groebner.jl/
GNU General Public License v2.0
59 stars 13 forks source link

Compute basis in parallel #42

Open iliailmer opened 1 year ago

iliailmer commented 1 year ago

Hello! I was wondering if Groebner.jl currently supports parallel computation for Groebner basis?

sumiya11 commented 1 year ago

Hi, currently there is no support for parallel computation (but it is in Todo list 😄 )

iliailmer commented 1 year ago

@sumiya11 thanks for your answer!

sumiya11 commented 1 year ago

Hi @iliailmer , if you are still interested, there is now pr #46 for parallel gb over rationals, results there look quite promising.

Let me know if you have some specific challenging systems over rationals where you want to compute a Groebner basis. If you have the ability to share them, it could perhaps help me optimize parallel computation a bit

iliailmer commented 1 year ago

@sumiya11 Thank you for letting me know! Here is a link to the gist with 3 systems that are pretty challenging https://gist.github.com/iliailmer/d42db418229ced3510fb78647d475be3

Please let me know if the link works for you.

sumiya11 commented 1 year ago

Thanks very much for sharing! These are over GF, but my multi-threading is designed only over rationals :(

I wonder if for these systems computing the GB of the original problem over rationals would also make sense ?

iliailmer commented 1 year ago

oh, I think it's my mistake, I think I copied the incorrect definition. I will change them

iliailmer commented 1 year ago

@sumiya11 here is the updated link https://gist.github.com/iliailmer/2a8b0a4a689c755a65dba638dc205a49

sumiya11 commented 6 months ago

Hi @iliailmer , Groebner.jl v0.6+ has an option to use multi-threading internally. For now, it is turned off by default, but you can enable it like this:

groebner(sys, threaded=:yes)

Note that it is quite experimental still, you may observe no speedup for smaller problems. :^)

sumiya11 commented 6 months ago

Also, I have a question, is it correct that systems that you have shared above are similar to the ones from the paper "Obtaining weights for Gröbner basis computation in parameter identifiability problems" ? Thanks!

iliailmer commented 6 months ago

Thank you.

Also, I have a question, is it correct that systems that you have shared above are similar to the ones from the paper "Obtaining weights for Gröbner basis computation in parameter identifiability problems" ? Thanks!

Yes, that's right.

sumiya11 commented 6 months ago

Cool. I use the systems that you provided for benchmarking, and I wish to mention this fact in my preprint about Groebner.jl, would it be alright with you?

I have another question: would it be possible to obtain the other systems from that paper, such as HPV, COVID m1, etc? For example, if there is a script that generates these equations given a state-space model, I could try to produce the systems myself. Thanks !

iliailmer commented 6 months ago

@sumiya11 thats fine with me

you can use SIAN in maple or its Julia version to obtain those (the system in the source code of both is called Et_hat)

Edit: there are also files here

Hope this helps!

sumiya11 commented 5 months ago

Thank you a lot! Just in case, I made a bundle of systems from the available examples in SIAN:

https://github.com/sumiya11/Groebner.jl/tree/master/benchmark/generate/benchmark_systems/SIAN

parisseb commented 4 months ago

Is there a limitation to the number of threads? I tried cyclic9 with julia -t16 julia_c9 where the file julia_c9 content is below. The message "[ Info: Using 16 threads." is displayed, but checking with top I only get CPU %800. `using Groebner, AbstractAlgebra, Base.Threads @info "Using $(nthreads()) threads." R, (x1, x2, x3,x4,x5,x6,x7,x8,x9) = QQ["x1","x2","x3","x4","x5","x6","x7","x8","x9"] c9=[x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9, x1x2 + x2x3 + x3x4 + x4x5 + x5*x6 \

Also do you have any idea of the RAM required to run cyclic9 or any Groebner basis? I'm not a julia user, just interested in comparing your code with my own gbasis code inside giac. It seems julia does memory allocation with garbage collecting,therefore it's somewhat hard to know the real memory required, I see with top about 20G of RAM for cyclic9 (RES, 30G VIRT), but it might be an overestimation.

sumiya11 commented 4 months ago

Hi @parisseb ,

There is a limitation on the number of threads: at most 8 threads are used during the first multi-modular batches. Eventually, after several iterations, the batch size increases (so that >8 threads become relevant).

This is done 1) to not overshoot the required number of primes by a lot, and 2) bearing in mind that many threads on a single process may be not very Julia-friendly because of many memory allocations and garbage collection.

Also do you have any idea of the RAM required to run cyclic9 or any Groebner basis?

I also observe maximum RSS about 26 GB for cyclic9. Edit: I use the command Sys.maxrss() in Julia. Though I note that we haven't measured memory usage systematically yet (or optimized for it).

I'm not a julia user, just interested in comparing your code with my own gbasis code inside giac.

That would be very interesting to see! Please do let me know what you find out :-).

Feel free to let me know if you have other questions.

sumiya11 commented 4 months ago

For what it's worth, cyclic9 takes 33 minutes on i9-13900 with 8 threads with Groebner.jl

parisseb commented 4 months ago

My best timing for cyclic9 with giac (on a Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz) is 1300 seconds with 32 threads (16*2 threads for 16 parallel modular computations and 2 threads by modular computation). Total memory used with giac is 8.3G (giac has optimizations for memory usage). Groebner.jl takes 2050s with 8 threads on the same server, and 17.3G of RAM. This is not representative: while RAM usage is in favor of giac in all situtations speed is not, your code is most of the time faster, that's the reason why I'm interested to compare.

I found some tricks that I can probably implement in giac. The first one is what you call probabilistic linear algebra for modular Groebner basis, cyclic9 modulo is more than 2* faster than with linalg=:deterministic with Groebner.jl. The next trick is vectorization for rational Groebner basis computation, by computing 4 primes at the same time, you are taking full advantage of SIMD instructions, without paying too much for memory. I will certainly have a look if I can improve giac speed with SIMD instructions...

If you are interested in giac (e.g. to include it in your very nice automatic benchmarking system), the latest source code is available here https://www-fourier.univ-grenoble-alpes.fr/~parisse/giac/giac-1.9.0.tar.bz2 (I also maintain a deb package frequently and popular Linux distros have also packages but they are sometimes outdated, but compilation of giac should be straightforward). There are some gbasis benchmarks files in examples/groebner, in short the syntax is very similar to (old) maple syntax (with :; instead of : for running a command without display). Multi-threading is controlled by commands like threads:=16; gbasis_simult_primes(8); (means 16 threads are used, 8 parallel modular computations, 16/8=2 threads per prime). You can get intermediate info logging with commands like debug_infolevel:=1 or 2 or ... (or from the shell export GIAC_DEBUG=1).

sumiya11 commented 4 months ago

@parisseb Thanks for the info! I have not used giac before, I will check it out.

My best timing for cyclic9 with giac (on a Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz) is 1300 seconds with 32 threads (16*2 threads for 16 parallel modular computations and 2 threads by modular computation).

Do I understand you correctly that giac computes for different primes in parallel, and also at the same time parallelizes some parts (linear algebra?) for each prime ?

BTW, with 8 threads, Groebner.jl uses batches of size 32 (8 threads * 4 primes at a time), this could maybe explain high memory usage.

The first one is what you call probabilistic linear algebra for modular Groebner basis, cyclic9 modulo is more than 2* faster than with linalg=:deterministic with Groebner.jl.

This is a very cool trick indeed. You can also find a description of this probabilistic linear algebra in Section 2 of "An Algorithm For Splitting Polynomial Systems Based On F4" by Monagan & Pearce.

I might also have a couple of questions about F4 in giac, could you tell me where it is better to ask them ?

parisseb commented 4 months ago

Do I understand you correctly that giac computes for different primes in parallel, and also at the same time parallelizes some parts (linear algebra?) for each prime ?

Yes; that's what I do. Linear algebra is parallelized, this does not cost much additionnal memory (a little bit for preallocation to avoid memory locks between threads).

BTW, with 8 threads, Groebner.jl uses batches of size 32 (8 threads * 4 primes at a time), this could maybe explain high memory usage.

From my understanding, the largest matrix you build for cyclic9 has 100e6 non zero coefficients, and in what you call part A/B of the matrix, there are repetitions, it's mainly in part D where all coefficients are distincts, here you have 20M non zero coefficients at most. That's 160M bytes for 1 prime (4+4 bytes per coeff), or 360M bytes for 4 primes (4+32 bytes per coeff). Add about 400M bytes for the indices of the whole matrix, we should have about 800M for 1 thread for this matrix, say 1G by thread, 8G for 8 threads + additional memory for chinese remaindering and storing the basis, you can probably optimize a little bit memory footprint,perhaps to 10G.

The first one is what you call probabilistic linear algebra for modular Groebner basis, cyclic9 modulo is more than 2* faster than with linalg=:deterministic with Groebner.jl.

This is a very cool trick indeed. You can also find a description of this probabilistic linear algebra in Section 2 of "An Algorithm For Splitting Polynomial Systems Based On F4" by Monagan & Pearce.

This is nice, but it won't help much for rational coefficients, because all blocks reducing to 0 will already be cancelled by learning.

I might also have a couple of questions about F4 in giac, could you tell me where it is better to ask them ?

On Xcas forum https://xcas.univ-grenoble-alpes.fr/forum/, or by email, or we can open an comparison issue here.