lantonov / asmFish

A continuation of the nice project asmFish by Mohammed Li. Latest version: 07.08.2019
https://lantonov.github.io/asmFish/

Is Asmfish team still working? #98

Closed: Ebola-Chan-bot closed this issue 6 years ago

Ebola-Chan-bot commented 6 years ago

I haven't seen any updates for so long. Are the developers still working? Or have they moved to a site other than GitHub?

lantonov commented 6 years ago

I am still here, but @tthsqe12, who does the serious stuff, has been missing for more than 2 months. I am having difficulties with the next SF patch and any help would be welcome.

0zymandia2 commented 6 years ago

He's very active on a C++ project, which might indicate that his recent workload has been very heavy on assembly and he wants to "detox". (Really reading between the lines; I have no idea.)

lantonov commented 6 years ago

Thanks for the info @0zymandia2. Good to know that he is still with us.

tthsqe12 commented 6 years ago

Indeed, asmfish team is working but not on asmfish. lantonov, you might appreciate the cas https://raw.githubusercontent.com/tthsqe12/cas/master/test.bmp The windows exe in there might even work on your machine.

Ebola-Chan-bot commented 6 years ago

@tthsqe12 Awesome project, though I have no idea what it means.😵Are you giving yourself a holiday or have you given up on asmfish?

0zymandia2 commented 6 years ago

asmfish team is working but not on asmfish

Nice catch.

tthsqe12 commented 6 years ago

I will slowly catch up with whatever the SF team has done with the code since last year. I am more interested in other projects now. What is the general progress on NN for chess?

Ipmanchess commented 6 years ago

Then you have to check here! http://www.talkchess.com/forum/viewtopic.php?t=66280 The goal of the project is distributed training of the network weights, hopefully building a strong chess AI from scratch. I haven't had time to set up the training server yet. It's getting close though :) If anyone wants to work on it as well, please let me know! It's exciting to see a totally different method of search/evaluation be competitive, and we need a public version of this.

Gary is the person who made fishtest!

0zymandia2 commented 6 years ago

What is the general progress on NN for chess?

More closely related to SF, there's this fork.

And here, the Fishcooking thread, where some broad strokes are outlined.

lantonov commented 6 years ago

Thank God, finally! asmFish is not dead!

About CAS: as a mathematician/statistician I am very much interested in it. I used to work with Mathematica, Maple, and Matlab in my Ph.D. student years and afterwards when I did some more theoretical work. Now, with more practical tasks, I use mainly R and Python for numerical calculations and rarely turn to symbolics. I am sure that your algebra system, coming from you, would be something extraordinary.

About NN chess: with the recent AlphaZero bomb and its aftermath, it became too much for me. Everybody writes and talks about it. Doubtless, this is a significant event in our CC community. The thing that bothers me, though, is the large computing power needed. Not having supercomputers with TPUs, the direction we are going is towards distributed computing, combining both projects of a year ago into one. Interesting times.

P.S. On first look, this is very similar to Mathematica.

0zymandia2 commented 6 years ago

The thing that bothers me, though, is the large computer power needed.

For NN, that may be right, but if we take a more modest approach and incorporate self-learning techniques, we might get something useful for everyday PCs.

lantonov commented 6 years ago

Unfortunately, casw.exe requires various DLL files which I don't have on my system. I tried to compile it with mingw, but the include files are too different.

lantonov commented 6 years ago

Just for information, I have done the 2 SF simplifications of Nov 8 for asm (bench problem for ARM) but had a problem (unequal bench) with Nov 10 - Capture Stat Simplification 87452f3 - see #93

Ebola-Chan-bot commented 6 years ago

Why not wait for Intel Nervana? The specially designed chip works better for NN workloads, and may well make distributed learning a thing of the past.

lantonov commented 6 years ago

Amazing! A program of only 453 KB able to do so many things! I compiled it per the instructions with mingw and this is the result from the examples given: [image]

lantonov commented 6 years ago

N[Sqrt[-1]] = 3.6556564*10^-26 ?

tthsqe12 commented 6 years ago

I'm still contemplating what to do about complex numbers. It's not much good for real computations yet. I'm focusing on the user interface at the moment.

lantonov commented 6 years ago

CAS can get real roots right while complex ones are off the mark, though not drastically so. For example, this one:

Solve[3x^3-2x^2+x-4. == 0, x]

in CAS is {{x->1.25155058}, {x->-0.20719431}, {x->-0.20719431}}
and in Mathematica is {{x->1.25155}, {x->-0.292442-0.98986i}, {x->-0.292442+0.98986i}}
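As a quick numerical cross-check of that cubic (a sketch using numpy, which is not part of either CAS; the tolerances are my own choice):

```python
import numpy as np

# Numerical roots of the cubic discussed above: 3x^3 - 2x^2 + x - 4 = 0
coeffs = [3, -2, 1, -4]
roots = np.roots(coeffs)

# Every computed root should make the polynomial vanish to working precision
for r in roots:
    assert abs(np.polyval(coeffs, r)) < 1e-9

# One real root near 1.25155 and a conjugate pair near -0.292442 +/- 0.98986i,
# in agreement with Mathematica's answer above
real = [r for r in roots if abs(r.imag) < 1e-9]
print(round(real[0].real, 5))  # 1.25155
```

This confirms the thread's observation: the real root agrees between the two systems, while the CAS complex pair is the outlier.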

tthsqe12 commented 6 years ago

The entirety of the algebra engine is contained in 4000 LOC and is thus quite limited. The actual design of the algebra engine is not settled. For example, there is no global table where you can store variables or functions. I have no idea how I want this to be designed. The only thing I know is that I do not want to reproduce this aspect of the disaster that is MMA.

lantonov commented 6 years ago

x^5+20x^3+20x^2+30*x+10 = 0 is a quintic solvable in radicals. CAS crashes on it as well as on other solvable quintics

tthsqe12 commented 6 years ago

x^5+20x^3+20x^2+30*x+10 = 0 is a quintic solvable in radicals. CAS crashes on it as well as on other solvable quintics

The entirety of the algebra engine is contained in 4000 LOC. I only added the cubic solver, which is broken in some cases, to test the display of big expressions.

I could add a solvability tester for polynomials over Q[x] of degree up to and including 9. However, one would need a factorization algorithm for Q[x]. Would you mind if the 400KB cas uses the 30MB flint library for such computations and the 30MB arb library for complex arithmetic?
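Radical solvability aside, the roots of that quintic can at least be sanity-checked numerically (a sketch with numpy; it says nothing about whether the roots are expressible in radicals):

```python
import numpy as np

# The quintic from the comment above: x^5 + 20x^3 + 20x^2 + 30x + 10 = 0
coeffs = [1, 0, 20, 20, 30, 10]
roots = np.roots(coeffs)

# Each numerical root should make the polynomial vanish to working precision
for r in roots:
    assert abs(np.polyval(coeffs, r)) < 1e-6

# The x^4 coefficient is 0, so by Vieta the five roots must sum to 0
assert abs(roots.sum()) < 1e-9
```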

lantonov commented 6 years ago

I know, everything is a compromise between generality and simplicity. Mathematica is several gigabytes with not much bloat (if you exclude the specialised packages like image and sound processing, geography stuff, etc.). Still, it's missing a lot, like symbolic tensor algebra and many aspects of abstract algebra and topology.

lantonov commented 6 years ago

I couldn't find any problems in the differentiation, as long as we are limited to school-taught functions (polynomials, trig, and logs).

asmfishcc commented 6 years ago

Should we wait for the executables, or will they no longer be compiled? Thanks!

lantonov commented 6 years ago

I will always compile executables if there is a change in them

tthsqe12 commented 6 years ago

@lantonov , 20GB is massive and I would refuse to install anything that big. I know some would have problems with such a huge footprint. While I think the latest is around 20GB installed, I am finding that the very core part of my installation totals around 100MB, and is centered around \SystemFiles\Kernel\Binaries\Windows-x86-64\mathdll.dll and \SystemFiles\Kernel\SystemResources\Windows-x86-64\*.mx. The mathdll (36MB) houses the kernel, and the *.mx files (64MB) in the latter directory are loaded on demand, so you don't have the massive code for solving PDEs loaded when you are just trying to figure out what 2+2 is.

For example, if you delete the "Integrate" folder (oops - maybe rename it instead), the integral Integrate[x^2/Sqrt[1-x^2],x] can still be done, with error messages, but Integrate[x^3/Sqrt[1-x^3],x] only gets half of the way. Similarly, Sum[1/(n^3+n),{n,1,100}] is done in the kernel directly, but Sum[1/(n^3+n),{n,1,Infinity}] requires something from the Sum directory to be loaded.

These .mx files are seriously compressed binary files that communicate with the kernel directly on a binary level. They are the closest to what you might call mathematica machine language. While being far from human readable, they do admit of decoding for easy viewing. The fact that some integrals are done directly in the kernel is probably a remnant of the fact that version 2.0 had Integrate but DumpSave was only introduced in 3.0.

As for the kernel, it seems to ship with both 32bit and 64bit binaries, so you should really count about half of whatever size your kernel directory is. Besides the double binary dump for win32 and win64, there are tonnes of extra DLLs in \SystemFiles\Kernel\Binaries\Windows-x86-64\ from Intel, just in case you might have a P4.

So, in conclusion, it could be a lot smaller, but it tries to do everything, and the developers have been blown around by the whims of the customers wondering what direction the wind is blowing.

lantonov commented 6 years ago

That's what I was trying to say, the core Mathematica is tight and the bloat is either non-mathematical (linguistics, biology, geography, culture, etc.) or applied (physics, engineering, chemistry).

tthsqe12 commented 6 years ago

@lantonov Do you have access to a machine with mathematica and > 16GB of ram?

lantonov commented 6 years ago

Unfortunately, no. At work, the comp has 32GB ram but no mathematica. Laptop at home is 8GB.

tthsqe12 commented 6 years ago

Do you know anyone who does? I have a fairly simple test request.

lantonov commented 6 years ago

Ipman has some powerful machines but not sure if he has mathematica. Also, I have a friend who has mathematica but not sure if he has > 16 GB ram.

tthsqe12 commented 6 years ago

It doesn't need to be powerful - it just needs to be 64bit and have access to at least 16GB ram (without swapping to disk).

lantonov commented 6 years ago

Can this test be made with a rented server and Wolfram Alpha? At the moment, I have access to 5 720 649 720 bytes of RAM and network access to Mathematica 11.2.0

tthsqe12 commented 6 years ago

I don't think that is going to work. The thing is: I suspect that the kernel memory manager is dirty; we need to test it with a real kernel, and we have to have precise control over what commands are being executed.

EDIT: if you have <= 16GB ram, paging is going to be used, and you will have to wait about a decade for the operation to complete.

tthsqe12 commented 6 years ago

Why don't I just post the code in question. I would be very gratified in a resolution and vindicated in a confirmation. However, I don't expect anyone to be patient enough to run it with less than 32GB ram installed.

I have a suspicion that MMA is using 32bit reference counts. Could you please run the following pieces of code and tell me the output?
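The suspected failure mode can be sketched in a few lines of Python (a toy model only; whether MMA really keeps 32-bit counts is exactly what these tests are meant to find out). ConstantArray[s, xlen] adds xlen references to s; with xlen = 2^31 + 10, a signed 32-bit counter silently overflows, so the later Clear[x] and Clear[s] can no longer balance the books, and a large third number in the output would betray memory that was never freed.

```python
import ctypes

# Toy model of a signed 32-bit reference count (hypothetical; not confirmed MMA internals)
xlen = 2**31 + 10          # the xlen used in the second test below
count = 1 + xlen           # one original reference to s, plus one per slot of x

# A 64-bit counter holds this fine, but a signed 32-bit one wraps without warning:
wrapped = ctypes.c_int32(count).value
print(wrapped)             # -2147483637: negative, so the bookkeeping is now corrupt
assert wrapped < 0
```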

The first is just a trivial sanity check and should not return a large number in the third position. If the sanity check passes, it should be {1000000, 11000000, 0}, plus or minus 1000, say, in each position.

If the second piece of code returns a large number in the third position, then it has been confirmed and there is no need to run the third.

It is VERY important that the lines with the definitions "s = ..." and "x = ..." have the "blah;" at the end and that this "blah;" is on the same line. (Basically, just copy the code exactly as it is, and run it on a freshly started kernel.)

Also, note that the kernel will be completely unresponsive during the construction of the constant array, i.e. this operation is non-abortable. If you have the slightest hint that your system does not have 16GB free for the second test or 32GB free for the third test, do not run these pieces, or you risk locking up your system due to paging.

First test - sanity check.

Clear["Global`*"];
slen = 1000000;
xlen = 10000000;
m0 = MemoryInUse[];
s = ConstantArray["hello!!!",slen]; blah;
m1 = MemoryInUse[];
x = ConstantArray[s,xlen]; blah;
m2 = MemoryInUse[];
Clear[x];
Clear[s];
m3 = MemoryInUse[];
Round[({m1,m2,m3}-m0)/($SystemWordLength/8)]

Second test - signed 32 bit.

Clear["Global`*"];
slen = 1000000;
xlen = 2^31 + 10;
m0 = MemoryInUse[];
s = ConstantArray["hello!!!",slen]; blah;
m1 = MemoryInUse[];
x = ConstantArray[s,xlen]; blah;
m2 = MemoryInUse[];
Clear[x];
Clear[s];
m3 = MemoryInUse[];
Round[({m1,m2,m3}-m0)/($SystemWordLength/8)]

Third test - unsigned 32 bit. There is no need to run this if the third number from the second test was large.

Clear["Global`*"];
slen = 1000000;
xlen = 2^32;
m0 = MemoryInUse[];
s = ConstantArray["hello!!!",slen]; blah;
m1 = MemoryInUse[];
x = ConstantArray[s,xlen]; blah;
m2 = MemoryInUse[];
Clear[x];
Clear[s];
m3 = MemoryInUse[];
Round[({m1,m2,m3}-m0)/($SystemWordLength/8)]

lantonov commented 6 years ago

First test: {1 000 462, 11 000 869, 1 335} Second test: {1 000 201, 1 010 317, 10 564} Third test: {1 000 201, 1 000 578, 825}

tthsqe12 commented 6 years ago

Yes, that is expected. With your physical ram count I wouldn't expect you to be patient enough to run the second. It will be paging constantly.

tthsqe12 commented 6 years ago

No - something went wrong in your second and third. If x were successfully constructed, then m2-m0 should be at least 16 billion.

lantonov commented 6 years ago

I will repeat second and third after restarting

lantonov commented 6 years ago

Second: {1 000 462, 1 012 887, 13 361} Restarting ... Third: {999 961, 1 005 129, 5 376} However, there is a message saying "This computation has exceeded the memory limit of your system" and $Aborted

tthsqe12 commented 6 years ago

If you are getting only a small increase from m1 to m2, then something funny is happening with the second constant array. Try the simple command

x = ConstantArray["hello",2^31]; Length[x]

(I take no responsibility for unresponsive system - test at your own risk)

lantonov commented 6 years ago

The answer to the last command is: "This computation has exceeded the memory limit of your system" $Aborted

lantonov commented 6 years ago

Second try of the first test: {1 000 201, 11 000 394, 631} Clean (without any messages)

tthsqe12 commented 6 years ago

That's interesting - so the construction of the huge array is failing outright (and not using paging). The difference is that when it is embedded in a sequence of commands, MMA doesn't print the error message and simply fails silently. This is another one of the bone-headed design decisions of WRI.

Alright, so this means that your system cannot even attempt this. Mine was able to, but it started paging and I lost patience.

I should remark that the issue I want to test is not with the construction of x, but rather with the clearing of x and then s. It is an unfortunate prerequisite that a huge array needs to be constructed.

tthsqe12 commented 6 years ago

So if you know anyone who can test this with sufficient ram, I would be grateful for those first two sets of three numbers.

asmfishcc commented 6 years ago

Stockfish 9 released, when ASMfish? Thanks!!

lantonov commented 6 years ago

@asmfishcc Sorry for the delay. tthsqe12 is the heavy lifter here and it appears that he is busy with other projects.

lantonov commented 6 years ago

@tthsqe12 The string is 8 bytes, which makes 34 359 738 368 bytes of RAM required for storing the vector. With the memory available to me on the system (6 400 827 392 bytes), I could reach only up to 2^25 = 33 554 432. 2^26 is already too much and the calculation aborts. With 2^25 the numbers are: {1 000 201, 34 554 826, 631}

lantonov commented 6 years ago

Maybe these commands may be of use if you want to measure memory available before and after a computation:

Clear["Global`*"];
before = MemoryAvailable[];
x = Range[10^8];
after = MemoryAvailable[];
before - after
ByteCount[x]

The difference between memory before and after the computation should be roughly equal to the size of the structure created (in this example: 800 000 000). On my system, it is exceeded by 144 bytes. If you want to see only the memory available to the kernel, you can replace MemoryAvailable[] with MemoryInUse[].
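For comparison, the same before/after measurement pattern in Python, using the standard library's tracemalloc (a sketch; the exact byte counts are interpreter- and platform-dependent):

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.get_traced_memory()[0]   # bytes currently traced

x = list(range(10**6))                        # the structure being measured

after = tracemalloc.get_traced_memory()[0]
tracemalloc.stop()

# The delta covers at least the list's pointer array: 8 bytes per slot on 64-bit,
# plus the int objects themselves
delta = after - before
assert delta >= 8 * 10**6
print(delta)
```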

tthsqe12 commented 6 years ago

The string is 8 bytes which makes 34 359 738 368 bytes ram required for storing the vector.

This is not how ConstantArray works, or how expressions are stored in general. It doesn't matter how long the string "hello!!!" is; what is stored in s is an array of pointers to that string, so the storage requirement is always 8*slen + length of string, and not slen*length of string
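CPython stores lists the same way, which makes the point easy to demonstrate (a sketch; sizes assume a 64-bit build):

```python
import sys

s = "hello!!!"
x = [s] * 1_000_000        # a million slots, each a pointer to the same string

# Cost is ~8 bytes per slot regardless of the string's length:
# 8*len(x) + one copy of s, not len(x)*len(s)
print(sys.getsizeof(x))    # about 8 MB on 64-bit CPython

# Every slot really is the same object, not a copy
assert all(item is s for item in x)
```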

you can replace MemoryAvailable[] with MemoryInUse[].

From the documentation, it looks like MemoryInUse[]+MemoryAvailable[] should basically be constant, so for measuring memory usage they are virtually identical. From the timings of MemoryInUse[], it is fast enough that it can't be anything more than a simple running total of sizeof mallocs - sizeof frees, which is exactly what I want.

Again, the point is not to test what size array you can allocate on your system, but something different entirely. I have carefully constructed the code to test exactly what I want. We simply need a machine that can allocate and fill in at least 2^31 pointers.
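As an aside, CPython keeps its reference counts in a full machine word (64-bit on 64-bit builds) and exposes them, so the bookkeeping that these tests probe indirectly can be watched directly there (a sketch; note that sys.getrefcount reports one extra temporary reference for its own argument):

```python
import sys

s = "hello!!!"                 # the object to track
before = sys.getrefcount(s)

x = [s] * 1000                 # each slot adds one reference, as with ConstantArray
after = sys.getrefcount(s)
assert after - before == 1000  # the array added exactly len(x) references

del x                          # clearing the array returns the count to its old value
assert sys.getrefcount(s) == before
```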