mcostalba / scoutfish

Chess Query Engine
GNU General Public License v3.0

Query API #1

Closed mcostalba closed 7 years ago

mcostalba commented 7 years ago

The database query should be flexible, powerful and at the same time very general: with the same form template we would like to cover many different query scenarios.

One approach is a partial FEN string. For instance, if we want to retrieve all the positions with a white rook on a1 and a black bishop on c3, we can build the following query:

{
   sub-fen: [
              "8/8/8/8/8/2b5/8/R7"
            ]
}

If we also want to retrieve the positions where a1 holds a white queen instead, then:

{
   sub-fen: [
              "8/8/8/8/8/2b5/8/R7",
              "8/8/8/8/8/2b5/8/Q7"
            ]
}
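A minimal Python sketch of how such a list of partial FENs could be matched against a position. It assumes standard FEN rank order (rank 8 first), that empty squares in a sub-fen mean "don't care", and that the entries of the list are alternatives (logical OR). The names `parse_board` and `matches_sub_fen` are hypothetical and are not part of scoutfish.

```python
FILES = "abcdefgh"


def parse_board(placement):
    """Map square names (e.g. 'a1') to piece letters for a FEN piece-placement field."""
    board = {}
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError(f"expected 8 ranks: {placement!r}")
    for i, rank in enumerate(ranks):
        file_idx = 0
        for ch in rank:
            if ch.isdigit():
                file_idx += int(ch)              # run of empty squares
            else:
                if file_idx < 8:
                    board[FILES[file_idx] + str(8 - i)] = ch
                file_idx += 1
        if file_idx != 8:
            raise ValueError(f"rank does not cover 8 files: {rank!r}")
    return board


def matches_sub_fen(position, sub_fens):
    """True if the position contains every piece of at least one sub-fen."""
    board = parse_board(position)
    return any(
        all(board.get(square) == piece for square, piece in parse_board(sub).items())
        for sub in sub_fens
    )


# White queen on a1 and black bishop on c3: the second alternative matches.
query = ["8/8/8/8/8/2b5/8/R7", "8/8/8/8/8/2b5/8/Q7"]
print(matches_sub_fen("6k1/8/8/8/8/2b5/8/Q5K1", query))  # True
```

The parser rejects any rank that does not add up to exactly eight files, which catches a miscounted run of empty squares before it can silently shift a piece.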

This kind of composition is simple but covers a lot of cases. We can do more: suppose we want to retrieve positions with a passed white pawn on a5, then:

{
   sub-fen: [
              "8/8/8/P7/8/8/8/8"
            ],
   not-fen: [
              "8/8/p7/8/8/8/8/8",
              "8/p7/8/8/8/8/8/8",
              "8/8/1p6/8/8/8/8/8",
              "8/1p6/8/8/8/8/8/8"
            ]
}

This is just an illustrative example to present the idea of a query based on a list of very simple conditions that we can use to build arbitrarily complex queries.
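As a sketch of how the passed-pawn query could be generated for any square rather than written by hand, the following Python builds the `sub-fen` and `not-fen` lists for a white pawn. It assumes standard FEN rank order (rank 8 first), that `not-fen` means "none of these patterns may match", and that the excluded squares are the ones ahead of the pawn on its own and adjacent files where a black pawn would stop it being passed. The function names are hypothetical.

```python
FILES = "abcdefgh"


def single_piece_fen(square, piece):
    """Piece-placement string with exactly one piece on `square` (rank 8 first)."""
    file_idx = FILES.index(square[0])
    rank = int(square[1])
    before, after = file_idx, 7 - file_idx
    row = (str(before) if before else "") + piece + (str(after) if after else "")
    return "/".join(row if r == rank else "8" for r in range(8, 0, -1))


def white_passed_pawn_query(square):
    """Query for a white pawn on `square` with no black pawn able to stop it."""
    file_idx = FILES.index(square[0])
    rank = int(square[1])
    # Squares in front of the pawn on the same and adjacent files (ranks up to 7,
    # since pawns never stand on rank 8).
    blockers = [
        FILES[f] + str(r)
        for f in range(max(0, file_idx - 1), min(7, file_idx + 1) + 1)
        for r in range(rank + 1, 8)
    ]
    return {
        "sub-fen": [single_piece_fen(square, "P")],
        "not-fen": [single_piece_fen(sq, "p") for sq in blockers],
    }


print(white_passed_pawn_query("a5"))
```

For a pawn on a5 this produces one `sub-fen` entry and four `not-fen` entries (a6, a7, b6, b7); a pawn on a central file would get entries for three files instead of two.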

sshivaji commented 7 years ago

Sub-fen is a great idea; we can then cover pawn structures easily.

How about the material imbalances, such as:

  1. 2 Bishops vs Bishop and Knight imbalance
  2. Rook and 2 pawns vs rook. (note that in chess informant on endgames, this means that only rook and 2 pawns vs rook exist on the board along with kings, no other pieces). Can this be a special case of sub-fen?

I wonder if imbalance can be supported as part of a query as well.

like { material: ["B", "B"], other_material: ["B", "N"] }

This means one side has Bishop and Bishop and the other side has "B" and "N" instead of "B" and "B". Random thought..

mcostalba commented 7 years ago

@sshivaji good idea, we can use a material tag, perhaps in this way:

{ material: ["KBBkbn"] }

If we also want to include the cases with two black knights, we can use the same composition as for fen:

{ material: ["KBBkbn", "KBBknn"] }
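A small Python sketch of how a material tag could be checked. It assumes a signature lists white pieces first and black pieces second, each side ordered K, Q, R, B, N, P, which is consistent with "KBBkbn" but is otherwise an assumption, and that the list entries are alternatives. The helper names are hypothetical.

```python
PIECE_ORDER = "KQRBNPkqrbnp"  # assumed ordering: white first, then black


def material_signature(placement):
    """Build a signature such as 'KBBkbn' from a FEN piece-placement field."""
    pieces = [ch for ch in placement if ch.isalpha()]
    return "".join(sorted(pieces, key=PIECE_ORDER.index))


def matches_material(placement, signatures):
    """True if the position's material equals one of the listed signatures."""
    return material_signature(placement) in signatures


position = "2b1k3/8/5n2/8/8/2B5/3B4/4K3"
print(material_signature(position))                        # KBBkbn
print(matches_material(position, ["KBBkbn", "KBBknn"]))    # True
```

Because the signature covers every piece on the board, this is an exact material match, which is the reading asked for in the "rook and 2 pawns vs rook" case earlier in the thread.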

mcostalba commented 7 years ago

Of course, the query supports composition of conditions, like:

{ fen: [ "K7/8/2n5/8/8/8/8/8", "B7/8/2n5/8/8/8/8/8" ],
  material: ["KBBkbn", "KBBknn"],
  stm: "WHITE" }

In this case a matching position must satisfy the fen, material and stm conditions at the same time. We can list as many tags as we want, and they are combined with a logical AND.
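The composition rule can be sketched in Python as "OR inside a list, AND across tags". This is only an illustration under assumptions: a position is represented as a hypothetical dict with `placement` and `stm` keys, `fen` is treated as a partial match, and `material` uses white-then-black K, Q, R, B, N, P ordering. None of these names come from scoutfish.

```python
def pieces(placement):
    """Return {(rank_index, file_index): piece} for a FEN piece-placement field."""
    found = {}
    for r, row in enumerate(placement.split("/")):
        f = 0
        for ch in row:
            if ch.isdigit():
                f += int(ch)            # skip empty squares
            else:
                found[(r, f)] = ch
                f += 1
    return found


def material(placement):
    """Material signature, white pieces first (assumed ordering)."""
    order = "KQRBNPkqrbnp"
    return "".join(sorted((c for c in placement if c.isalpha()), key=order.index))


# One checker per tag; list-valued tags accept any of their entries (OR).
CHECKS = {
    "fen": lambda pos, wanted: any(
        pieces(w).items() <= pieces(pos["placement"]).items() for w in wanted
    ),
    "material": lambda pos, wanted: material(pos["placement"]) in wanted,
    "stm": lambda pos, wanted: pos["stm"] == wanted,
}


def matches(position, query):
    """A position matches only if every tag in the query is satisfied (AND)."""
    return all(CHECKS[tag](position, value) for tag, value in query.items())


query = {
    "fen": ["K7/8/2n5/8/8/8/8/8", "B7/8/2n5/8/8/8/8/8"],
    "material": ["KBBkbn", "KBBknn"],
    "stm": "WHITE",
}
position = {"placement": "K7/8/2n5/8/8/3b4/BB6/7k", "stm": "WHITE"}
print(matches(position, query))  # True
```

Adding a new condition type then only means adding one entry to `CHECKS`; an unknown tag raises `KeyError` in this sketch rather than being ignored.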

sshivaji commented 7 years ago

The above will cover fen and material. Material difference might be interesting to incorporate, like score: "-1p" or maybe even score: "-100cp"
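One possible reading of the material-difference idea, sketched in Python: compute white material minus black material in centipawns and compare it with the requested score. The piece values (100, 300, 300, 500, 900) are conventional textbook values chosen here as an assumption, and `material_balance_cp` is a hypothetical name.

```python
# Assumed piece values in centipawns; kings are not counted.
PIECE_VALUES_CP = {"p": 100, "n": 300, "b": 300, "r": 500, "q": 900}


def material_balance_cp(placement):
    """White material minus black material for a FEN piece-placement field."""
    balance = 0
    for ch in placement:
        value = PIECE_VALUES_CP.get(ch.lower())
        if value is None:
            continue                    # digits, '/', and kings
        balance += value if ch.isupper() else -value
    return balance


# White is one pawn down: this would correspond to score "-100cp".
print(material_balance_cp("4k3/pppppppp/8/8/8/8/PPPPPPP1/4K3"))  # -100
```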

Lower priority but perhaps interesting: the other popular one is piece path. To be honest, most people don't search for a piece path. They query on a position, do an opening report, and get piece path suggestions from games. E.g. in a position, it's common to do Bf1-d3-c2. This basically involves searching a sequence of fens as per your example, and seeing if a path was followed.
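A rough Python sketch of the "sequence of fens" idea for a piece path such as Bf1-d3-c2: walk through the positions of one game and check that the partial FENs are matched in order. It assumes standard FEN rank order and does not verify that the same physical bishop made the moves, so it is only an approximation of a real piece-path search. The function names are hypothetical.

```python
def pieces(placement):
    """Return {(rank_index, file_index): piece} for a FEN piece-placement field."""
    found = {}
    for r, row in enumerate(placement.split("/")):
        f = 0
        for ch in row:
            if ch.isdigit():
                f += int(ch)
            else:
                found[(r, f)] = ch
                f += 1
    return found


def contains(placement, pattern):
    """True if every piece in the partial FEN `pattern` is present in `placement`."""
    return pieces(pattern).items() <= pieces(placement).items()


def follows_path(game_positions, path):
    """True if the patterns in `path` are matched in order by successive positions."""
    step = 0
    for placement in game_positions:
        if step < len(path) and contains(placement, path[step]):
            step += 1
    return step == len(path)


# White bishop on f1, later on d3, later on c2.
path = ["8/8/8/8/8/8/8/5B2", "8/8/8/8/8/3B4/8/8", "8/8/8/8/8/8/2B5/8"]
game = [
    "4k3/8/8/8/8/8/8/4KB2",
    "4k3/8/8/8/8/3B4/8/4K3",
    "3k4/8/8/8/8/3B4/8/4K3",
    "3k4/8/8/8/8/8/2B5/4K3",
]
print(follows_path(game, path))  # True
```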

In the Chessbase manual, you can see theme keys for reference (there is not much info) at http://shop.chessbase.com/download/Programme/ChessBase12/ChessBase12Manual.pdf

@mcostalba Out of curiosity, do you have a copy of Chessbase? You can look at the existing theme search for reference.

sshivaji commented 7 years ago

Also, it's worth adding that existing tools, including commercial ones, are very slow and go game by game. Fen and material (perhaps with material difference) would be a good start in my opinion.

mcostalba commented 7 years ago

I think going game by game is the only option to support general and complex queries.

Nevertheless I think this can be done very fast.

On Monday, December 5, 2016, Shivkumar Shivaji notifications@github.com wrote:

Also, it's worth adding that existing tools, including commercial ones, are very slow and go game by game. Fen and material (perhaps with material difference) would be a good start in my opinion.

sshivaji commented 7 years ago

I see your point about going game by game. Existing solutions are quite slow. Is your plan to load the database into RAM and do a multi-threaded RAM search? Just curious.

mcostalba commented 7 years ago

2 million games are about 90 million moves. Stockfish can process all of them in under a second, even in a single thread.

With multiple threads we can process hundreds of millions of moves per second...
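The figures quoted above can be checked with a few lines of Python. The only inputs taken from the thread are the 2 million games, the 90 million moves and the one-second upper bound; the thread count and the `efficiency` scaling factor are assumptions made for illustration.

```python
games = 2_000_000
moves = 90_000_000
seconds = 1.0                      # upper bound quoted for a single thread

moves_per_game = moves / games     # average game length implied by the figures
single_thread_rate = moves / seconds


def projected_rate(threads, efficiency=1.0):
    """Moves per second with `threads` workers, assuming the given scaling efficiency."""
    return single_thread_rate * threads * efficiency


print(moves_per_game)              # 45.0
print(projected_rate(4))           # 360000000.0
```

With perfect scaling, four threads already reach the "hundreds of millions of moves per second" range; real scaling will depend on memory bandwidth and how the games are split between threads.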

sshivaji commented 7 years ago

Wow, that's crazy fast! Quite unbelievable.

sshivaji commented 7 years ago

Thoughts on the query language to support:

  1. I think the query implementation will be very fast based on your benchmark above.

  2. SQL makes a lot of sense for universal understanding. Most people know SQL whether we like it or not :)

Making your query SQL compatible seems very easy:

{ fen: [ "K7/8/2n5/8/8/8/8/8", "B7/8/2n5/8/8/8/8/8" ],
  material: ["KBBkbn", "KBBknn"],
  stm: "WHITE" }

select game from database where fen in ('K7/8/2n5/8/8/8/8/8', 'B7/8/2n5/8/8/8/8/8') and material in ('KBBkbn', 'KBBknn') and stm = 'WHITE';

  3. The other advantage of SQL is that you can use joins to combine complex searches if needed. I looked at JSONIQ and a few other query languages. I feel they are not ubiquitous and we can't justify their choice for a long-term solution.

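
The mapping from the tag-based query to SQL can be sketched mechanically in Python: list-valued tags become `IN (...)` clauses, scalar tags become equality tests, and the clauses are joined with `AND`. The function name `query_to_sql` and the default table name `games` are hypothetical. String interpolation is used only to keep the illustration short; real code would pass the values as bound parameters.

```python
def sql_literal(value):
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def query_to_sql(query, table="games"):
    """Translate a tag-based query dict into a SELECT statement (illustrative only)."""
    clauses = []
    for tag, value in query.items():
        if isinstance(value, (list, tuple)):
            options = ", ".join(sql_literal(v) for v in value)
            clauses.append(f"{tag} IN ({options})")      # alternatives: OR
        else:
            clauses.append(f"{tag} = {sql_literal(value)}")
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return f"SELECT game FROM {table}{where};"


query = {
    "fen": ["K7/8/2n5/8/8/8/8/8", "B7/8/2n5/8/8/8/8/8"],
    "material": ["KBBkbn", "KBBknn"],
    "stm": "WHITE",
}
print(query_to_sql(query))
```

Note that `fen IN (...)` in plain SQL is an exact string comparison, so a partial-FEN match would still need a custom function or operator on the database side.
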
mcostalba commented 7 years ago

We are moving to JSON now, so closing this.