Closed Jonathan003 closed 2 years ago
The second command removed all the duplicates over 80 ply.
You can write the removed games to an output file with -r to check what was removed.
For example: ocgdb -db human.db3 -cpu 4 -plycount 80 -dup -o printall;remove -r removed.pgn
The pgn is readable with Banksia and Chessbase.
You may remove shorter games when creating the new database. The first command should be:
ocgdb -pgn human.pgn -db human.db3 -cpu 4 -o moves2;discardsites -plycount 80
Thanks! Out of curiosity, why did you add an option to discard sites? I know it sometimes helps to remove the Site tag from a huge pgn before importing it into a SCID file. Otherwise Scid fails to import the pgn and you get error messages like "too many player names".
Games from some sources (such as Lichess) may use a URI for the Site tag. That means each game may have a unique string for its Site tag, so 100 million games will require 100 million records in the Site table. That may slow down our app (as well as any other chess database app) significantly, while the benefit of storing such information is almost zero.
Discarding the Site tag for Lichess games (or any similar ones) solves that problem. You may keep that field for games from other sources, or when the number of games is not too big (say, under 1M).
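To make the cost concrete, here is a minimal Python sketch of a normalized Site lookup table. The `Sites` and `Games` tables, the `count_site_rows` helper, and the sample tags are all made up for illustration; this is not OCGDB's actual schema or code. It only shows that repeated place names collapse into a few rows, while per-game URLs produce one row per game.

```python
import sqlite3

def count_site_rows(site_tags):
    """Store each game's Site tag in a lookup table and return the
    number of distinct rows that table ends up holding."""
    con = sqlite3.connect(":memory:")
    # Hypothetical schema for illustration only, not OCGDB's real layout.
    con.execute("CREATE TABLE Sites (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE)")
    con.execute("CREATE TABLE Games (ID INTEGER PRIMARY KEY, SiteID INTEGER)")
    for tag in site_tags:
        # A site name seen before is reused; a new one adds a row.
        con.execute("INSERT OR IGNORE INTO Sites (Name) VALUES (?)", (tag,))
        (site_id,) = con.execute(
            "SELECT ID FROM Sites WHERE Name = ?", (tag,)
        ).fetchone()
        con.execute("INSERT INTO Games (SiteID) VALUES (?)", (site_id,))
    (rows,) = con.execute("SELECT COUNT(*) FROM Sites").fetchone()
    con.close()
    return rows

# 1000 over-the-board games played at only three venues.
otb = ["London", "Moscow", "London", "Wijk aan Zee"] * 250
# 1000 online games, each with its own (made-up) game URL as Site.
online = [f"https://lichess.org/game{i:04d}" for i in range(1000)]

print(count_site_rows(otb))     # 3
print(count_site_rows(online))  # 1000
```

With per-game URLs the lookup table grows as fast as the Games table itself, which is the situation discarding the Site tag avoids.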
BTW, in case users don't discard the Site, OCGDB will auto-detect whether the Site string comes from Lichess. If it does, it will save the string into a new field named "Source" in the Games table. This just avoids writing that information too many times into the Site table (too slow).
More details here:
http://talkchess.com/forum3/viewtopic.php?f=7&t=78464&start=240#p919062
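As a rough illustration of the auto-detection idea, a Site value that looks like a Lichess game URL can be routed to a per-game field instead of the shared Site table. The thread does not say which rule OCGDB actually applies, so the prefix check and the `split_site` helper below are assumptions, not the real implementation.

```python
# Assumed detection rule: the Site tag starts with the Lichess base URL.
LICHESS_PREFIXES = ("https://lichess.org/", "http://lichess.org/")

def split_site(site_tag):
    """Return (site, source). A Lichess game URL is returned as source so
    it never reaches the shared Site table; anything else stays as site."""
    tag = site_tag.strip()
    if tag.lower().startswith(LICHESS_PREFIXES):
        return ("", tag)
    return (tag, "")

print(split_site("https://lichess.org/abcd1234"))  # ('', 'https://lichess.org/abcd1234')
print(split_site("London"))                        # ('London', '')
```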
Thanks for the information.
I did a trial to test the option to remove games shorter than a given ply count. I typed these commands one after the other:
ocgdb -pgn human.pgn -db human.db3 -cpu 4 -o moves2;discardsites
ocgdb -db human.db3 -cpu 4 -plycount 80 -dup -o printall;remove
ocgdb -pgn human_out.pgn -db human.db3 -cpu 4 -export
All games shorter than 80 ply are still in the output human_out.pgn. I probably mistyped something in the second command.