Fishtest maintenance - GitHub issues

mcostalba commented 7 years ago

Fishtest development has been practically stalled for 2 years.

We need someone well versed in Python and with a good amount of time and energy to dedicate to it. I really don't have enough time and can only keep up with SF maintenance, and both Gary and Joona are very busy in their day jobs. OTOH fishtest needs improvements and needs a dedicated maintainer.

Stefano80 commented 7 years ago

I agree and I would even volunteer. But I think my knowledge of database and server maintenance is just not enough (although I think I could help with stats and optimization).

vondele commented 7 years ago

I agree that some modernization is needed, and it would benefit from more active maintainership. Presumably by adding a few more active maintainers to the official fishtest repo.

@Stefano80, from my point of view, we do not need a single person with all of Gary's skills; a small team would be good as well, and would reduce the bus factor of fishtest. @ianfab and @ppigazzini have been doing quite some work on fishtest...

scchess commented 7 years ago

It's unnecessary for a single person to take over the job. A few Python developers can do that. I'm happy to contribute if we have a team.

Stefano80 commented 7 years ago

Yes, if we could convince @ppigazzini to step up, I would gladly join more active development efforts. I was always somewhat discouraged by the inactivity of fishtest. (Without any criticism of Gary, let me be very explicit about that, everybody has a lot to do here...)

ppigazzini commented 7 years ago

I have some spare time only at weekends, and I'm not a developer (I learnt some very basic Python only to contribute some patches to fishtest). Like many of you, I think that looking for a @glinscott clone (developer, server administrator, github repo maintainer, chess developer, etc.) could be unsuccessful; perhaps it's better to try to build a team with all those skills.

I suggest these steps:

start writing a list of new fishtest features & milestones
build the team then:
fork fishtest to a new github repo
develop and push the new features
put online a test fishtest framework
test the new release of fishtest (server and worker) in the test fishtest framework
push the new features to the official fishtest server. The fishtest server atm uses some old Python packages (e.g. pyramid) and there are some HW constraints (RAM, disk space, CPU)
test the new features with some controlled workers
push the new features to all workers

ps: I think that atm @ianfab is the person with the most knowledge of fishtest and fishtest administration (just below Gary), but I also think that he is very busy with multi-variant stockfish/fishtest (he has contributed well over 90% of the patches)

Stefano80 commented 7 years ago

Hi @ppigazzini , in principle I agree, I would reorder your plan as follows

build the team (without it we go nowhere)
put online a test fishtest framework
decide again what to do and keep the ball rolling (I think how we test heavily depends on which kind of resources we will have for the test framework)

ianfab commented 7 years ago

I will try to continue contributing to fishtest development, but I do not have much knowledge about server and database maintenance/administration (although I am at least able to keep multi-variant fishtest somehow running most of the time) and I am also quite busy for the reasons @ppigazzini mentioned. So I could contribute to the code (even if it might be mostly minor improvements as in my open PRs), but not really to the server administration.

ppigazzini commented 7 years ago

@Stefano80

a team (of sorts) seems to be here already
a development fishtest server with basic hardening can be online in half an hour; a cloud instance with a 4-core Broadwell CPU + HT @ 2.4 GHz, 50 GB hard disk, and 8 GB RAM costs about one coffee/day (RAM max 64 GB)
so here we are at the list of features and milestones :)

@mcostalba you never posted in the threads about the stalled fishtest development, so I'm curious to know the new features that you have in mind.

scchess commented 7 years ago

We need a team leader. Someone who gives general direction, otherwise we can't work as a team!

@mcostalba Do you want to be a team leader?

Stefano80 commented 7 years ago

Hi @ppigazzini, I think we need to work on 3 fields

General stability: We don't have an acceptance test environment for fishtest and we actually need one; without it fishtest will remain inherently fragile.

Endgame testing: we need a concept to test TB patches and we need an endgame dataset for testing endgame patches.

Tuning: we need a concept to evaluate tuning strategies, and we need to put some hard work on improving SPSA, input from @ilvec would be probably useful.

What are your thoughts?

ppigazzini commented 7 years ago

@Stefano80 I leave it to the developers to suggest new chess-related fishtest features.

Some short term goals:

some clean up: naming conventions (functions and variables), function refactoring, removing dead code (windows builder), removing hard-coded defaults (Linux, x86-64-modern), exception handling, etc. Also some fishtest components are stored in a different repo (opening books & cutechess-cli). cutechess-cli is very old and not statically built (on Linux Qt is a setup prerequisite, and the old cutechess-cli does not close cleanly with MSYS2 python)
some new features: system info collection, CPU-based compile options, TC scaled with an nps averaged over several benches, etc.
comments and documentation to attract other developers
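
The "TC scaled with an nps averaged over several benches" idea could look roughly like the sketch below. This is only an illustration, not fishtest's actual code: the reference speed `REFERENCE_NPS`, the function name `scale_time_control`, and the sample bench values are all assumptions.

```python
from statistics import mean

# Hypothetical reference speed (nodes per second) that the nominal TC is defined for.
REFERENCE_NPS = 1_000_000


def scale_time_control(base_seconds, increment_seconds, bench_nps_samples,
                       reference_nps=REFERENCE_NPS):
    """Scale a base+increment time control by the worker's average bench speed.

    A slower worker (lower nps) gets proportionally more time, so that all
    workers search roughly the same number of nodes per game.
    """
    if not bench_nps_samples:
        raise ValueError("at least one bench result is required")
    # Averaging several bench runs reduces the noise of a single measurement.
    factor = reference_nps / mean(bench_nps_samples)
    return base_seconds * factor, increment_seconds * factor


# Three bench runs on a worker that is about half as fast as the reference.
base, inc = scale_time_control(10.0, 0.1, [500_000, 520_000, 480_000])
```

Averaging over several runs is the point of the proposal: a single bench on a busy machine can be far off, and that error would carry over directly into the time control.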

Some medium term goals:

pgn collection, speedup test, auto filling signature (using travis or appveyor to compute the signature), other chess related features
optional (here I'm biased): MinGW-w64 & python built by MSYS2 as default for Windows

Some long term goals:

users & workers management
tests archive: search, statistics etc.
python 3 ?
docker ?

Regarding the acceptance test, I think that atm it suffices to follow this workflow:

test locally with a development server and some workers (windows, linux), using some VMs. I use lxc/lxd to test some features with several distros (ubuntu, centos, arch etc.)
if necessary, test with some cloud development servers to have a bigger number of workers
the worker patches are more difficult to test thoroughly because there are several combinations of HW/OS/SW (e.g. I don't have a mac)

mcostalba commented 7 years ago

What about this one?

All tests in one page (with infinite scrolling btw)
Led (green/yellow/gray) on the left to show started, pending or finished state
New graphic and arrangement of the columns
Black border of the score box to show LTC tests
Link rendering in the test description column
Fancy date format
Number of active machines for each running test (see the number under the green led)
Number of crashes and time losses for each test (see cX tY in score box)
Sign in via GitHub: no more passwords saved on site, once logged in automatic detection of your SF repo

It is done in go language (actually it is a way for me to learn 'go', because I am totally new at it). It reads from the same MongoDB of official fishtest, fetching data from there (through a private VPN), so it is in strictly read-only mode to avoid any issues for the official fishtest site.

Stefano80 commented 7 years ago

I like it better than the current one. What does cX tY after the score mean?

mcostalba commented 7 years ago

@Stefano80 crashes and time losses

Yery commented 7 years ago

For what it is worth: This new layout looks great! Strange there is so little enthusiasm.

Stefano80 commented 7 years ago

I find it great too! Should have said with more emphasis!

xoto10 commented 7 years ago

Looks great!

I have one question, is it possible to include part of the commit ref in the test name? If people (I am guilty of this!) submit multiple tests with altered code on the same branch, we need to click on the test to remember which code version it is. A couple of letters from the commit ref giving test_name_xy instead of test_name might be a bit easier to follow.

mcostalba commented 7 years ago

@xoto10 this is a good idea. I will do. Thanks.

mcostalba commented 7 years ago

Here we go:

I have replaced the three-dot ellipsis with the first 4 digits of the SHA of the new commit.
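
As a rough sketch of the idea (written in Python for illustration, although the new front end is done in Go), the label could be built like this. The function name `display_name`, the underscore separator, and the example SHA are assumptions, not the actual implementation.

```python
def display_name(test_name, commit_sha, digits=4):
    """Return the test name followed by the first `digits` characters of the commit SHA.

    This lets several tests submitted from the same branch be told apart at a glance.
    """
    return f"{test_name}_{commit_sha[:digits]}"


# Two tests from the same branch with different (made-up) commits.
first = display_name("razor_tweak", "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678")
second = display_name("razor_tweak", "0f9e8d7c6b5a49382716f5e4d3c2b1a098765432")
```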

mcostalba commented 7 years ago

I have added the machines page (opened by clicking on the number under the green leds):

Modal view with machines grouped by collapsing elements for each active test
View is rendered fully on the client side: immediate and without requests to the server
Visual and animated summary for assorted stats
Same information further detailed for each task
Show idle workers (more than 5 minutes) with proper idle color (light gray)
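
The idle check in the last item can be sketched as below (in Python for illustration; the page itself is rendered on the client side). The name `is_idle` and the idea of keeping a "last updated" timestamp per worker are assumptions about how the data is stored.

```python
from datetime import datetime, timedelta

# Threshold taken from the list above: idle means no update for more than 5 minutes.
IDLE_AFTER = timedelta(minutes=5)


def is_idle(last_updated, now, idle_after=IDLE_AFTER):
    """Return True if the worker has not reported for longer than `idle_after`."""
    return now - last_updated > idle_after


now = datetime(2017, 10, 24, 12, 0)
stale_worker = is_idle(datetime(2017, 10, 24, 11, 54), now)   # 6 minutes ago
active_worker = is_idle(datetime(2017, 10, 24, 11, 57), now)  # 3 minutes ago
```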

joergoster commented 7 years ago

Looks great!

xoto10 commented 7 years ago

I had another thought. Is it possible to display the wins and losses for each colour, or is this information not available from cutechess?

mcostalba commented 7 years ago

@xoto10 no, I don't think it is available in fishtest (I am not sure for cutechess).

crossbr commented 6 years ago

Marco, I think it would be helpful to have a real-time graph for each test, the x-axis being the number of games played, and the y-axis being the current LLR value, along with the upper and lower bounds for passing and failing. That way you could see a test's history, and also see visually how close it was and is to passing or failing. -Bryan
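
For reference, the two horizontal lines on such a graph could be computed with Wald's standard SPRT approximation, as in the sketch below. The error rates `alpha = beta = 0.05` and the function names are assumptions made for the example; the values actually used should be read from the test's own settings.

```python
import math


def sprt_bounds(alpha=0.05, beta=0.05):
    """Return (lower, upper) LLR bounds using Wald's approximation.

    The test fails when the LLR drops to `lower` and passes when it reaches `upper`.
    """
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    return lower, upper


def sprt_state(llr, alpha=0.05, beta=0.05):
    """Classify the current LLR value against the two bounds."""
    lower, upper = sprt_bounds(alpha, beta)
    if llr >= upper:
        return "passed"
    if llr <= lower:
        return "failed"
    return "running"


lower, upper = sprt_bounds()
```

Plotting the LLR after each batch of games between these two lines gives the history view described above.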

vondele commented 6 years ago

@crossbr I have implemented a python script to do that, see https://groups.google.com/d/msg/fishcooking/0QTFBQJcuas/WQro-FSTAQAJ or https://github.com/vondele/FishTestWatch

mcostalba commented 6 years ago

@crossbr yes, this is a good idea but it requires an update to the DB (or using an external DB)

@vondele nice! Indeed, in my very little free time, I implemented live update functionality. This can't be shown with a screenshot, but it means that the main page and the machines page are not static but change live while you are watching :-) reflecting changes in the underlying data. I have used a websocket to keep a connection alive between server and browser and push updates to the browser view: learning a lot of new stuff in the process...

xoto10 commented 6 years ago

@mcostalba Would dates be better displayed in YYYY-MM-DD format? Could you display number of active cores instead of active machines?

Fancy date format

Number of active machines for each running test (see the number under the green led)

xoto10 commented 6 years ago

My suggestion is we get one or two of the smallest things fixed asap. e.g. how do we get the timeout changed back from 30 minutes to 5 minutes?

Stefano80 commented 6 years ago

Btw, this thread was about stalled fishtest. Are we going anywhere with that? I thought there was some kind of agreement that @ppigazzini (and I) would be interested and available to do some work. @mcostalba: thoughts from your side?

IIvec commented 6 years ago

I now see that I was mentioned here. I think that fishtest should be open to 2 different tuners, and then the worse one could be replaced in the future with a new attempt.

Also, some strong opening books could reduce statistical errors significantly. I developed 2moves_strong book and I'm doing statistical analysis for it at the moment.

glinscott commented 6 years ago

This is great @mcostalba!

Of course, I'd be more than happy to hand over the keys to fishtest. I'm happy to keep the server up, but I just don't have time to update the code these days. I don't want to make changes because it's so damn stable now :). But if someone else is willing to take it on, that would be great!

glinscott commented 6 years ago

@Stefano80 and @ppigazzini I've invited you as collaborators on the fishtest repo. Give me an email at glinscott@gmail.com, and I'll give you credentials to the server.

ppigazzini commented 6 years ago

@glinscott : done @mcostalba @Stefano80 : we need a list of features and milestones

Stefano80 commented 6 years ago

Hi @ppigazzini , @mcostalba : I think the first milestone is to decide what to do with the several PRs open on the repository.

CoffeeOne commented 6 years ago

Sorry to say, but ... Not only is fishtest development stalled, fishtest maintenance is stalled too, which is more critical. The workers get Timeout: HTTPConnectionPool(host='tests.stockfishchess.org', port=80): Request timed out. (timeout=5.0) more and more often

ppigazzini commented 6 years ago

Peter, the server is running fine:

CPU load is 1.3% (the big requirement for CPU was the windows builder load, not used anymore)
RAM used 25%
upload and download of a 1 GB file are limited by my link limits (up 4.8 MiB/s, down 8.4 MiB/s)

The problem should be the type of network traffic from the spsa test, but this could be at ISP level.

CoffeeOne commented 6 years ago

@ppigazzini At the moment only 15 workers are working and there is still something to do, so something IS definitely wrong.

IIvec commented 6 years ago

Hi all,

with each new refresh I see different tests running. Sometimes, I see all of them, but sometimes only some of them. This alert is important.

joergoster commented 6 years ago

@IIvec Same here.

ppigazzini commented 6 years ago

@IIvec try to stop the "tune_nmb" to see if this solves the problem. I have some free time only at the weekend and I prefer not to touch anything on the production server before I have figured out the whole configuration. My plan is to write a new fishtest server installer for CentOS and to have some servers as test/backup.

IIvec commented 6 years ago

@ppigazzini : OK, stopped, it seems that I anyway have enough data from that test.

ppigazzini commented 6 years ago

@IIvec thank you. BTW I wrote "stop" but I meant "suspend". Next time I will choose the word more carefully.

joergoster commented 6 years ago

Maybe these problems are a general problem atm. I also experience problems reaching other sites or streaming via Fire-TV ...

ppigazzini commented 6 years ago

@joergoster this is an already known problem for tuning sessions with many parameters.

IIvec commented 6 years ago

@ppigazzini : I guess the problem was the number of games (500000) and not the number of parameters (33).

mcostalba commented 6 years ago

@ppigazzini thank you for picking up this!

I don't have milestones to give you; you are mainly free to develop in areas where you see possible improvements. Anyhow the critical job, apart from development, is maintenance and in particular fixing bugs and addressing issues (luckily very few, although the code base has been mostly stalled for years).

In case you are looking for ideas, I'd suggest posting a specific topic on the forum, although, beware, you will receive many feature requests, not all valuable, and you have to be prepared to filter them out and to explain why you consider them not acceptable: this is, by far, the most difficult part of a maintainer's job :-)

ppigazzini commented 6 years ago

@mcostalba : OK, message received. ATM @Stefano80 is reviewing the PRs on GitHub (I'm still not so confident w/ git), I'm in charge of the server administration tasks.

vondele commented 6 years ago

@ppigazzini not sure if this is a server admin related thing... I assume there are more details in some log somewhere...

I'm seeing (since a long time) the following error message on (first) login to fishtest, e.g. to modify the state of a test:

Internal Server Error

The server encountered an unexpected internal server error

(generated by waitress)

Somehow I'm nevertheless logged in afterwards, so I can ignore it with a few extra clicks, but it would be nice to get rid of it.

ianfab commented 6 years ago

In case someone wants to pick them up, here are a few ideas (some of which might have been mentioned before in this thread, I have not checked) I would like to implement or at least think through further but did not have the time yet:

add support for speed measurements, i.e., distributed bench testing
add an option to disable auto-purge per test
show bad tasks in a table below the tasks of a test, but still do not include them in the stats of the test. So far stats of bad tasks are deleted, so this would only work for new tests.
tags on tests (added by test author, editable by maintainers, possibly only selectable from a predefined set) to enable better filtering of the test list
remove or fix the test type "regression"
make internal priority depend on LLR (for ideas on the formulae, see https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/lH9I0Kajb7I)
add a field to input an EPD (with limited size) for a test
- to support custom books (e.g., endgame positions or the like)
- and/or add support for test suite testing (maybe using python-chess)
in addition to the score from the first engine's point of view, report the score from white's point of view (only on the test details page)
probably there could be more minor usability improvements similar to some of my previous PRs, e.g., regarding test submission, displaying and browsing of tests list and test details pages, etc. Here feedback from users might be helpful.

I would also like to get rid of the error message @vondele mentioned which I also often encounter, but I have not investigated yet where this is coming from.

Mindbreaker1 commented 6 years ago

Honestly, I like it a lot the way it is, with a few minor changes: draw percentages, and a next/previous page button at the bottom. And I would like to see more opening book options, particularly various endgames. Maybe 30 books each focused on one kind of ending. Then we can improve our endings and judge endings from afar more accurately. Though an automatic 30 second refresh mode would be nice for following progress. And though completely eye candy, it would be nice to see a graph of how a patch did during testing...its ups and downs...how close it came to passing... Maybe distinguish more clearly between patches testing for Elo and those just trying to avoid regressing. Maybe a darker shade of green, red, yellow for the attempts at Elo gain vs the others.

I did make a bunch of ending books, but they haven't been checked for large advantages and such. Some I haven't even run. And they have repeated positions because I just used our 2-move book and removed pieces (the duplicate removers I tried required that there be games, not just positions). And there are 5 or 6 that someone else made so I could do some testing a couple years ago. There could be all kinds of problems with the books I made, I don't know. But maybe it is a starting point.

I have removed the attachment so as not to confuse anyone. The new endgame books are further down this thread.

official-stockfish / Stockfish

Fishtest maintenance #1267