Closed: slarson closed this issue 11 years ago.
The discussion went on for a bit after you left, Stephen. Balazs is going to take a stab at two sections: the first focuses on the theoretical Turing test, and the second on OpenWorm, where we introduce some constraints on the test for the first phase. I'm assuming this issue relates to the first part.
Turing originally focused on one feature of humans, the one he thought was most peculiar: intelligence. He didn't design a test to check whether an AI could walk or eat. And as a consequence of what he wanted to test, he picked one way of interacting with the AI: blind chatting.
In order to decide which features the test covers and what interactions will be allowed, we need to decide, just as Turing did, what features we want to test. Since we are surely not going to talk to the worm, the only way we can give it some input, without opening the box, is by interacting with the environment the worm is in. Putting down some food, for instance, will be our "asking a question"; watching the worm move towards it will be our "reading the answer". It will be a behavioural test, not an intelligence test (one reason why it might cause confusion to call it a Turing test, by the way). I think "specific metrics", "glass box" and "specific test" all belong to a different category, most notably local entity testing. Tests which require knowledge of how the worm is implemented can be carried out only from the inside. The theoretical "Turing" test shouldn't assume there are cells inside the worm, shouldn't assume there are neurons or neuropeptides, in the same way that Turing assumed nothing about the implementation. Otherwise it's a different concept altogether.
If we define our test properly, we should expect people to try to write code that passes the test, and it wouldn't have to be biologically inspired: people could try to hack something together that behaves like C. elegans. And if they succeed, they will have passed the test. The Turing test wasn't about being biologically accurate. If we want to draw a parallel, our test shouldn't be either. I think we should add these considerations in the OpenWorm section.
If we want, we could add in the second section that OpenWorm is adding some specific metrics and some glass-box testing on top of the Turing test, because our goal is not only to reproduce behaviour but also to do it in a biologically plausible way.
To recap my take:
Worm Turing Test = behavioural test: human creativity test, black box, general theory for tests.
OpenWorm Testing = Turing Test + specific metrics, glass box, specific tests.
I broadly agree with Matteo. I would like to add some perspective and a few additional points; these comments are more focused on reaching an internal consensus than on the publication per se.
Hello team,
First, reacting to Stephen:
- I am clearly on the side of the human creativity test; however, I think it is a very good idea to have a set of specific metrics that we test internally. If we go the specific-test way, then we need to rewrite the paper significantly. Also, what are the exact metrics that you think would be a good idea? Doing something with the eigenworms is a good starting point, but many details must be filled in.
- I think glass box is just black box + something else, so we must do the BB anyway. At this point, why bother with cellular-level activity as well? We are going to have plenty of headaches with BB already...
- I am a bit unsure what you mean by "general theory for tests vs. very specific test" :( But if I understand you correctly, I do not think we need to present any general theory for validating computational models; it's (hopefully) going to be a perspective article, not a philosophy-of-science monograph.
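On the eigenworm idea: eigenworms are the principal components of the worm's body-posture (tangent-angle) time series, and a handful of them capture most postural variance. A minimal sketch with NumPy, using random numbers as a stand-in for real tracking data (the 48-angle sampling and 4 modes are illustrative assumptions, not project choices):

```python
import numpy as np

# Hypothetical input: each row is one video frame, each column one of
# 48 tangent angles sampled along the worm's midline (radians).
rng = np.random.default_rng(0)
postures = rng.normal(size=(1000, 48))  # stand-in for real tracking data

# Centre the data, then take principal components via SVD.
mean_posture = postures.mean(axis=0)
centred = postures - mean_posture
_, singular_values, components = np.linalg.svd(centred, full_matrices=False)

# The first few right-singular vectors are the "eigenworms".
n_modes = 4
eigenworms = components[:n_modes]

# Fraction of postural variance captured by those modes.
variance = singular_values**2
explained = variance[:n_modes].sum() / variance.sum()
print(f"first {n_modes} modes explain {explained:.1%} of variance")
```

With real data a small number of modes dominates; on the random stand-in above the spectrum is flat, so the printed fraction is only a sanity check of the machinery.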
Matteo:
- I think it is important that the model is based on biology. You are right that if it passes the test, it passes, but the behaviour of the worm is not the point IMO. The point is to see whether a model with a given amount of biological detail is capable of replicating C. elegans behaviour or not. To me the interesting bit is to better understand how much "high level" (behavioural) accuracy we gain as we add more "low level" (biophysical) detail.
- I agree with your comment about "why it might cause confusion to call it Turing test btw", but I think we are better off leaving it this way. Bear in mind that this paper is prostitution where we just want to gain some attention. "Behavioural indistinguishability test" is not nearly as exciting...
Mike:
-"Matteo points out that Turing focused on testing only one feature of humans in which he was interested - and used the constraints of the test (limiting the user to blind chatting) in order to focus on testing the limits of the simulation. This should be our approach" -> I think this is our approach. Of course it would be good to open up the model to allow interactions later, but first we need to deal with the special - and simplest - case of zero interaction.
-I think you are right about the CGI animation, but again I think focusing on the behaviour of the worm is wrong - see my comments to Matteo. Honestly, who gives a **\ about the behaviour of the worm? It only has 302 neurons and it's the stupidest organism ever! What is interesting is to find out how much biology we need to bring about this boringness autonomously. That is why simplicity and predictive power are not important for our model IMO.
-As a first perturbation to the environment-body system we should alter the viscosity of the surface on which the virtual worm moves. This will directly confront the model with the swimming-crawling transition discussed elsewhere in the paper! A huge extra here is that the real experiments have already been done, and we only need to do our virtual equivalent.
I pulled apart the BTT section of the document. The first bit is now called "Criteria for success: a Turing-like test for an in silico C. elegans"; it is more of an introduction to the mindset and keeps everything general. The last paragraph of this section could use an extra few sentences, as it currently ends a bit awkwardly. Within this section there is a new sub-section, "OpenWorm and the behavioural Turing test", that is more specific to us. This one also has a slightly odd ending; I will try to work on it Thursday morning. If you guys want to make edits, please feel free to do so (however, I would prefer that you create a copy of the existing paragraphs and modify it, rather than simply overwriting the current ones).
best, balazs
Hi Balazs,
I think we might be arguing over subtle definitions more than anything. You say " What is interesting is to find out how much biology we need to bring about this boringness autonomously. That is why simplicity and predictive power is not important for our model IMO." -> To me finding out how much biology we need to bring about this behaviour autonomously is all about simplicity (the minimum amount of biology) and predictive power (replicating the behaviour - predicting a verifiable result).
I think we should try and reach a consensus, I propose rewriting/adding/removing from the following six points until we have something everyone agrees with:
I agree with Mike who can represent me tomorrow since unfortunately I won't be able to be at the meeting.
Hey guys,
"...we need to bring about this behaviour autonomously is all about simplicity (the minimum amount of biology) and predictive power" -> agree, but iff biology is included. IMO that is not the part that should be in parentheses, but rather in capital letters. But maybe you are right; we are talking about the same thing with slightly different wordings. I think we are all on the same page regarding this.
Very good job with the points! I agree with all of them with one exception: I would not separate the behavioural and the null environment tests. I think what you call the null environment test is just a special case of the more general BTT. It is the simplest version, where no interactivity is allowed (which is again just a special case of interaction). We can phrase the paper this way, but I think it would just add an unnecessary complication to the terminology.
Also, in this paper I would emphasize the null environment test, not the more complicated versions (by emphasize I just mean the number of words we spend on it). It is simply because that is what we can aim for first. Once we pass that, we can publish another paper including the next BTT challenges that we want to take on.
balazs
Hi Balazs, perhaps you could produce your copy of the points with your own modifications/additions, we could do this in an iterative manner until we have a set of points everyone agrees on.
Regarding the question of separating the behavioural and null environment tests - I am still of the opinion that the Null Environment Test should be thought of as a non-Turing test, although I do see your point that it is a subset of the Behavioural Turing Test where the constraint on interaction is at a maximum. We must at least emphasise that it is a very limited form of BTT. I think at this rate we will soon reach a consensus on this.
Just to make my position clear: I'm against including biological constraints in the theoretical definition of the Turing test (1st part). It is true that the test is trivial if there is no biology and there are no interactions, which is why I'm pushing for the definition of the Turing test to include interactions. I propose the following definition:
Behavioural Turing Test (M): There are two identical screens, both showing a C. elegans free to move in the same environment. The video on each screen is filtered so that no visual cues can hint at which of the two scenes is real. The tester can interact with both scenes through an identical interface. Using the interface, it is possible for the tester to influence the environment in different ways agreed upfront according to individual test goals. A full spectrum of possibilities is allowed, from a non-interactive scenario to a fully interactive one (changes of terrain, light, food, mates, etc.). If the tester cannot distinguish which of the two screens is showing an in vivo C. elegans and which is showing a simulation, then the program that runs the simulation passes the Behavioural Turing Test. The more interaction is allowed, the stronger the test's validity.
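This definition can be operationalized as a two-alternative forced-choice experiment: over many trials the tester guesses which screen shows the real worm, and the simulation passes if identification accuracy stays near chance. A minimal sketch of the scoring logic; the judge is a placeholder, and the pass margin is an illustrative assumption (a real test would use a proper statistical threshold):

```python
import random

def run_btt_trials(judge, n_trials=100, seed=42):
    """Run a two-alternative forced-choice BTT and count correct IDs.

    `judge` is a placeholder callable: it returns "left" or "right" as
    its guess for which (hidden) side shows the real worm.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        real_side = rng.choice(["left", "right"])
        guess = judge()
        if guess == real_side:
            correct += 1
    return correct

def passes_btt(correct, n_trials, margin=0.1):
    # Pass if accuracy is within `margin` of chance (0.5).
    return abs(correct / n_trials - 0.5) <= margin

# A judge with no usable information can only guess at random.
naive_judge = lambda: random.choice(["left", "right"])
correct = run_btt_trials(naive_judge)
print(correct, passes_btt(correct, 100))
```

The "strength" of a given run is then a property of how much interaction the judge was allowed, exactly as the definition says.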
I completely agree with the above @tarelli definition for the BTT.
I quite like that definition from @tarelli too.
Updated consensus points (please continue to contribute).
hey,
good to see the active discussion, keep up the good work!
I think Matteo is right that the bio constraints should not be in the definition of BTT, however even without bio constraints I do not think it would be trivial to pass BTT in anything other than the null environment (even there I think it would be a greater challenge than you think)!
The only way my points would differ from Mike's is that I would not detach the null environment test from the BTT; I like everything else. I do not see the fundamental difference between zero interaction and some interaction. I think it would be far better to keep everything within the same general conceptual framework - that is, the BTT - and not to make this IMO highly artificial distinction. I do not see why or how this would make the paper better. It would just make the terminology more complicated without adding anything to the content. It is true, though, that the current text should be clearer about the family of possible BTTs. Maybe this should be discussed first, and then we can argue why we focus on the special case of the null environment and non-interactivity first.
I am confused about your exact position on this, because later Mike wrote that he sees the generality of my perspective, and Matteo's definition also includes non-interactivity in the BTT. Would you be happy to keep the null environment within the BTT if we emphasize that it is a special case within the family of BTTs? I am heavily in favour of this, as you might have guessed.
I am happy with Matteo's definition of the BTT. However, if we allow full interactivity (e.g. adding food or poking), then it would be difficult to make the two actions the same in the real and the virtual environment. For example, adding food can happen instantly in the simulation, while in the real world a hand (chemical trace!) must approach the setup and leave the patch of food. This may or may not make a difference - I do not think it will, but it is something to keep in mind. Also, as far as the definition goes, I would delete the last sentence. It is true, but it should not be part of the definition (I would move it to the general discussion).
A related question is: how long should the test be? It might seem a silly question, but if the test can be indefinitely long, then it is going to be easy to spot the real worm when it dies! :) This applies to both the null environment and whatever comes after it.
I just got home after a day of the most boring workshop ever and my brain is dead, but tomorrow before the meeting I will have some time to update the document.
balazs
Hi all,
Great discussion so far.
Here are my worries about the BTT as we have defined it:
- How are we (meaning actually us) ever going to set up this BTT? I'm concerned we are going to announce to the world a test that is impractical for us to set up, thereby setting the project up for failure.
- We agree there is a wide range of tests outside this way of defining the BTT -- why is this the most important one?
Consider for a moment my original idea for how to practically test the model: a validation engine. The input is movies that show worm behavior and cellular activity. The validation engine enables scripts to parse the behavior and cell activity into representations that are directly comparable to those of the simulation. Because of this apples-to-apples comparison mechanism, the validation engine can calculate the difference between the worm in the movie and the simulation. Critically, the validation engine can also measure the delta between different versions of the model, and whether the score goes up or down becomes the metric of success for the model.
Advantages of this test over the current BTT:
- It provides a quantitative metric of success that can improve over time, rather than a binary pass/fail from the BTT which, if you fail, teaches you little about whether you are on the right track.
- It can be engineered based on video contributions coming from partner biology labs.
- Having this metric is on the critical path to building the worm anyway -- we have to compare the output of the worm to real data in order to optimize the model. We might as well also make this a criterion for success.
- There is no need to set up an experimental rig with a microscope to test the model.
I realize this is going against the grain of consensus right now, but I think it is important we consider this paper in the broader context of the success of the project in the medium and long term. Given a choice between being more faithful to Turing's original test and defining a test that is practical and achievable for the project's goals, I go with the latter.
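A minimal sketch of the validation-engine scoring idea: extract comparable feature vectors from the real movie and from the simulation, and track the delta across model versions. Feature extraction itself is elided here, and the feature names and numbers are purely illustrative:

```python
import math

def delta(real_features, sim_features):
    """Euclidean distance between comparable feature representations."""
    return math.sqrt(sum((r - s) ** 2
                         for r, s in zip(real_features, sim_features)))

def score_versions(real_features, model_versions):
    """Return each model version's delta from the real data; lower is better."""
    return {name: delta(real_features, feats)
            for name, feats in model_versions.items()}

# Illustrative numbers only, e.g. [mean speed, reversal rate, bend frequency].
real = [0.22, 0.05, 0.31]
versions = {
    "v1": [0.40, 0.02, 0.10],
    "v2": [0.25, 0.04, 0.28],  # closer to the real data
}
scores = score_versions(real, versions)
print(scores)
```

The key property Stephen asks for falls out directly: the score is continuous, so you can tell whether v2 moved toward or away from the data, instead of getting a single pass/fail bit.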
Hey guys,
Stephen is right that we have to make the BTT practical as well, but I do not think it is going to be a major obstacle.
I think as far as the null environment BTT is concerned we are good, but it might be difficult to get the same interactivity in silico and in real life - see my previous mail. Maybe we should think in terms of predefined environments at t=0, with patches of chemicals and/or heat/chemical gradients, but no interactions during the test. This would solve some of the problems: there is no need to interact in the same way in the virtual and real worlds while the test is running. On the downside, it does not allow interaction during the test; however, multiple runs with different environmental configurations should still be allowed. It also has the advantage that the real and virtual videos can be generated independently, at different time and space coordinates, and later given to the interrogator(s). I am not saying this is the best way to make the BTT practical, just the first that comes to my mind.
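One way to pin down the "predefined environment at t=0" idea is a declarative configuration that both the real rig and the simulator consume, so no mid-test interaction is needed. A sketch in Python; every field name and value here is invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class FoodPatch:
    x_mm: float
    y_mm: float
    radius_mm: float

@dataclass
class EnvironmentConfig:
    """Initial-condition spec shared by the real rig and the simulator."""
    agar_viscosity: float          # arbitrary units; swim/crawl transition knob
    temperature_c: float
    food_patches: list = field(default_factory=list)
    duration_s: float = 600.0      # fixed test length, agreed upfront

# Several runs with different configurations, no interaction during each run.
trials = [
    EnvironmentConfig(agar_viscosity=1.0, temperature_c=20.0,
                      food_patches=[FoodPatch(5.0, 5.0, 1.5)]),
    EnvironmentConfig(agar_viscosity=0.1, temperature_c=20.0),  # low viscosity
]
print(len(trials))
```

Because the spec is frozen before the run, the real and virtual videos can indeed be recorded independently and handed to the interrogator later, as suggested above.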
To answer Stephen's second question: I believe the BTT is the strongest test for behaviour, because it does not rely on any quantification of behaviour. Behaviour is a process, not a number or metric. So no matter what set of metrics you come up with, you will always compare apples to oranges - or at least you will create an orange from an apple and then compare it to a virtual orange created from a virtual apple... I hope that makes sense! The BTT compares apples to apples and does not force behaviour to be anything other than what it is. I think attempting to reduce behaviour to a set of metrics is just fitting a square peg into a round hole.
However, as I said earlier, it would be useful to think of such metrics and test our model against them internally - this would be a good exercise to put ourselves into the mindset of the interrogator!
balazs
Having read the last two posts I would like to summarize the stage we are at:
I would like to offer this additional view:
I agree with Balazs that the BTT is the "strongest" test for behaviour, in the sense that it would be the most convincing to most people, including biologists.
I also agree with Stephen that the VET has distinct advantages, particularly in the development of models and as a consequence of its inherent feasibility.
I think we should adopt this attitude: *the BTT is our gold-standard test, even if it is only "in principle" and never actually implemented; the VET is a more achievable/realistic test, which we will rely on strongly.*
We mustn't forget that having multiple tests is of course a good thing.
In addition, I have updated the consensus points to where I believe we are; I will continue updating these until we have something everyone agrees on.
@slarson was Turing setting himself up for failure when he devised the original Turing test?
I think the validation engine as you describe it is a very good tool for testing the model in various ways, but it has nothing to do with the Turing test for the worm. It's just a set of computer-testable metrics one can choose to have in place (or not) in the quest to pass some weaker or stronger form of the BTT.
I'd very much like the 1st section of the paper to be dedicated to the generic BTT and the 2nd part to describe what weak version of the test we are aiming for and what tools we are going to use to assist us in this (possibly a battery of tests similar to the ones you describe).
Again - automated testing of the model, in my eyes, has nothing to do with the Turing test. Fooling a computer is far too easy and uninteresting (considering there is no constraint of biological realism) if you know which set of metrics it is testing for.
Hey,
Just to comment that I agree with the developing consensus, though I think there won't be just one VET; this type of machine-executable test will be needed at many levels: individual channels, cells, muscles, systems and the whole worm. The behavioural VET is just the top of a hierarchy that will be used to tune the model at each level, and each VET will use data appropriate for that level. The BTT and the top-level VET just happen to use similar behavioural test input, which is appropriate for a subjective human assessment of passing the test, and so is comparable to the Turing test.
My two cents anyway. Apologies, but I can't attend the meeting today and will have minimal input to give to this until 1st Feb, I've to get a 5 year grant application in by then and that needs my attention...
Adding to Padraig's observation, the multiple VETs will be an excellent tool in the field of automated optimization of the model, using behaviour as a target.
I re-read the latest 9 points from Mike and I'm happy with 100% of it. Balazs, if you are not can you make your concerns clear by copying and pasting them and making edits to them?
Per our earlier discussion, there is nothing in the 9 points that we could not do with game programming. If all we want to present to the world is a video game, I think we will be laughed at with regard to Turing. Our advantage is manipulation at the cellular level (probably through gene expression), which may be why this discussion is premature on our part. I think the only Turing test is to manipulate the biology of the worm and show that a manipulation in the wetware gives the same result as the manipulation in the software, and vice versa. However, so as not to shoot my mouth off too far, I'll reread Balazs's paper and see where I fit this in.
BTW: a minor point, but from a computer science POV, "Null" has a different connotation from what you are expressing. It's not a null environment (from a CS POV); it's a desensitised environment. I hate to be picky, but I do cringe when I read this.
I don't quite know what you mean by game programming. I assume you mean using heuristic rules rather than simulations closely tied to cellular activity.
Taking out one neuron and observing the change in behaviour is an example of one form of interactivity which such heuristic rules couldn't generate, unless you had heuristic rules for each neuron, which is a cellular-level simulation anyway.
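The neuron-removal argument can be phrased as an ablation protocol: disable one cell, rerun, and measure the behavioural change; a rule set that gets this right for every neuron is, in effect, a cellular-level model. A schematic sketch with a toy stand-in model (the model interface and the neuron choices are hypothetical):

```python
def ablation_deltas(run_model, behaviour_metric, neuron_ids):
    """For each neuron, disable it, rerun, and measure behavioural change.

    `run_model(disabled=...)` and `behaviour_metric` are hypothetical
    hooks into whatever simulator is under test.
    """
    baseline = behaviour_metric(run_model(disabled=None))
    return {n: abs(behaviour_metric(run_model(disabled=n)) - baseline)
            for n in neuron_ids}

# Toy stand-in model: speed drops when a "motor" neuron is removed.
def toy_model(disabled):
    speed = 1.0
    if disabled in ("DB1", "VB2"):   # pretend these drive locomotion
        speed -= 0.5
    return {"speed": speed}

deltas = ablation_deltas(toy_model, lambda s: s["speed"], ["DB1", "VB2", "ASEL"])
print(deltas)
```

The point of the sketch: to answer correctly for all 302 neurons, `run_model` must encode per-neuron consequences, which is exactly the cellular-level simulation Mike describes.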
I see no issue with a non-cellular, biology-guided simulation which passes the proposed tests; in fact I think it would have huge scientific value, and certainly nobody would laugh. I do suspect, however, that it is an impossible task.
Mike: Yes, you could state that game programming is a set of heuristic rules. I think if you played contemporary video games (e.g. Call of Duty), you could envision a very easy move from a worm simulation to being able to change the worm configuration, such as removing a neuron. I certainly can, and having individual rules for 1000 cells would be very doable. And I agree that it could be useful at many levels for science. If we had a few hundred thousand dollars and could hire game programmers, we could have a very sophisticated simulation in a short timeframe. That's not what we are doing; we are going much deeper.
Personally, encouraging as it was to discuss initially, I think we are trying very hard to push our project into a Turing model rather than deal with the real science. I am a purist who looks at the ultimate goal of what I envision, and a "Turing test" is a byproduct of the science. Although I really enjoy the discussion, it seems we are going in circles, which is the first sign that we aren't far enough along to have the definition we are looking for. Since this can only be a paper of conjecture, I think as written it suits that purpose. I don't want to add noise here, so I'll get back to the science and let you guys hash out the theories :-)
Tim
Dear OpenWorm community. Here is an interesting nematode wiring paper, fresh off the web:
http://www.cell.com/abstract/S0092-8674(12)01500-0
Jay
On Mon, Jan 14, 2013 at 6:13 PM, Stephen Larson notifications@github.com wrote:
We need to come to some consensus on basic features of the Turing test paper. I have tried to spell it out here in the issue. I have marked with first initial where I think some of you fall, but please correct if this is not accurate.
- Human creativity test (B, G) versus set of specific metrics (P, SL)
- Human creativity test broadly means we don't set any specific tests other than providing access to the model. Humans can do whatever they want to test.
- Specific metrics means we list out the tests we want to see done.
- Black box (B, P, G) vs. glass box (G?, SL)
- Black box only considers behavior that is visible from outside the body
- Glass box also considers cellular activity
- General theory for tests (B, SL) or very specific test? (P)
- General theory for tests would propose a broader context into which tests are done
- Very specific test would avoid a broader context and only focus on a specific battery of tests, for which there could be others that won't be touched on at all by this paper.
G - @JohnIdol, T - @Interintel, A - @a-palyanov, SK - @skhayrulin, M - @vellamike, P - @pgleeson, SL - @slarson, B - Balazs
Hey guys,
Today I won't be able to reflect on yesterday's meeting and what has been posted here since, as I am at an all-day workshop. However, tomorrow it will have a high priority.
balazs
@Interintel I am still not sure about the distinction being made between "real science" and games. A model can pass a test or it can fail a test. If there is an aspect of the model which is not covered, it would fail a more sophisticated test. The only possible measure of the "realness" of a model is its ability (or otherwise) to pass a predefined test; for two models which both pass a test, the simpler model can be considered superior with regard to passing that test. The test can include detailed questions about the underlying biology, but the end measure is still the ability to pass a test. I have designed a flowchart detailing model development under this rationale:
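The rationale behind the flowchart (refine on failure, simplify on success, stop when no simpler model still passes) can be sketched as a loop; the hooks below are placeholders, not OpenWorm code:

```python
def develop(model, passes_test, refine, simplify, max_iters=10):
    """Iterate: refine until the model passes, then try to simplify it
    while it still passes. `passes_test`, `refine` and `simplify` are
    placeholder hooks standing in for the flowchart's steps.
    """
    for _ in range(max_iters):
        if not passes_test(model):
            model = refine(model)        # failed: add detail / fix bugs
        else:
            candidate = simplify(model)  # passed: try a simpler model
            if passes_test(candidate):
                model = candidate        # simpler passing model is superior
            else:
                break                    # can't simplify further; done
    return model

# Toy example: "model" is an integer level of detail; the test needs >= 3.
final = develop(model=1,
                passes_test=lambda m: m >= 3,
                refine=lambda m: m + 1,
                simplify=lambda m: m - 1)
print(final)
```

The loop terminates at the minimal level of detail that still passes, which is the sense in which the simpler of two passing models is "superior" in the paragraph above.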
Mike:
To explain what I mean: I refer to "real science", in our case, as replicating the biological physiology in silico. Although we are all working on different aspects of the worm simulation, the grand result is the hope that we will pull these parts together over time and create a simulated C. elegans. However, I think it is good to have an end vision of our work so we all have a unified goal as we get there step by step, so let me present my end goal. As a disclaimer, this is my perception of a goal, not representative of the group or any individual.
As some history: as I expressed to the group a couple of years ago, I have a good friend (I'm actually having lunch with him tomorrow) who is a pathologist at Amgen, a world leader in biotechnology. When I was telling him about OpenWorm, he said that the US Food and Drug Administration actually has a mandate to make drug companies (like Amgen) use more computer-simulated testing and less animal testing. As I am sure most of us know, animal testing is still by far the primary means by which drug companies test their products; i.e. they give the animal their drug over time, euthanize the animal and dissect it to see the results. They do this over and over again before going to human trials. Johnson & Johnson at one time expressed interest in what we were doing, and I'm sure from this POV.
So with that, my ultimate goal is to have a worm defined through data that can in turn be read to create a simulated animal that can be manipulated in every way like the real animal. As I was expressing in my chat post on Thursday, to me the ultimate test of this simulated C. elegans is to perform mutagenesis on the simulated worm, perform the same mutagenesis on the real worm - producing a mutant strain that has not previously been observed - and see that the simulated worm exhibits exactly the same behavior as the real worm; i.e. we know the results we will get in the real worm before we do the actual mutagenesis. This will prove that our simulated worm is complete and worthy of science; i.e. a worm researcher could manipulate the in silico worm first to understand the effects of an experiment before doing live-animal experiments, secure in the thought that the results in silico will be the same as in vivo. Can you imagine how much this would accelerate research?
Everything we are doing today is contributing to this goal.
We have much to overcome and to develop. To mutate a worm, we use nature, chemicals, X-rays, etc. to change the chromosomes, which results in the mutant (i.e. point mutations and chromosomal rearrangements). So my ultimate goal is to allow researchers to change the DNA to create whatever mutants they desire. Obviously a lofty goal, because it implies that we could read the DNA as our database and create the simulated organism as the result, something that would earn us all Nobels. In the meantime, we are taking both a bottom-up approach (which is why I am trying to pull together data we can find on gene expression and metabolites) and a top-down approach where we are looking at neuron-to-motor connections and the results. The SPH is a huge, huge part of this, and it is the essence of what distinguishes our work from "game programming". In game programming, you use sprites - animated components - to display your results; i.e. a mini movie that gives you the illusion of movement and change = frames of images rapidly displayed in sequence. SPH creates a cell using hundreds of elements that act and change according to the laws of physics and how they are bound to one another. I am constantly amazed by and rooting for our team in Siberia, because they are the essence of our simulation in so many ways.
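To make the SPH idea concrete, here is a minimal sketch of its core operation: estimating a field quantity (here, density) at one particle as a kernel-weighted sum over neighbouring particles. This is purely an illustrative toy, not the Siberia team's actual code; the poly6 kernel choice and all names are my assumptions.

```python
import math

def poly6_kernel(r, h):
    """Poly6-style smoothing kernel; zero beyond the support radius h (illustrative choice)."""
    if r >= h:
        return 0.0
    return (315.0 / (64.0 * math.pi * h**9)) * (h**2 - r**2)**3

def density_at(i, positions, masses, h):
    """SPH density estimate at particle i: each neighbour's mass,
    weighted by the kernel evaluated at the distance to particle i."""
    xi, yi, zi = positions[i]
    rho = 0.0
    for (x, y, z), m in zip(positions, masses):
        r = math.sqrt((x - xi)**2 + (y - yi)**2 + (z - zi)**2)
        rho += m * poly6_kernel(r, h)
    return rho

# Toy usage: three nearby particles; neighbours within h raise the density at particle 0.
pts = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (0.0, 0.05, 0.0)]
masses = [1.0, 1.0, 1.0]
print(density_at(0, pts, masses, h=0.1))
```

In a full SPH simulation the same kernel-sum pattern also yields pressure and viscosity forces, which is what makes the elements "act and change according to the laws of physics" rather than replay animation frames.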
I'm not sure I understand your diagram. Why make the model simpler? Perhaps you mean more efficient and more refined? But I think (not to put words in your mouth) you are looking at the simulation from the gamer POV = if I can make a simulation look and act like a worm, then I have a valid simulation. This is, in many ways, a top-down approach, and I think it is a stage we must go through, if for no other reason than to get some enjoyment out of our work. But my goal is purely a bottom-up approach, and although I have no problem with the direction we are going via a top-down approach, I do not like it when we present this to the scientific community as anything other than what it is = a display of some biophysical properties. I think for us to be taken seriously by the scientific community, we need to explain first and foremost what biophysical properties we are representing, and be clear that we have no illusions that what we have done is an animation and not a true biological simulation. I think when we can give the scientific community the ability to change the DNA and create the mutated animal, we will get a lot of folks' attention and we will have reached my goal.
Back to the Turing paper: as presented, I think it is OK. Again, if we show a real worm via a movie alongside an "animated" worm that, when you change a biophysical property, acts in the same way as the real worm, we are showing a step towards the Turing test and validation that we are on the right path. The validation testing, as someone brought up, is unit testing, and we need to define our unit testing absolutely, but this is not what Balazs is trying to do. He is attempting exactly what you are proposing: if we can change a parameter in a simulation (i.e. black box) and the resulting behavior is the same as in the real worm, we are moving towards validation of the model. I think we are all struggling with the definition of the Turing Test because to many of us it implies we are much further along than we are. If I were writing the paper, I would go for the ultimate test of comparing mutants in vivo to mutants in silico, as expressed by chromosomal changes. I think this would grab the attention of the scientific community and give credibility to our research and where we want to go. Many will think we are daft, as many do already, but it would give us true credence that we are very serious and not trying to make a CGI movie.
Sorry to be so long,
Tim
On Fri, Jan 18, 2013 at 6:11 AM, Mike Vella notifications@github.com wrote:
@Interintel https://github.com/Interintel I am still not sure about the distinction being made between "real science" and games: a model can pass a test or it can fail a test. If there is an aspect of the model which is not covered, it would fail a more sophisticated test. The only measure of the "realness" of a model is its ability, or otherwise, to pass a predefined test. This test can include detailed questions about the underlying biology, but the end measure is still the ability to pass a test. I have designed a flowchart detailing model development under this rationale: [image: UntitledDocument]https://f.cloud.github.com/assets/1540349/78282/df4af400-6178-11e2-91fa-33cfcb7f8262.png
— Reply to this email directly or view it on GitHubhttps://github.com/openworm/OpenWorm/issues/40#issuecomment-12423182.
Hi all -- I'm excited that a lot of energy has gone into this thread and thank you all for investing time in it. If you'd permit me to apply a little bit of moderation -- I'd like to see this topic converge back to the 9 points Mike has put forward. Please only reply with edits to the 9 points, as I'd like to see some version of them get into the manuscript verbatim. If you disagree with them entirely, please re-propose alternative points -- let's just keep this structured.
I'm happy for other topics to be discussed though as I think we have stirred up a lot of thought-- let's just move the longer discourse back to openworm-discuss!
Thanks!
Hey guys,
I think what Tim wrote is very interesting, and we should talk about it on Monday, but for now I will focus only on Mike's 9 points. I agree with all of them. I think where I differ from Mike is in what we would include in the paper, and with what weight. From my mental notes, I think there are 2 key differences so far:
- I think a model with a realistic chance of passing any interactive BTT is far in the future. Consider this: one of the first forms of interaction we talked about is introducing food. This does not seem a complicated test; however, our model would have zero chance of passing such a BTT. I reference a paper in the document about how the presence of food alters the speed of locomotion through some dopamine pathways. Since our model will not include a single neuropeptide, all the interrogator would have to do to distinguish our model from the real worm is look for this effect during the BTT. We might be able to find a form of interaction where neuropeptides do not play a role - mechanical stimuli are the only form I can think of - but generally we will need neuropeptide signalling to model any chemical sensing. And the worm mostly (or maybe even exclusively) uses chemical signals to decide where to go / what to do next. This difficulty is not likely to be resolved soon. To model the chemical regulatory system, I think you would have to RNA-seq all of the neurons to see which neurons have receptors for which of the 250 neuropeptides (single-cell RNA-seq is possible now, but it is very expensive). The genome of the worm is known, but that does not tell you which of the receptor-encoding genes are up- or down-regulated in a particular neuron during development. After this you would have to understand how each of these peptides affects the physiology of cells. I am not sure how you would do it, but it would not be simple, that is for sure.
I would like to emphasize that I am not against interactions; of course they would make the verification much stronger. However, I think we should be realistic about what the model is likely to be able to achieve, and interactive BTTs are not yet on that list. Hence we should first focus on the simplest scenario, and that is the BTT-NE. For that, at least, we do not have an obvious reason why the model will fail.
- I would not write about the VET in the article.
In this case my reasons are practical as well. First of all, at the moment there is still no proposed VET. IMO it would be a mistake to rush to make one up just to be able to include it in the article - it is unrealistic to think that coming up with a set of good metrics will not be time-consuming. Bear in mind that so far we have written 2723 words, not counting the "Future work" section, which must be completely rewritten (the word count limit is 3000). The existing sections could also use some extra words to make them clearer. I think it would be better to focus on fewer concepts rather than scratching the surface of many. This will be a perspective article, so it is not about giving all the details, and we all agree that the principal test is going to be the BTT (Mike's No. 1) - hence we should focus on that.
As with the interactions, I am not against the VET; I just would not include it in this particular paper.
Looking forward to continuing the discussion on Monday, Balazs
OK -- to be clear, my interpretation of Balazs' proposal is that the points should read like this:
Balazs, I get it that you are not opposed to the VET in principle, but you are opposed to it being in the paper. Let's try to make the points be what we would actually put in the paper, verbatim.
So far then, the major disagreement with regards to the points is to remove the VET from the paper or not. I propose that folks either weigh in on that binary question or propose a different version of the 9 points. Personally I'm in favor of including the VET, so Balazs and I cancel out. Tiebreakers?
My vote is with Balazs.
Tim
On Sat, Jan 19, 2013 at 1:07 PM, Stephen Larson notifications@github.comwrote:
OK -- to be clear, my interpretation of Balazs' proposal is that the points should read like this:
- The principal test for the OpenWorm model's validity is called the Behavioural Turing Test.
- The Behavioural Turing Test is defined as satisfying the following hypothetical scenario: there are two identical screens, both showing a C. elegans free to move in the same environment. The video on each screen is filtered so that no visual clues hint at which of the two scenes is real. The tester can interact with both scenes through an identical interface. Using the interface, the tester can influence the environment in different ways agreed upfront according to individual test goals. A full spectrum of possibilities is allowed, from a non-interactive scenario to a fully interactive one (changes of terrain, light, food, mates, etc.). If the tester cannot distinguish which of the two screens is showing an in vivo C. elegans and which is showing a simulation, then the program that runs the simulation passes the Behavioural Turing Test. The more interaction is allowed, the stronger the test's validity.
- Because constraints can be changed in a simulation, there is not one but many possible Behavioural Turing Tests - a model may pass with one set of constraints but not another.
- Our initial test will be the Null Environment Test. The Null Environment Test is defined as a test of the simulation where the environment has been made as featureless as possible and no interactivity is allowed at t > 0 in the simulation. The Null Environment Test is therefore a special subset of the Behavioural Turing Test.
- The fundamental question we are interested in is: "What is the minimum amount of biological realism required to make our simulation of a C. elegans pass at least one Behavioural Turing Test?" To emphasise this, we will refer to it as the Fundamental Question.
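The points above leave "cannot distinguish" informal. One way it could be operationalized - purely a sketch, not something the draft specifies - is a forced-choice trial: the tester guesses which screen is real over n trials, and the model passes if the tester's accuracy is not significantly above chance under an exact binomial test. The function names and the significance threshold are my assumptions.

```python
from math import comb

def binomial_p_value(correct, trials, p=0.5):
    """One-sided exact binomial p-value: probability of getting `correct`
    or more successes in `trials` independent Bernoulli(p) trials."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(correct, trials + 1))

def passes_btt(correct, trials, alpha=0.05):
    """The model 'passes' if the tester's accuracy is not
    significantly above chance at level alpha."""
    return binomial_p_value(correct, trials) >= alpha

print(passes_btt(11, 20))  # tester barely above chance -> True (model passes)
print(passes_btt(19, 20))  # tester reliably tells them apart -> False
```

A real protocol would also need to fix the number of trials, the interaction set, and who the testers are; this only illustrates that the pass criterion can be made statistical rather than anecdotal.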
If the contents of the paper reflect that 1-5 summary in @slarson's post, I would be over the moon. It's really looking great.
I am not opposed to the VET, as I am not opposed to testing in general. However, while I am also not opposed to mentioning that we will "test" our simulation, I don't think we should include a detailed description of our planned testing strategy in the paper - so yeah, my vote is with Balazs too.
I am strongly in favour of including VET in the paper we're working on.
I believe that without it we will be open to criticism and have insufficient content.
@vellamike We are proposing a Behavioural Turing Test; why should people criticize the fact that we don't say how we are gonna test our system before we try to pass the BTT? Again, I think it's a good idea and it's crucial to have a VET; I just don't believe it falls within the scope of the BTT concept we are trying to present in this paper. If we already had something, I could possibly agree with this - but since it is speculative work on a validation engine for a system that does not exist, I do not.
@JohnIdol Because
@Interintel - I think what we are struggling with is the definitions of "test" and "interactivity". From my perspective, mutagenesis would be a form of interactivity, and the expectation value of the result of mutagenesis can be used to fail or pass a test.
Regarding the model-simplification step: by "simplify model" I mean reduce the algorithmic and mathematical complexity required to pass the test. If two models pass a test, then the simpler one should be considered superior, at least in its capacity to pass that test. The more complex model may have greater predictive capacity - but a new test should be designed to prove this.
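That parsimony rule can be stated as a selection procedure: filter to the models that pass the test, then pick the one with the lowest complexity score. A minimal sketch, where all names and the complexity measure (e.g. a free-parameter count) are illustrative assumptions, not anything from the flowchart itself:

```python
def select_model(models, test):
    """Return the simplest model that passes `test`, or None if none pass.

    `models` is a list of (name, complexity, behaviour) tuples;
    `test` is a predicate applied to each model's behaviour.
    """
    passing = [m for m in models if test(m[2])]
    if not passing:
        return None
    # Parsimony rule: among passing models, lowest complexity wins.
    return min(passing, key=lambda m: m[1])

# Toy usage: two models reproduce the target behaviour; the simpler one is preferred.
models = [
    ("full-neuropeptide", 120, "realistic"),
    ("reduced-circuit", 30, "realistic"),
    ("random-walk", 5, "unrealistic"),
]
best = select_model(models, lambda behaviour: behaviour == "realistic")
print(best[0])  # "reduced-circuit"
```

Note that the cheapest model overall ("random-walk") never wins, because failing the test disqualifies it regardless of simplicity, which matches the flowchart's pass-first logic.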
I have refined the nine points in light of previous discussions, and for clarity:
@vellamike
To be honest, personal preference aside, I am happy either way as long all the other points on the summary (1-5) represent the bulk of the paper.
@JohnIdol
Regarding the first comment, I don't think that it's "just to fill out space" at all...
@vellamike
@JohnIdol Regarding your second point, by "both of them will be real" I mean that it will be possible to pass this test without any simulation at all - just a video of another C. elegans will do. There is nothing in the NET which prevents this. Remember there is no interactivity - it's not a Turing Test. This is not a real concern per se - it just shows the fundamental theoretical weakness of the test.
Regarding your third point, I actually think we can include some rigorous content; if we can't, then we should leave it out. I have already started to include some content on this in the paper, including some targets for validation and a cost function for comparison between simulation and experiment. I would appreciate your thoughts on it.
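As one example of what such a cost function might look like - this is a guess at the flavour, not the actual content of the draft - a mean squared distance between time-aligned centroid trajectories of the simulated and real worm:

```python
def trajectory_cost(sim, real):
    """Mean squared distance between time-aligned (x, y) centroid samples.

    The metric choice is illustrative; the paper's actual cost function may
    weight features like speed, body curvature, or reversal rate instead.
    """
    if len(sim) != len(real):
        raise ValueError("trajectories must be sampled at the same time points")
    total = sum((xs - xr)**2 + (ys - yr)**2
                for (xs, ys), (xr, yr) in zip(sim, real))
    return total / len(sim)

# Toy usage: a simulated track compared against a hypothetical tracked worm.
sim = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
real = [(0.0, 0.1), (1.1, 0.5), (2.0, 0.9)]
print(trajectory_cost(sim, real))
```

A cost of zero means the tracks coincide; validation would then amount to requiring the cost to fall below some agreed threshold, or below the cost between two different real worms.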
@vellamike
About the video thing: well, in that case we should add some kind of post-pass verification that prevents that kind of cheating! The lack of interaction in the NET creates this kind of "hole" in our definition, but it would effectively be like putting another human at the other end of the text chat in the original Turing test. If we remove these ways to cheat, it's not so trivial after all, I think :)
About "I actually think we can include some rigorous content, if we can't then we should leave it out": I honestly think all we need to say is that OpenWorm aims to pass the BTT (whatever version) using a given level of biological detail, and rigorously describe that, rather than describe a battery of internal tests (VET). That said, I will read your edits and we can discuss during the meeting.
Again, as long as the other stuff is in, I am happy, so it's really no big deal if people think it would be a valuable addition - I was under the impression we were struggling to keep it short already, though.
...btw, up to this moment no one has proposed any scheme for the VET... I am against including it on practical grounds, but before any meaningful debate can take place we should know what the VET is going to be, IMO.
@balazs1987 I've already included some detail in the document.
Sorry, I had missed that. I will go through it quickly now.
Okay -- at the top of the meeting, then, it appears that folks' current positions are: include VET: me, Mike -- don't include VET: Balazs, Tim, Giovanni. So unless there are other votes, I'd say let's let democracy decide here -- we won't include it, but I hope we can compromise on some sentences so that it is not precluded entirely in the current manuscript.
@slarson btw - I am not opposed to having a couple of sentences saying we will test our system with a battery of automated tests, but I think that defining the VET is a humongous amount of work, and other than being mentioned it is out of scope for this paper.
To make a point about game programming, click here: http://www.codeskulptor.org/#user8-CwzS8osFwA-0.py and then click the Run ">" button in the upper left. If you click on the worm, it will back up (sometimes it disappears, but that's due to the poor programming).
This is very, very crude game programming, but if I wanted to, I could make it work like this video: http://www.youtube.com/watch?v=olrkWpCqVCE
I created the sprite that simulates the worm movement in the "game" from the video above.
What I don't want is for someone to say all we did was create a video game like the one above. You can see how simple the Python code is in this example, and it would not take much to make it fully interactive and much smoother.
Tim
We've reached a reasonable consensus about this -- from here out let's continue adding comments to the manuscript directly
We need to come to some consensus on basic features of the Turing test paper. I have tried to spell it out here in the issue. I have marked with first initial where I think some of you fall, but please correct if this is not accurate.
G - @JohnIdol, T - @Interintel, A - @a-palyanov, SK - @skhayrulin, MV - @vellamike, P - @pgleeson, SL - @slarson, B - @balazs1987, MC - @tarelli