tieubao / til

Today I Learned. These are what I've learned every day, organized. #til.

Body language and machine learning #514


tieubao commented 4 years ago

https://statmodeling.stat.columbia.edu/2020/10/25/body-language-and-machine-learning/

Riding on the street, I can usually tell what cars in front of me are going to do, based on their “body language”: how they are positioning themselves in their lane. I don’t know that I could quite articulate what the rules are, but I can tell what’s going on, and I know that I can tell because I make predictions in my mind which are then confirmed by what the cars then actually do. (Yes, there could be selection bias, so if I really wanted to check for sure, I should record my guesses and check the error rate. Whatever.)

Anyway, the other day I was thinking about how this is an example of machine learning. No causal inference (sorry, Judea!), just pure prediction, but “machine learning” in that my brain has been passively gathering data on car positioning for the past few decades, and at some point it decided to associate that with driving decisions. I guess it was motivated by me trying to figure out where to go in particular situations. So in many ways this is exactly the kind of problem we’ve been hearing about in discussions of artificial intelligence, with the usual steps (see the sketch after the list):

  1. Open-ended data gathering (“big data”),

  2. Unsupervised learning with undefined categories (that would be “cluster analysis”),

  3. Supervised learning with defined categories once I become conscious of the categorization that I’ve been doing passively until then,

  4. Refinement: Once I’m aware of the parameters of this inference process, I can use more active processes to flag the misclassifications and ambiguous predictions and use these to refine my predictions (“diagnostics” and “evaluation”).
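
A minimal sketch of those four steps in Python, using synthetic “car body language” data. The feature names, the simulated labels, and the scikit-learn choices (KMeans, logistic regression) are illustrative assumptions, not anything taken from the post:

```python
# Toy version of the four steps with made-up "car body language" features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# 1. Open-ended data gathering: passively observed positioning features.
n = 1000
lateral_offset = rng.normal(0, 0.5, n)   # metres from lane centre
speed_delta = rng.normal(0, 2.0, n)      # km/h change over the last second
X = np.column_stack([lateral_offset, speed_delta])

# Hidden ground truth, used only to simulate outcomes: drifting left while
# slowing down tends to precede a left turn.
turning_left = (lateral_offset < -0.3) & (speed_delta < -1.0)

# 2. Unsupervised learning: cluster the positioning data with no labels.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(clusters))

# 3. Supervised learning: once the categories become conscious ("that car is
#    about to turn"), attach labels and fit a classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X, turning_left, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

# 4. Refinement: check the error rate and flag misclassified cases to review.
accuracy = clf.score(X_test, y_test)
misclassified = X_test[clf.predict(X_test) != y_test]
print(f"held-out accuracy: {accuracy:.2f}, "
      f"{len(misclassified)} misclassified/ambiguous cases to review")
```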

I’ve read about these steps in other problems, from image identification to crime detection. But somehow it all becomes more real to me in the context of this everyday example. In particular, I’m aware of the different steps, from passive data collection, to the unconscious identification of a pattern, to conscious use and refinement of the procedure.

It also strikes me that there is an analogy between consciousness (for humans and animals) and, hmmm, I don’t know what to call it . . . maybe “active programming” in machine learning.

Let me put it another way. Statistical methods, as constructed, are entirely conscious: design, measurement, data collection, and inference are all problems that the user must choose to solve. Certain statistical procedures have been automated enough that they could be applied unconsciously: for example, a computer could compute correlations between all pairs of variables, look at distributions, scan for outliers, etc., in the same way that human or animal visual systems can find anomalies without the conscious choice to look for them.
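
A minimal sketch of that kind of routine, goal-free scan, assuming a pandas DataFrame of numeric variables; the column names, the synthetic data, and the robust z-score cutoff are made up for illustration:

```python
# Mechanical scan: all pairwise correlations plus a simple outlier flag,
# run with no particular hypothesis in mind.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "income": rng.lognormal(10, 0.5, 500),
    "age": rng.normal(40, 12, 500),
    "hours_worked": rng.normal(38, 8, 500),
})

# Correlations between all pairs of variables.
corr = df.corr()

# Flag outliers: anything more than 3 robust z-scores (median/MAD based)
# from the centre of its column.
median = df.median()
mad = (df - median).abs().median()
robust_z = (df - median) / (1.4826 * mad)
outliers = (robust_z.abs() > 3).any(axis=1)

print(corr.round(2))
print(f"{outliers.sum()} rows flagged for a closer (conscious) look")
```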

Machine learning is a little different. There are lots of conscious machine learning procedures—various nonparametric algorithms for prediction, classification, inference, decision making, etc.: basically, these are statistical methods, but maybe we call them “machine learning” because they are new, or because they are nonparametric, or because they’ve been developed by computer scientists rather than statisticians, or because they work with big data, etc. But machine learning and AI are also associated with automatic or background or unconscious processes, such as processing of big data without specific goals in mind (sure, you could argue that projects such as the General Social Survey have this feel to them too) or looking for patterns in the background such as my brain did with the car positioning problem.