API for continuous vs. one-time body/hand/face detection

shiffman commented 1 year ago

I'm opening this thread to discuss aligning the API across all keypoint detection models (currently planned: hand, body, face). This relates to #21.

The hand model implemented by @ziyuan-linn uses an event callback with handpose.on("hand", gotHands); I like this pattern and it would make sense to adopt it for body pose and face keypoint tracking. However, should we also provide a function for a "one-time" hand detection in an image? Here's one idea:

Continuous

// continuous if a video element is passed into handpose() and on() is used? 
// Or should video be passed into on()?
let handpose = ml5.handpose(video, modelReady);
handpose.on("hand", gotHands);
function modelReady() {
  console.log("model ready");
}
function gotHands(results) {
  console.log(results);
}

One-time

// No media is passed in when loading the model
let handpose = ml5.handpose(modelReady);
function modelReady() {
  console.log("model ready");
  // Individual image passed in?
  handpose.detect(img, gotResults);
}
function gotHands(results) {
  console.log(results);
}

This brings up two other key questions:

Support asnyc/await?

Should we support async and await or keep things simple and more aligned with p5.js callbacks only? The following syntax is so lovely, but not all how p5.js does things! cc @Qianqianye

async function runHandPose() {
  let handpose = await ml5.handpose();
  let results = await handpose.detect(img);
  console.log(results);
}

Error first callbacks?

The p5 style for callbacks is to have one object containing the results.

function gotHands(results) {
  console.log(results);
}

ml5.js has had "error first" callbacks for most of the models in prior versions:

function gotHands(error, results) {
  console.log(results);
}

My preference for beginners is the p5.js style (and include an optional second error argument for debugging). However, I believe ml5.js adopted the error first callback previously because (a) it is a JS convention and (b) it was required for us to be able to support both callbacks and async/await. @joeyklee tagging you in here, is my memory of this correct?

I might lean towards keeping things simple (a) not support asnyc/await (radical!) and therefore (b) not require error-first callbacks. But I'm not at all sure here, that async/await syntax is quite nice!

@MOQN, would love your thoughts as someone who has taught these models quite a bit.

limzykenneth commented 1 year ago

Really interesting ideas here, thanks @Qianqianye for pointing me to this issue.

I didn't manage to have a look at the underlying code for the detections but for me the main decisions will probably be around whether the underlying operations are already asynchronous (which I'm guessing they are), and if so they will probably need to either use callbacks or async/await promises.

The p5.js style syntax that is not very semantic and have certain tradeoffs, first is that p5.js have the preload() function where we abstract away the asynchronicity by delaying sketch initialization until all relevant async operations are resolved (if any one of them never resolves, the sketch never starts); the second is that using those function outside of preload() there will be a period where the returned value does not have the right data until the actual async operation is done.

As for callback style, error first callback is more of a node.js convention than a JS one and more recently JS (and node.js as well) are moving generally towards promise based syntax instead of callbacks. There are many frameworks/libraries that does not provide callback syntax and just provide promises nowadays (native fetch() for example have no callback syntax and just return a promise).

I think in general moving towards promise with async/await would be nice while the main hurdle will probably be having to understand promise and async in JS to a certain extend to be able to use them correctly, although same would be required for callbacks too.

ziyuan-linn commented 1 year ago

There are fascinating ideas!

Just want to provide some information about the current implementation and my thoughts:

Currently, for the continuous predictions, the results can only be accessed via the EventEmitter interface: handpose.on("hand", gotHands);.

The current implementation does support one-time predictions, and the results can be accessed via both callbacks and promises.

Callback:

let handpose = ml5.handpose(modelReady);
function modelReady() {
  console.log("model ready");
  handpose.predict(img, gotResults);
}
function gotHands(results) {
  console.log(results);
}

Promise:

let handpose = ml5.handpose(modelReady);
function modelReady() {
  console.log("model ready");
  handpose.predict(img).then(gotHands);
}
function gotHands(results) {
  console.log(results);
}

I personally think callbacks for the one-time prediction would make a little more sense for beginners since it is syntactically similar to the EventEmitter interface for the continuous prediction. However, also supporting promises would be nice, and I don't think it would be too much trouble to implement both.

shiffman commented 1 year ago

Thanks @limzykenneth and @ziyuan-linn this is a great discussion!

I think it makes sense (as suggested by @ziyuan-linn) to go with callback only for the event emitter interface (continuous predictions) and support callback + promise options for "one-time" predictions. It might also be nice to support promises for loading the model but that's a secondary consideration? @gohai perhaps we should organize a document outlining this pattern so that those working on other models can align the API?
In terms of "error first" callbacks, maybe we can drop that across the board, this was always a major source of confusion for students. What if the pattern were an optional second "error" argument?

// Examples will look like...
function gotHands(results) {
  console.log(results);
}

// But for debugging...
function gotHands(results, error) {
  if (error) {
    console.log(error);
    return;
  }
  console.log(results);
}

@ziyuan-linn It would also be great to support p5's preload() function! I just realized the current version of ml5 does this! The one caveat is that it wouldn't work for the continuous prediction unless we decide to assign the video element later? Would this be too weird?

let handpose, video;

function preload() {
  handpose = ml5.handpose();
}

function setup() {
  video = createCapture(VIDEO);
  handpose.on("hand", video, gotHands);
}

Regardless, I would love to make an example as follows:

let handpose, img;

function preload() {
  handpose = ml5.handpose();
  img = loadImage("file.jpg");
}

function setup() {
  handpose.detect(img, gotHands);
}

@limzykenneth this discussion can move elsewhere, but if p5.js were interested in dipping a toe in the async/await waters, I think a great place to start would be loadJSON(). I could imagine a future where an alternative to preload() is:

let data;
async function setup() {
  createCanvas(640, 240); 
  data = await loadJSON('data.json');
}

function draw() {

}

I don't love adding async to setup() but I think the await keyword is very intuitive. I'd be happy to work on this if I can be of help!

joeyklee commented 1 year ago

Hi @shiffman ! apologies for the late reply -- i recently switched to stop receiving github notification emails and have lost track of tagged messages 🙈

My preference for beginners is the p5.js style (and include an optional second error argument for debugging). However, I believe ml5.js adopted the error first callback previously because (a) it is a JS convention and (b) it was required for us to be able to support both callbacks and async/await. @joeyklee tagging you in here, is my memory of this correct?

Yes! your memory is correct. If I also remember correctly, we wanted to stick with error first callbacks + we were working on supporting async/await 🖖

ziyuan-linn commented 1 year ago

@shiffman Aftering thinking about it, this makes a lot of sense to me! For both continuous and one-time we initialize handpose in preload() and pass in the media in setup().

I'm now wondering if it would be more intuitive to drop the EventEmitter interface and create our own function detectContinuous().

One-time:

let handpose, img;

function preload() {
  handpose = ml5.handpose();
  img = loadImage("file.jpg");
}

function setup() {
  handpose.detect(img, gotHands);
}

Continuous:

let handpose, video;

function preload() {
  handpose = ml5.handpose();
}

function setup() {
  video = createCapture(VIDEO);
  handpose.detectContinuous(video, gotHands);
}

gohai commented 1 year ago

I only have a question to add at this point: do we know that preload() is working currently in tandem with ml5? (I am woefully unfamiliar how this function works under the hood - but trying to research it I bumped into https://github.com/processing/p5.js/issues/5677 by Linda.)

I believe if we wanted to explore this, we'd need to bring utils/p5PreloadHelper.js into the new repo.

shiffman commented 1 year ago

@gohai my memory is that our intention was to have preload() work with all ml5.js models, but it was not completed across the board and documented consistently. For teaching, I would always try to use preload() first and if I ran into issues, would redo the example a different way.

shiffman commented 1 year ago

@ziyuan-linn I really like this idea! I'm not 100% sure about detectContinuous(). One other idea is startDetect() or maybe startDetection()? There could also be a stopDetect()?

Just to brainstorm a bit more. . . addDetect(), addDetection(), detectListen(), detectAll(), detectEvent(), addDetectListener()? I think I like start and stop maybe? What do you all think?

let handpose, video;

function preload() {
  handpose = ml5.handpose();
}

function setup() {
  video = createCapture(VIDEO);
  handpose.startDetect(video, gotHands);
}

function mousePressed() {
  handpose.stopDetect(); 
}

ziyuan-linn commented 1 year ago

@shiffman I also like startDetect() and stopDetect()! This also provides a way to stop the continuous detection.

MOQN commented 1 year ago

I apologize for the delayed response to this thread! I overlooked it in hectic circumstances.

I really like the idea of including both continuous and one-time options in the library. These two methods have been beneficial for students in understanding how to utilize ML models with ml5.js. The handpose.on() pattern has been particularly intuitive for beginners who are learning JavaScript and becoming familiar with event listeners. Additionally, I've noticed that some students have gone beyond the basics and experimented with their own ideas, such as using images or implementing triggers for specific results. For those students, the one-time option and calling the function recursively have provided a clearer understanding of the processes.

I am absolutely excited about the addition of two new functions, startDetect() and stopDetect(), as they offer students various possibilities for them to experiment and learn. If the one-time function is named detect(), we could consider detectStart() and detectStop(), haha?

Even though the two functions will be implemented, it would be great if we still keep the function handpose.on(). In my experience, many students who used ml5.js attempted to incorporate socket.io or Firebase into their projects. The similarity to the .on() pattern used in other JS libraries and APIs will facilitate students' understanding.

Lastly, I love the idea of having the first parameter of the callback function as results. I haven't seen any students effectively utilizing error messages, and this change could simplify their learning experiences.

shiffman commented 1 year ago

Thanks for the thoughts @MOQN! I agree with everything you are saying. My one concern re: teaching is having two ways to do the same thing in the library (if there isn't a specific need for one over the other). It makes things a little trickier to maintain and balloons the reference/examples a bit. It seems like we're boiled down to three options:

on() event + single detect()
detectStart()/detectStop() + single detect()
Support both 1 + 2 in the library.

The advantage to (2) is the ability to use preload() in the example. The advantage to (1) is the familiar design pattern to other JS libraries / event listeners. The advantage to (3) is we don't have to pick just one right now!

I'm not sure how to decide! This feels important as it will ripple across many of the other elements of the library.

I agree about the "error-first" callback and am removing it from the neural network library (see #23)

gohai commented 1 year ago

I also agree with Moon's observations and suggestions. Perhaps only a slight preference for startDetect() and stopDetect() here (more akin to beginShape() and friends?)

MOQN commented 1 year ago

Thank you for your insights, @shiffman! I now understand the potential issues that having two ways to achieve the same functionality can lead to confusion and difficulty to organize references and examples. If I have to choose between three, I will go with the second one!

@gohai, I think it is a very valid point, Haha. I agree with you! :D

lindapaiste commented 1 year ago

I only have a question to add at this point: do we know that preload() is working currently in tandem with ml5? (I am woefully unfamiliar how this function works under the hood - but trying to research it I bumped into processing/p5.js#5677 by Linda.)

I believe if we wanted to explore this, we'd need to bring utils/p5PreloadHelper.js into the new repo.

My recollection is that preload works for most or possibly all models. The issue is that when p5 is used it is only possible to create the model using callbacks and the Promise/async syntax no longer works. This is the issue where we discovered it: https://github.com/ml5js/ml5-library/issues/873#issuecomment-1128198830

Thank you! You've uncovered some extremely weird behavior. I was looking at the pix2pix source code and it definitely returns a promise when there is no callback. I can confirm that calling ml5.pix2pix('models/edges2pikachu.pict') works as intended (returns a promise) when there is no p5. Somehow p5 causes it to return the instance directly.

This must have something to do with the way that we are registering the functions with p5 in order to support preloading when they are called inside preload(). That is not something that is specific to pix2pix, so I checked out some other models. It turns out that this is a problem everywhere!

I then dug really deep into the ml5 implementation of registering preloads and determined that it is not possible to handle this properly due to the issue that I raised in the p5 repo.

lindapaiste commented 8 months ago

@ziyuan-linn I really like this idea! I'm not 100% sure about detectContinuous(). One other idea is startDetect() or maybe startDetection()? There could also be a stopDetect()?

Just to brainstorm a bit more. . . addDetect(), addDetection(), detectListen(), detectAll(), detectEvent(), addDetectListener()? I think I like start and stop maybe? What do you all think?
let handpose, video;

function preload() {
  handpose = ml5.handpose();
}

function setup() {
  video = createCapture(VIDEO);
  handpose.startDetect(video, gotHands);
}

function mousePressed() {
  handpose.stopDetect(); 
}

We should consider the .detecting property to be a part of the public API. TBH I would call it .isDetecting. Or possibly we want to have a method .isDetecting() which returns the value of this.detecting.

Looking at @shiffman's example it seems obvious to me that a user is going to want to write something like this!

function mousePressed() {
  if (handpose.isDetecting()) {
    handpose.detectStop();
  } else {
    handpose.detectStart(image);
  }
}

shiffman commented 8 months ago

@ziyuan-linn I really like this idea! I'm not 100% sure about detectContinuous(). One other idea is startDetect() or maybe startDetection()? There could also be a stopDetect()?

detectContinuous() was an old idea that is no longer going to be used. All models going forward will have the following:

functionStart() -- running model continuously to a data source (e.g. video)
functionStop() -- stopping running model
function() -- running model once on data source (e.g. single image)

I'm using a general term b/c we'll have classifyStart() and classifyStop() instead of detectStart() and detectStop() for some models like the imageClassifier.

We should consider the .detecting property to be a part of the public API. TBH I would call it .isDetecting. Or possibly we want to have a method .isDetecting() which returns the value of this.detecting.

I'm game for exposing this property and having an isDetecting() or isClassifying() method? I will say that even though we have an example with "stop" and "start", 99 times out of 100, the user will only be using the start() methods. It's nice to offer the option to stop() but almost all interactive media demonstrations and projects run the model continuously.

shiffman commented 6 months ago

I'm poking through some issues and noticed that I think this is resolved! Ok to close? We can refined and open more specific issues about the API as well (for example, whether it makes sense to expose isDetecting()). Perhaps we can document the discussion and settled design patterns here in this thread in CONTRIBUTING.md or somewhere similarly appropriate?

ziyuan-linn commented 4 months ago

@shiffman I agree that this is resolved! I think we have a fairly standardized API across all models now, I can look into documenting it in CONTRIBUTING.md.

shiffman commented 4 months ago

Wonderful! We should keep CONTRIBUTING.md up-to-date, but we should also review the documentation as it stands in ml5-website-v02-docsify and help comment, edit, and clarify anything there! I believe @QuinnHe and @myrahsa are working quite a bit in the docs this week!

MOQN commented 4 months ago

Yes, we will keep and update the how-to-contribute.md and develop-contributor-notes.md pages. Please feel free to provide any feedback. We can discuss it in today's meeting.

ml5js / ml5-next-gen