serratus / quaggaJS

An advanced barcode-scanner written in JavaScript
https://serratus.github.io/quaggaJS/
MIT License
5.07k stars 979 forks

API Redesign #109

Open serratus opened 8 years ago

serratus commented 8 years ago

As most of you have already noticed, the API of Quagga is neither intuitive nor easy to use. Within this issue I would like you to discuss with me ideas for a new API. The new design should be readable, easy to use and extensible for features to come.

I'm very interested in your suggestions and ideas, so please help me make this API better for everyone. Tell me your current pain-points and let's discuss them in this thread. Based on your feedback and my time available, I'll push a draft of the API in the coming days. Feel free to collaborate.

Requirements

As it is currently being implemented

Creating a default video-based scanner

Creates a scanner with default settings and the user's camera:

Quagga
  .fromVideo({constraints: {facingMode: "environment"}})
  .addEventListener('detected', (result) => (console.log(result)));

Creating a default image-based scanner

Creates a scanner that is suitable for static images

Quagga
  .fromImage('../test/fixtures/code_39/image-001.jpg', {size: 800})
  .addEventListener('processed', (result) => (console.log(result)));

When scanning images, the consumer either expects a result, or an error (no result). Hence, returning a Promise would be handy:

Quagga
  .fromImage('../test/fixtures/code_39/image-001.jpg', {size: 800})
  .toPromise()
  .then((result) => {
    console.log(result);
  }).catch(() => {
    console.log("not found!");
  });

Passing configuration

In most cases, the default configuration is not sufficient and can be adjusted prior to creation:

const customScanner = Quagga
  .decoder({readers: ['ean_reader']})
  .locator({patchSize: 'medium'});

or pass in the entire configuration at once:

const customScanner = Quagga
  .config({decoder: {readers: ['ean_reader']}, locator: {patchSize: 'medium'}});

Starting/Stopping

The general idea is that everything should be as lazy as possible.

Consumer based

This means the scanner should not start automatically without an event-listener attached. Processing is therefore started/paused based on the state of the event-listeners. As long as at least one consumer is attached, the scanner is active. It should automatically stop/pause when the last consumer is detached via removeEventListener(eventName, callback).
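The consumer-based lifecycle can be sketched with a minimal stub. This is only an illustration of the proposed semantics, not Quagga's actual implementation:

```javascript
// Minimal sketch of consumer-based activation: the scanner is active
// exactly while at least one event-listener is attached.
class LazyScanner {
  constructor() {
    this.listeners = new Map(); // eventName -> Set of callbacks
    this.active = false;
  }
  addEventListener(eventName, callback) {
    if (!this.listeners.has(eventName)) {
      this.listeners.set(eventName, new Set());
    }
    this.listeners.get(eventName).add(callback);
    this._update();
  }
  removeEventListener(eventName, callback) {
    const callbacks = this.listeners.get(eventName);
    if (callbacks) {
      callbacks.delete(callback);
      if (callbacks.size === 0) this.listeners.delete(eventName);
    }
    this._update();
  }
  _update() {
    // start/pause purely based on the presence of consumers
    this.active = this.listeners.size > 0;
  }
}

const scanner = new LazyScanner();
const onDetected = result => console.log(result);
scanner.addEventListener('detected', onDetected);    // first consumer: starts
console.log(scanner.active);                         // true
scanner.removeEventListener('detected', onDetected); // last consumer: pauses
console.log(scanner.active);                         // false
```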

Manually

Another approach would be to start/stop the scanner manually by calling the appropriate methods provided.

RxJS

One reason for redesigning the whole API is the desired move towards a more reactive implementation. With RxJS one could simply write the following code to scan Code 128 and EAN-13 barcodes and react only to the results of interest:

// creating a stream listening on the "processed" events
const processStream = Quagga
  .decoder({readers: ['ean_reader', 'code_128_reader']})
  .fromVideo({constraints: {facingMode: "environment"}})
  .observe('processed');

// create a stream only containing the label and format of a detected barcode
const detectedCodes = processStream
  .filter(result => result && result.codeResult && result.codeResult.code)
  .map(result => ({label: result.codeResult.code, format: result.codeResult.format}));

// create a stream containing all boxes which are detected during scanning
const detectedBoxes = processStream
  .filter(result => result.boxes && result.boxes.length > 0)
  .map(result => result.boxes);

After creating these streams, one can listen to them:

detectedCodes.subscribe(code => console.log(code));
detectedBoxes.subscribe(boxes => draw(boxes));
JauernigIT commented 8 years ago

I love the RxJS approach/integration. This definitely opens up the API to a flexible set of use cases. I would love to see an option to get the (image) results of the different processing steps, e.g. the Otsu-binarized image (which is not possible at the moment).

From a configuration perspective, it would be really nice to have it more modular, e.g. .decoder(), .locator(), .debug(), ... instead of having one big configuration object. This makes it clearer and more understandable in terms of the single-responsibility principle (SRP).

For starting/stopping I would go the manual approach, because it makes the behavior more explicit. Btw: the fluent API approach is great.

zpappa commented 8 years ago

+1 to the RxJS approach, but would appreciate ability to use native node streams for highland support.

serratus commented 8 years ago

Thanks for this valuable feedback. I'm currently working on a separate branch issue/109 in case you want to follow the progress. See api-test.js and quagga.js to get an idea of how the API is evolving.

Interesting idea of exposing the different processing steps. I'll keep that in mind. My worry is that, due to the use of web-workers, this might be really challenging. Sending data back and forth between all the processing steps seems to me like a waste of valuable processing time. How would you deal with that?

Another thing worth mentioning is the interaction with the new API. Basically, it's separated into three distinct steps:

  1. configuration: chaining decoder, locator, and config together
  2. instantiation: calling fromImage or fromVideo to create an actual instance that can be controlled with start, stop, addEventListener, removeEventListener
  3. consumption: calling toPromise or observe, which would return some kind of result object

In the first step, every function call returns a new object so that common configuration can be shared among multiple instances before actually creating them. In the second step, an actual instance is created. All methods on this instance return a reference to itself to allow chaining calls, with the exception of toPromise and observe.
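The configuration phase can be sketched as an immutable builder. The helper below only illustrates the assumed semantics (every call returns a new object), it is not Quagga's actual code:

```javascript
// Immutable configuration builder: each call returns a fresh object,
// so common configuration can be shared among multiple instances.
function makeBuilder(config = {}) {
  const extend = patch => makeBuilder({ ...config, ...patch });
  return {
    config,
    decoder: decoder => extend({ decoder }),
    locator: locator => extend({ locator }),
  };
}

const base = makeBuilder().locator({ patchSize: 'medium' });
const eanScanner = base.decoder({ readers: ['ean_reader'] });
const code128Scanner = base.decoder({ readers: ['code_128_reader'] });

// base is untouched; both scanners share only the locator settings
console.log(base.config.decoder);       // undefined
console.log(eanScanner.config.decoder); // { readers: [ 'ean_reader' ] }
```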

I hope this makes things clearer. BTW, I'm not yet sure how to get RxJS into Quagga. A hard dependency? A peer dependency? Or even a separate package like quagga-rx that depends on both quagga and rxjs? Another idea was to completely omit the dependency on rxjs, since I've already prepared the API to have functions like addEventListener and removeEventListener, which can easily be used to create your own streams by calling Rx.Observable.fromEvent(scannerInstance, 'processed'). What do you think?
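To illustrate the "bring your own streams" option: with addEventListener/removeEventListener in place, a consumer can write the bridge themselves. The tiny fromEvent helper and the emitter stub below only show the shape; with rxjs one would simply call Rx.Observable.fromEvent instead:

```javascript
// Minimal observable-like bridge over the event-listener API.
function fromEvent(target, eventName) {
  return {
    subscribe(next) {
      target.addEventListener(eventName, next);
      return { unsubscribe: () => target.removeEventListener(eventName, next) };
    },
  };
}

// Stub standing in for a scanner instance (illustration only).
class EmitterStub {
  constructor() { this.handlers = new Set(); }
  addEventListener(name, cb) { this.handlers.add(cb); }
  removeEventListener(name, cb) { this.handlers.delete(cb); }
  emit(payload) { [...this.handlers].forEach(cb => cb(payload)); }
}

const scannerInstance = new EmitterStub();
const results = [];
const sub = fromEvent(scannerInstance, 'processed').subscribe(r => results.push(r));

scannerInstance.emit({ codeResult: { code: '4012345678901' } }); // received
sub.unsubscribe();
scannerInstance.emit({ codeResult: { code: 'ignored' } });       // not received
console.log(results.length); // 1
```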

JauernigIT commented 8 years ago

I'm not certain about having RxJS as a hard dependency. I don't like dependencies, since they also influence your consumers. From my perspective, having an event-based approach without RxJS would be a good foundation. Then a second package quagga-rx would be really nice, which is essentially a wrapper around quagga and provides observables via Rx.

I can't say anything about the web workers and how to get the information out of them. Perhaps it makes sense to have a "debug mode" that exposes the partial results and behaves differently from the normal non-debug mode, which is performance-optimized to return only the final result. Alternatively, when in debug mode, the partial results are attached to the final result, thus the consumer has access to them after the processing is done. From a user's perspective, this would be absolutely sufficient.

serratus commented 8 years ago

I'm totally with you on the topic of dependencies. Then it's settled, no additional dependency for this kind of feature, but a separate package for people who would like to use it with Rx or any other reactive library.

Alternatively, when in debug mode, the partial results are attached to the final result, thus the consumer has access to them after the processing is done

This would eat up a lot of memory. All of the processing is done in-place, meaning that every step overwrites the changes of the previous one.

Perhaps it makes sense to have a "debug mode" that exposes the partial results and behaves differently from the normal non-debug mode, which is performance-optimized to return only the final result.

Sounds more reasonable, but this also means that it would only work with web-workers with some limitations on the consuming side. Without web-workers, this wouldn't be a problem. The core of the problem lies in the inability to share memory between workers. Either the UI-thread has access, or the web-worker, but never both at the same time. This means that, if web-workers should be allowed in this "debug mode", the consumer has to signal when it's done accessing the memory so it can be thrown back into the pool.
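The ownership problem can be seen with transferable ArrayBuffers, which is how pixel data moves between a worker and the UI thread without copying. The standalone demonstration below uses structuredClone (available in modern browsers and Node 17+); it is not Quagga code:

```javascript
// Transferring a buffer detaches it from the previous owner:
// afterwards, only one side can ever access the memory.
const buffer = new ArrayBuffer(8);
const transferred = structuredClone(buffer, { transfer: [buffer] });

console.log(buffer.byteLength);      // 0 -> original owner lost access
console.log(transferred.byteLength); // 8 -> new owner holds the memory
```

The same thing happens when a buffer is passed in the transfer list of postMessage, which is why the worker and the consumer must coordinate who currently owns each frame.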

In short:

JauernigIT commented 8 years ago

This would eat up a lot of memory. All of the processing is done in-place, meaning that every step overwrites the changes of the previous one.

I just thought to enable this only in debug mode. So normally (debug mode "off"), no memory/performance is eaten, but when a user wants to have the partial results attached (debug mode "on"), then the result object is blown up with copies of the partial step results.

Without web-workers, this wouldn't be a problem.

From a debugging perspective, this seems to be fine. Normally I just want to see what Quagga uses as partial results and whether those partial results can be fine-tuned to improve recognition. From a broader perspective, this could be a problem. You know, in scenarios where you want to use a partial result after Quagga has processed the image. In my case, this would be handing the binarized image over to a QR code scanner when Quagga has not detected a barcode. Would be great to see this scenario become real. Again: if a user could opt in to different kinds of partial results in the final result, this would be a great and flexible addition, even if it takes more memory (but it's opt-in, so it normally would have no impact at all).

MichaelSL commented 8 years ago

@serratus, you are already shipping some recognition params with the result: I suppose it would be ok to have some object with the sequence of steps in debug mode. I'm not sure about the format: @JauernigIT, is it ok for you to have the image bytes?

serratus commented 8 years ago

Alright, so as far as I understand, the ability to get access to the underlying image throughout the different phases is of interest.

Use Cases?

What are the advantages of having access to the image in each processing-step? Please help me to understand why this is important to you.

First of all, how should the API cope with that requirement? Two proposals:

addEventListener

Through the already existing method addEventListener

videoScanner
  .addEventListener('binary', image => {console.log(image.size);});

configuration

Another approach would be to define hooks as part of the configuration:

code128Scanner
  .decoder({
    readers: ['code_128_reader'],
    interceptors: {
      binary: image => {console.log(image.size);}
    }
  });

Which format?

We also need to agree on the format of the image that's passed into the interceptor-functions. The binary data is most likely to be represented as Uint8Array, but what about the meta-data such as dimensions? Currently, this is all wrapped inside a constructor function ImageWrapper which holds size and data in one place but also comes with additional methods.
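One possible plain-data shape (an assumption for discussion, not the final API) would pair the raw bytes with explicit dimensions, avoiding the extra methods that ImageWrapper carries:

```javascript
// Raw grayscale/binary pixels plus the meta-data needed to address them.
// Hypothetical shape for the intercepted image, not Quagga's ImageWrapper.
const interceptedImage = {
  data: new Uint8Array(640 * 480), // one byte per pixel
  size: { x: 640, y: 480 },
};

// The dimensions make the flat buffer addressable as a 2-D image:
const pixelAt = (img, x, y) => img.data[y * img.size.x + x];
console.log(pixelAt(interceptedImage, 10, 20)); // 0 (a fresh buffer is zero-filled)
```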

Immutable?

Should it be possible to mutate the data that's being intercepted? This allows for additional processing if required, but could also harm the entire pipeline.

Asynchronous?

In case of web-workers, or possible mutations, should the code be executed asynchronously? A callback function could be used to signal that the processing is completed, like so:

function onIntercept(image, next) {
  doSomethingFancy();
  next(image);
}

Web-Workers

Again, this is a very delicate topic since it's deeply integrated into Quagga. One must be aware that access to the image is revoked as soon as the function is done executing. If you want, you could copy the data to your own buffer in one of the interceptors to avoid tight coupling.

This interceptor feature is of much lower priority than the rest of the API rework, because for this to work, a lot of internals must be adapted.

JauernigIT commented 8 years ago

Use Cases

My (personal) use case is that I want to have access to the results of the image-processing steps in the onProcessed() callback, i.e. when the processing is done. I want to hand the processed image over to another (QR code) scanning library without having to pre-process it again on my own.

So, for me it would be fine to just have access to the image processing results in onProcessed(), similar to Quagga.canvas.ctx.* or Quagga.canvas.dom.* as it is today. This perhaps would not be too difficult, since you don't have to implement explicit hooks that change the processing flow.

Interceptors

Intercepting the image processing I would see as "advanced feature". Yes, it probably would be nice to intercept the processing steps and enhance the image with some custom image processing. But for me that's another use case and should be treated as such.

jorgepinon commented 7 years ago

@serratus, In case the new API is still being worked on, I'm interested in the comment you made above about supporting multiple instances. In my use case, I'm trying to have 2 image inputs in one form. Unless I'm wrong, I don't think there's a supported way to do this in the current release.

serratus commented 7 years ago

@jorgepinon Thanks for reviving this old thread. The sad fact is, I'm still working on a new API, but it looks different from the one proposed above. I'm really trying to decouple the image-source from quagga and make it more adaptable to different sources of image data. Multiple images at once? What would be the use-case? Just to clarify: in the next version of quagga it will be easily possible to run multiple instances of quagga at the same time. I'll keep you posted.

jorgepinon commented 7 years ago

We have an app that needs to scan barcodes from 2 different devices in order to associate them in the database. I'm sure multiple quaggas is a rare use case.

We got it working with a wizard-style single form, and I was able to detach the event listeners for quagga and reattach when the user moves to the next step. I got the idea from one of your examples.
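The detach/reattach workaround can be sketched as follows. The tiny stub only mimics the shape of Quagga's onDetected/offDetected so the flow is visible; it is not the library itself:

```javascript
// Stub mimicking Quagga's onDetected/offDetected (illustration only).
const Quagga = {
  _handlers: new Set(),
  onDetected(cb) { this._handlers.add(cb); },
  offDetected(cb) { this._handlers.delete(cb); },
  _emit(result) { [...this._handlers].forEach(cb => cb(result)); },
};

const form = { deviceOne: null, deviceTwo: null };

// One live scanner; the "detected" handler is swapped per wizard step.
function scanIntoField(fieldName) {
  const handler = (result) => {
    form[fieldName] = result.codeResult.code;
    Quagga.offDetected(handler); // detach once this field is filled
  };
  Quagga.onDetected(handler);
}

scanIntoField('deviceOne');
Quagga._emit({ codeResult: { code: 'A-123' } }); // fills step 1 only
scanIntoField('deviceTwo');
Quagga._emit({ codeResult: { code: 'B-456' } }); // fills step 2

console.log(form); // { deviceOne: 'A-123', deviceTwo: 'B-456' }
```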

vinneyk commented 7 years ago

I'm with @jorgepinon on using multiple instances of Quagga scanners. I'm developing an app that involves two scans: one to register the employee and the other to scan their output. The scanner instances don't need to (in fact probably shouldn't) be running simultaneously but it seems like unnecessary overhead to have to init/destroy the single instance each time the user switches fields. I'm still early in the development of this feature so I'll try to report back with a summary of my experience getting it started.

The fact that this library does what it does, however, is so beautiful. Thank you @serratus for all you've done.

seanpoulter commented 7 years ago

It seems @JauernigIT and I have a similar interest in getting the processed image, but it seems the only way to get at it with Quagga right now is to use the ResultCollector to intercept the image data before it is massaged and published as an event for Quagga.onDetected. Have you considered changing the API for the results? Would there be a performance hit if the imageData was passed to the onDetected callback and not consumed?

FYI @JauernigIT, I've replaced using Quagga.onDetected with Quagga.registerResultCollector({ addResult: (imageData, canvasSize, resultCode) => {...} }).