lingjzhu / mtracker.github.io

MTracker is a tool for automatically splining tongue shapes in ultrasound images by harnessing the power of deep convolutional neural networks. It was developed at the Department of Linguistics, University of Michigan, to address the need for splining in a large-scale ultrasound project.

Web app from your CNN #3

Open · stutrek opened this issue 3 years ago

stutrek commented 3 years ago

I used your CNN to make something that could potentially show a live tongue shape from an ultrasound video stream. My wife, @heatherkoff, is an SLP researching ultrasound at NYU and attended UltraFest IX. Your talk was fortunately timed: we were in the car, and since I had been suggesting she try CNNs, she put your talk on the speakers. I'm a software developer; I know nothing about the speech field.

It was relatively easy to get the CNN working in a Docker container. On my machine I quickly ran into the versioning issues you seemed to hint at in your talk, so I went right for Docker.

The system is a tiny website that lets you select a window containing an ultrasound image/video, the way you would share a window in a video chat. Frames are sent to a small web server that saves the image, runs your Python script, and sends the result back. Currently it's too slow to be useful, but if it could keep your Python process running it wouldn't have to pay the price of initializing the CNN for every image.
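For illustration, here's roughly what I mean by "keeping your Python running": a tiny persistent server that loads the model once at startup and traces each frame it receives. This is a sketch only; the model path and the `extract_contour` call are placeholders, since I don't know the actual MTracker API.

```python
# A sketch only: a persistent server that loads the Keras model once at startup,
# then traces each POSTed frame without re-initializing the CNN.
# extract_contour() and the model path are placeholders, not the actual MTracker API.
import io

import numpy as np
from flask import Flask, jsonify, request
from keras.models import load_model
from PIL import Image

app = Flask(__name__)
model = load_model("model.hdf5", compile=False)  # placeholder path; loaded exactly once


def extract_contour(model, frame):
    """Placeholder: run the model on one grayscale frame and return [x, y] points."""
    return []  # the real version would call the MTracker prediction/splining code


@app.route("/trace", methods=["POST"])
def trace():
    # the front end sends one frame per request as a file upload named "frame"
    image = Image.open(io.BytesIO(request.files["frame"].read())).convert("L")
    contour = extract_contour(model, np.asarray(image))
    return jsonify({"contour": contour})


if __name__ == "__main__":
    app.run(port=5000)
```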

https://github.com/stutrek/live-tongue-shape

Hope you like it!

lingjzhu commented 3 years ago

Hi @stutrek Thanks a lot for listening to my talk and for building the website! I really appreciate your effort! Yes, I understand that setting up the environment is certainly hard; the deep learning packages used in this repo have changed a lot, as deep learning is still a rapidly developing field. Docker is a good solution to this.

I think we might not need to re-initialize the model for every frame: track_frames.py can already process every image in a folder in a single run, so if all frames to be processed are placed in the same folder, the model only has to be loaded once.

This model still has a problem: it does not work very well on images collected from other ultrasound machines, since neural networks are highly data-dependent. If you and your wife are considering using this for research, re-training the model could potentially make it more accurate. (I am still looking for additional data to improve the model, but I have not been able to find any open-source datasets.)

(To be continued.)

stutrek commented 3 years ago

My wife has data from several machines, but none of it is hers; she would need to get permission to share it. I don't know if it's enough data to train a model. Would it help if there were a step where the user cropped the image and applied brightness/contrast settings?

I was imagining a system where a patient could see their tongue shape in real time, with an overlay of an ideal tongue shape for the patient to match. For that use case it's not a big deal if one frame in ten is wonky, as long as most are pretty good. I'm not a researcher, so I have no use for this myself, other than thinking it's neat. My hope is that someone doing clinical research could either take inspiration from it or use it as a starting point.

Last night I tried to get IPC working between the Node server and Python to see if it would be faster. I copied your file that works on a set of images and made it listen on stdin for an image path. It errors, and it errors slowly. It's very hard for me to debug, for three reasons: it only runs in Docker, it's hard to use print because stdout is consumed by the Node process, and I don't know Python. Do you think this could be fast enough to handle ~10 frames per second?

https://github.com/stutrek/live-tongue-shape/blob/use-ipc/mtracker.github.io/track_stdio.py
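For reference, the idea I was going for is roughly this (a sketch, not the actual track_stdio.py; the model path and the tracing call are placeholders):

```python
# A sketch of the stdin loop, not the actual track_stdio.py: load the model once,
# then trace each image path that arrives on stdin. Results go to stdout as JSON;
# debug output goes to stderr, since stdout is consumed by the node process.
import json
import sys

from keras.models import load_model

model = load_model("model.hdf5", compile=False)  # placeholder path
print("model loaded", file=sys.stderr)


def trace_image(model, path):
    """Placeholder: run the model on the image at `path`, return [x, y] points."""
    return []  # the real version would call the MTracker prediction/splining code


for line in sys.stdin:
    path = line.strip()
    if not path:
        continue
    try:
        contour = trace_image(model, path)
        print(json.dumps({"path": path, "contour": contour}), flush=True)
    except Exception as exc:  # report the error but keep the loop alive
        print(f"error on {path}: {exc}", file=sys.stderr)
```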

lingjzhu commented 3 years ago

@stutrek Sorry for the late reply! With multiple deadlines pending, I may be slow to respond for a while. Apart from all other factors, neural networks are slow to run unless we use a GPU to parallelize the computation. Based on our tests, the dense U-net runs at about 2-4 frames per second on an i5 CPU, while the U-net runs a bit faster, at about 6 frames per second. So I think 10 frames per second might not be achievable on a CPU. With a GPU, the speed can increase to 30-50 frames per second.

Looking at the code, I think it looks OK. If we can avoid loading the model for every image, the speed should improve. I can test the code when I have more time (this weekend or next). I am not a professional programmer, so admittedly my code might contain some problems.

For training, neural networks are pretty data-hungry. If we had more than 5,000 images with labeled tongue contours, a reasonably good model could be trained. Yes, we do use random cropping and some brightness adjustments to boost the effective number of training samples. They help, but not a lot.
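For example, the augmentation amounts to something roughly like this (a sketch only, not the exact code in the repo):

```python
# A rough sketch of this kind of augmentation (not the exact code in the repo):
# crop the image and its contour mask identically, then jitter brightness on the image.
import tensorflow as tf


def augment(image, mask, crop_size=(112, 112)):
    """image and mask are HxWx1 float tensors; returns an augmented pair."""
    stacked = tf.concat([image, mask], axis=-1)                # crop both together
    stacked = tf.image.random_crop(stacked, size=(*crop_size, 2))
    image, mask = stacked[..., :1], stacked[..., 1:]
    image = tf.image.random_brightness(image, max_delta=0.2)   # brightness on the image only
    return image, mask
```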

stutrek commented 3 years ago

At 30-50 frames per second the network will be a bottleneck!

Do you know if the model can work with tensorflow.js? With that, GPU acceleration comes for free on almost any system without any installation. I looked at it briefly when I first started playing with your code, but once I started reading about converting the model I realized how much I don't know. For me, tensorflow.js would be ideal because I can build an app easily, whereas getting Python set up and converting the model is hard.

In terms of cropping, if it's in an app, there could be a step that lets the user crop and adjust the image/video so that the data going to the model is clean.

If there were a function that took a single cropped image and returned a two-dimensional array of data (these lines?), it would be fairly easy to integrate with. My thought is to make a file that imports that function and calls it with a new image. It seems logical to have a Python server instead of a Node one, but I was just working with what I'm familiar with.
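Concretely, I'm imagining a function with roughly this shape (placeholder names, just to show the surface I'd integrate with, not an implementation):

```python
# Placeholder names, just to show the surface I'd integrate with.
from typing import List

import numpy as np


def trace_frame(cropped_image: np.ndarray) -> List[List[float]]:
    """Take one cropped grayscale frame and return the tongue contour
    as a list of [x, y] points, ready to serialize to JSON."""
    ...
```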

lingjzhu commented 3 years ago

@stutrek Thanks for the great idea! I took a look at tensorflow.js and think it might be possible to migrate the model to it (I wish I knew JS better!). Yes, the lines you highlighted can return a line in x-y coordinates. I wish I could be more helpful, and, if you don't mind, I think it might be more efficient to talk about the idea directly rather than going back and forth on GitHub over several weeks. I have a deadline pending on the 23rd; I will definitely get back to you immediately after that (with my apologies).

stutrek commented 3 years ago

That sounds good. I'm in no hurry; this isn't a project with deadlines for me, it's just a neat thing to make.

I know you can convert models; I got as far as looking at this page and reading some Stack Overflow answers. At the time I assumed your model relied on Python code, so I stopped looking.

There are some new browser APIs for file system access that I've wanted to play with, so I have plenty else to do.

lingjzhu commented 3 years ago

Actually, the trained model itself does not require Python code. It is in the .hdf5 format, which is the standard format for Keras models. I believe it is easy to convert it to be compatible with tf.js using the following method. I'll take a look and see whether it is feasible.
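If I understand the tensorflowjs package correctly, the conversion should be something like this (an untested sketch; the actual model path is a placeholder and the model may need extra handling):

```python
# An untested sketch of the conversion using the tensorflowjs Python package.
# Equivalent CLI: tensorflowjs_converter --input_format keras model.hdf5 tfjs_model/
import tensorflowjs as tfjs
from keras.models import load_model

# compile=False avoids needing the custom loss/metrics used during training
model = load_model("model.hdf5", compile=False)  # placeholder path
tfjs.converters.save_keras_model(model, "tfjs_model")
```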

stutrek commented 3 years ago

Hey, I want to share some progress I made on the front end. There are some new browser APIs that I'm happy I got to play with. The back end is still the slow Docker version.

The app now lets you load in a folder of images and traces each one. It saves the file handles and all the analysis data, so the next time you open the app it still has all your data. You do need to grant it permission to see the files, but that's a single button press.

[Screenshot of the app; in this image, video-frames is a folder on my computer.]

It has the size of the image hard-coded, and there's no way to make a custom crop; it uses the crop that you built into the Python code. The add-video button doesn't work: the video wouldn't load, I think because of a bug in Chrome. The Native File System API is very new, so that's expected. I was imagining something that would let you select specific frames to trace, but I'm not sure what the best UI is.

I do want to talk about this; I have some time over the holidays since we aren't traveling this year. Since I'm only doing this for fun, and realistically it isn't going to stay fun forever, I want to get it to a place where it can be shared and iterated on by you or other people in the community.

lingjzhu commented 3 years ago

@stutrek This looks amazing! I'm sorry, I should have followed up earlier; I fell ill this week and took some time off. I understand this is a lot of work, and I appreciate it. I should contribute my part to it. I don't know Java or JS, but I think I can debug the model. Let me play with your code first.

If you think talking is a good idea, would you mind if I reached out by email? Perhaps we can set up a time for it? Thanks!

stutrek commented 3 years ago

Yes, email me. My email is on my GitHub profile.

All the code I wrote is TypeScript, which is JavaScript with types. The app only works in Chrome because it uses a new browser API for loading files.

Here are some of the tools I used:

Right now I don't think it's worth looking at how it sends data to the server over the socket. The socket and the queue are entangled; that part will have to be rewritten for manual cropping and changed again for tensorflow.js.