Open kukuruza opened 9 years ago
Yeah, I agree with the idea that we first use Faster R-CNN to detect cars on our dataset, and then let humans correct the detections. I have talked with several people working on similar tasks. They suggest using Amazon Mechanical Turk or finding a professional company to label for us.
My friend introduced me to a company in Hong Kong that labels data for researchers. The payment is 50 rmb per hour
how much is that?
Anyway, an automatic {detection -> human pruning -> training -> detection} loop is scalable and publishable. It's in existing papers (e.g. http://cvrc.ece.utexas.edu/Publications/tamersoy_avss2009.pdf), and the industrial "miovision" uses this approach.
But I think manual labelling is important for difficult conditions, such as dense traffic.
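For reference, the loop above could be sketched roughly like this. This is only an illustration: `detect`, `human_prune`, and `train` are hypothetical callables standing in for the detector, the manual correction step, and retraining; none of them come from py-faster-rcnn, and the box format is an assumption.

```python
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # assumed (x1, y1, x2, y2) format


def labelling_loop(
    frames: Dict[str, Any],
    detect: Callable[[Any, Any], List[Box]],          # hypothetical: model, frame -> boxes
    human_prune: Callable[[Any, List[Box]], List[Box]],  # hypothetical: a person removes bad boxes
    train: Callable[[Any, Dict[str, Any], Dict[str, List[Box]]], Any],  # hypothetical retraining
    model: Any,
    rounds: int = 3,
) -> Tuple[Any, Dict[str, List[Box]]]:
    """Repeat detection -> human pruning -> training for a fixed number of rounds."""
    labelled: Dict[str, List[Box]] = {}
    for _ in range(rounds):
        for frame_id, frame in frames.items():
            proposals = detect(model, frame)                    # automatic detection
            labelled[frame_id] = human_prune(frame, proposals)  # human keeps only good boxes
        model = train(model, frames, labelled)                  # retrain on the pruned labels
    return model, labelled
```

The number of rounds is fixed here only to keep the sketch short. In practice we would stop when the human has little left to prune.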
50 rmb per hour
what is rmb?
Chinese yuan. 1 dollar = 6.2 rmb
Yeah, you are right. This loop {detection -> human pruning -> training -> detection} is feasible for our project. We can check all the cameras, split them into several groups, and train a general model for each group. In this way, we do not need to train 500 models (one model per camera).
in this way, we do not need to train 500 models
Oh, that's totally right. We're doing that even for Viola-Jones. It may be better to split models by e.g. time of day, but for now, just 1 model for normal conditions.
Per hour of what? An hour of one person's work? We spent 8 hours labelling 100 frames. At 50 rmb per hour that would be about $65. Too much.
Just per hour, no matter how many people work on it. For example, say we need to label 5000 images. They will give an estimate of how many hours it takes, and then we pay them hours * 50 rmb. They said it takes 5 s to draw a bounding box around a car.
Then 50/6.2 dollars/hour / 3600 sec/hour * 5 sec/bbox * 10 bboxes/frame ~ 11 cents/frame. That's better, and similar to Mechanical Turk.
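The same estimate as a small script, in case the numbers change. The defaults are the figures quoted in this thread (50 rmb/hour, 6.2 rmb per dollar, 5 s per box); 10 boxes per frame is the rough guess used above, and the function name is just for illustration.

```python
def cost_per_frame_usd(
    rate_rmb_per_hour: float = 50.0,  # quoted hourly rate
    rmb_per_usd: float = 6.2,         # exchange rate used in this thread
    seconds_per_box: float = 5.0,     # company's estimate per bounding box
    boxes_per_frame: float = 10.0,    # rough guess, not measured
) -> float:
    """Estimated labelling cost of one frame in US dollars."""
    usd_per_second = rate_rmb_per_hour / rmb_per_usd / 3600.0
    return usd_per_second * seconds_per_box * boxes_per_frame


if __name__ == "__main__":
    per_frame = cost_per_frame_usd()
    print(f"per frame: ${per_frame:.3f}")
    print(f"5000 frames: ${5000 * per_frame:.0f}")
```

With the defaults this comes to about 11 cents per frame, so roughly $560 for 5000 images.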
Alternative 2) from the first comment was chosen. Again, we only need to label reliable cars in every image, and have a mask to hide other unlabelled foreground. I implemented the use of masks inside the network in https://github.com/kukuruza/py-faster-rcnn
I'm going to move the conversation from gtalk to here.
@Lotuslisa wrote:
I have two ideas to do that automatically:
After either of these methods, a human should look at the saved results and remove the bad ones.