Reduce query time for NYC Citi Bike forecast

groovenauts / QueryItSmart

QueryIt Smart is the demonstration application for BigQuery & Cloud Machine Learning.

MIT License


Reduce query time for NYC Citi Bike forecast #49

Closed: nagachika closed this issue 7 years ago

nagachika commented 7 years ago

The demand-forecast query currently takes about 50-70 sec. I'll try to compress the MLP model using the "distilling" (knowledge distillation) technique:

https://arxiv.org/abs/1503.02531
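For reference, the core of the linked paper is training a small "student" network on the temperature-softened outputs of the large "teacher" network. The sketch below shows only that soft-target loss in plain Python. It assumes a classifier with logits; the thread does not say whether the forecast MLP is a classifier or a regressor (for a regressor, the analogous step is fitting the small model to the large model's predictions). The weighting `alpha` and the default temperature are illustrative choices, not values from this project.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax with temperature T; a higher T gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      hard_label: int,
                      temperature: float = 4.0,
                      alpha: float = 0.9) -> float:
    """Weighted sum of soft-target and hard-target cross-entropy.

    The soft term is multiplied by T**2 because its gradients scale as 1/T**2;
    this keeps the two terms comparable when T changes.
    alpha and the default temperature are illustrative assumptions.
    """
    soft_targets = softmax(teacher_logits, temperature)
    soft_preds = softmax(student_logits, temperature)
    soft_ce = -sum(p * math.log(q) for p, q in zip(soft_targets, soft_preds))
    hard_preds = softmax(student_logits, 1.0)
    hard_ce = -math.log(hard_preds[hard_label])
    return alpha * (temperature ** 2) * soft_ce + (1 - alpha) * hard_ce
```

With `alpha=1.0` the loss is smallest when the student reproduces the teacher's logits exactly, which is the behavior distillation relies on.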

kazunori279 commented 7 years ago

Also, Nakahara-san mentioned at TFUG that an MLP could be compressed to 1/10th of its size by using pruning, quantization, and Huffman coding. FYI.

See p. 42 of these slides: https://www.slideshare.net/HirokiNakahara1/cnn-on-fpgagpu?ref=https://tfug-tokyo.connpass.com/event/49668/presentation/

The paper: https://arxiv.org/abs/1510.00149
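A minimal sketch of the three stages named above (magnitude pruning, weight sharing via k-means, Huffman coding), applied to a flat list of weights. It is a simplification of the paper: pruned weights are encoded as a reserved index 0 and compressed by Huffman coding instead of the paper's sparse-index format, the 1-D k-means uses linearly spaced initial centroids, and there is no retraining step. The threshold and `k` values are assumptions for illustration.

```python
import heapq
from collections import Counter
from typing import Dict, List, Tuple


def prune(weights: List[float], threshold: float) -> List[float]:
    """Magnitude pruning: zero out weights whose absolute value is below threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def kmeans_1d(values: List[float], k: int, iters: int = 20) -> List[float]:
    """Plain 1-D k-means with linearly spaced initial centroids."""
    lo, hi = min(values), max(values)
    if k == 1 or lo == hi:
        return [sum(values) / len(values)]
    centroids = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    for _ in range(iters):
        buckets: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            buckets[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(b) / len(b) if b else centroids[i] for i, b in enumerate(buckets)]
    return centroids


def quantize(weights: List[float], k: int) -> Tuple[List[int], List[float]]:
    """Map each weight to a codebook index; index 0 is reserved for pruned weights."""
    nonzero = [w for w in weights if w != 0.0]
    if not nonzero:
        return [0] * len(weights), []
    centroids = kmeans_1d(nonzero, k)
    indices = []
    for w in weights:
        if w == 0.0:
            indices.append(0)
        else:
            nearest = min(range(len(centroids)), key=lambda c: abs(w - centroids[c]))
            indices.append(1 + nearest)
    return indices, centroids


def huffman_code(symbols: List[int]) -> Dict[int, str]:
    """Build a prefix-free Huffman code; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: partial code}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


def compressed_bits(weights: List[float], threshold: float, k: int, float_bits: int = 32) -> int:
    """Bits for Huffman-coded indices plus a float codebook of centroids."""
    indices, centroids = quantize(prune(weights, threshold), k)
    code = huffman_code(indices)
    return sum(len(code[i]) for i in indices) + len(centroids) * float_bits
```

Comparing `compressed_bits(weights, threshold, k)` with `32 * len(weights)` gives a rough compression ratio for a layer; the ratio actually reached depends on how many weights can be pruned without hurting accuracy.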

But I wonder whether the query time is really proportional to the size of the model. It may also be affected by the data size (the bigger, the faster, as we saw with the document search) and computation time.

nagachika commented 7 years ago

> It may also be affected by the data size (the bigger, the faster, as we saw with the document search) and computation time.

You are right. I created a table with a 1 MB dummy STRING column and included that column in the query; it finished in about 12 sec. The trade-off is that bytes processed and bytes billed went up.
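A rough sketch of the stub-column approach and its cost, assuming BigQuery Standard SQL (the 2017 code may well have used legacy SQL and a different method). The table names and the `stub` column name are hypothetical, and "1 MB" is taken as 1024 * 1024 characters only for illustration. The byte estimate assumes BigQuery's documented STRING size of 2 bytes plus the UTF-8 encoded length (1 byte per ASCII character).

```python
def create_padded_table_sql(source: str, dest: str, pad_chars: int) -> str:
    """Hypothetical DDL: copy `source` and add an ASCII padding column named `stub`.

    REPEAT is a BigQuery Standard SQL string function; table names are placeholders.
    """
    if pad_chars <= 0:
        raise ValueError("pad_chars must be positive")
    return (
        f"CREATE TABLE `{dest}` AS\n"
        f"SELECT *, REPEAT('x', {pad_chars}) AS stub\n"
        f"FROM `{source}`"
    )


def padding_bytes(num_rows: int, pad_chars: int) -> int:
    """Extra bytes processed per query once the query references `stub`.

    Assumes STRING size = 2 bytes + UTF-8 length, with 1-byte ASCII padding.
    """
    if num_rows < 0 or pad_chars < 0:
        raise ValueError("arguments must be non-negative")
    return num_rows * (2 + pad_chars)


print(create_padded_table_sql("my_project.my_dataset.trips",
                              "my_project.my_dataset.trips_padded",
                              1024 * 1024))
print(padding_bytes(num_rows=1, pad_chars=1024 * 1024))
```

Because BigQuery scans only the columns a query references, the forecast query has to mention `stub` (for example, `LENGTH(stub)` in its SELECT list) for the padding to change anything; that same scan is what raises bytes processed and bytes billed.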

kazunori279 commented 7 years ago

Wow, that's a surprising but awesome result. Now all the demos run within 20 sec :)

nagachika commented 7 years ago

I applied the stub-column technique in https://github.com/groovenauts/QueryItSmart/commit/322129716aa2a48433c56e5bcb2c047d550702fd.

The more sophisticated model-compression techniques are also interesting. Thank you for the information! But for these demos, the brute-force approach is simple and effective...