sdikby opened 7 years ago
@sdikby have you tried HIPI's hibImport.sh with millions of images successfully?
@yangboz sorry for the delay. No, I haven't even started using HIPI. My use case is processing millions of images in Hadoop, but I don't think MapReduce is performant enough, and I'm not sure it's even possible with HIPI, since it hasn't been maintained for about a year now (the last commit was on 12 April).
@sdikby thanks for your reply. I totally agree with your comments about the lack of updates to the HIPI source code; I also found issue #30, with no response. By the way, apart from HIPI, are there any other Hadoop sequence file solutions for millions of image files?
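For context on why sequence files come up at all here: the core idea is to pack many small image files into a few large key/value containers so HDFS tracks thousands of big files instead of millions of tiny ones. The sketch below is *not* the real Hadoop SequenceFile binary format (the real one, written via `org.apache.hadoop.io.SequenceFile.Writer`, adds headers, sync markers, and optional compression); `pack`/`unpack` are hypothetical names illustrating only the key/value packing concept in plain Python.

```python
# Illustrative sketch only -- NOT the real Hadoop SequenceFile format.
# Shows the concept: many small images packed into one large blob of
# (key, value) records, so a distributed FS sees few big files.
import io
import struct

def pack(files):
    """files: dict of {name: image_bytes}. Returns one packed blob."""
    buf = io.BytesIO()
    for name, data in files.items():
        key = name.encode("utf-8")
        # record layout: 4-byte key length, 4-byte value length, key, value
        buf.write(struct.pack(">II", len(key), len(data)))
        buf.write(key)
        buf.write(data)
    return buf.getvalue()

def unpack(blob):
    """Yield (name, image_bytes) records back out of a packed blob."""
    buf = io.BytesIO(blob)
    while True:
        header = buf.read(8)
        if not header:
            return
        klen, vlen = struct.unpack(">II", header)
        name = buf.read(klen).decode("utf-8")
        yield name, buf.read(vlen)

images = {"a.jpg": b"\xff\xd8 fake jpeg 1", "b.jpg": b"\xff\xd8 fake jpeg 2"}
blob = pack(images)
assert dict(unpack(blob)) == images
```

HIPI's HIB files, Hadoop SequenceFiles, and HAR archives all apply some variant of this packing idea, differing mainly in indexing and splittability for MapReduce.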
@yangboz I know two other tools for image processing, but I haven't tried them yet (I just began my master's thesis :) ). There is MIPr: https://github.com/sozykin/mipr and this one: https://github.com/okstate-robotics/hipl. Both are also based on MapReduce; beyond that I don't know the differences between them. Feel free to test them, and I would be happy to get feedback from you.
@sdikby thanks for your suggestions, I will try them. My idea comes from: http://dinesh-malav.blogspot.com/2015/05/image-processing-using-opencv-on-hadoop.html . It is a great tutorial on CDH (MR1) + HIPI v1 + ant, but nowadays HIPI uses gradlew and is at v2+, which is why I am struggling with the code base modifications.
@yangboz it would also be great to know how the three tools/frameworks store images on HDFS (to deal with the block size problem, for example) and the big differences between them (read/write performance from/into HDFS).
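A note on why the block size problem matters here: with millions of small images, the pain is less about wasted disk and more about NameNode metadata, since every file and every block is an object held in NameNode heap (a commonly cited rule of thumb is roughly 150 bytes per object; the exact figure varies by Hadoop version). A back-of-the-envelope sketch, with that hedged assumption, shows why all three frameworks pack images into large container files:

```python
# Back-of-the-envelope: NameNode heap cost of many small files vs. packed
# containers. The ~150 bytes/object figure is a rough rule of thumb, not
# an exact Hadoop constant.
BYTES_PER_OBJECT = 150  # assumed per-file and per-block metadata cost

def namenode_bytes(num_files, blocks_per_file=1):
    # one metadata object per file plus one per block
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

# 10 million individual ~128 KB images, each under one 128 MB block:
small_files = namenode_bytes(10_000_000)
# the same data packed into ~10,000 one-block container files:
packed = namenode_bytes(10_000)

print(small_files // 2**20, "MiB of NameNode heap vs", packed // 2**20, "MiB")
```

Under these assumptions the unpacked layout costs on the order of gigabytes of NameNode heap, while the packed layout costs a few megabytes, independent of how each framework then indexes images inside its containers.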
@sdikby before those three tools/frameworks, I studied existing solutions based on Ceph and even Cassandra image blob storage. Conclusions coming soon.
@sdikby comparing MIPr: https://github.com/sozykin/mipr (full documentation and working code examples) with this one: https://github.com/okstate-robotics/hipl (documentation is missing!)
@yangboz oh, good job! And what about performance? Did you compare the two in terms of image writes/reads per second? And how do they both store images on HDFS, especially how do they deal with the block size problem?
@sdikby there is a paper (please drop me a line if you need it) comparing Hadoop and Spark performance, including indexing and retrieval. According to its results, integrating Hadoop and Spark to process 160k pictures on a 30-node cluster improves efficiency.
@yangboz could you please send me this paper? I plan to do a performance test of the three cited frameworks in the coming months.
Dear HIPI developers,
do you plan on integrating Apache Spark instead of the old MapReduce? If so, when? Otherwise, could you give me some hints on how to do it? My use case is that I need to classify millions of images, and with MapReduce it will not be as efficient as I need it to be. @sweeneychris @liuliu @voigtlandier @zverham @hafnium
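For anyone attempting the Spark route themselves: the usual pattern is roughly `sc.binaryFiles("hdfs:///images").mapPartitions(classify_partition)`, so that expensive setup (loading the classifier model) happens once per partition rather than once per image. Since that needs a cluster, here is a pure-Python stand-in for the `mapPartitions` idea; `classify_partition`, `load_model`, and the stub model are hypothetical names for illustration, not HIPI or Spark API.

```python
# Pure-Python stand-in for Spark's mapPartitions pattern (no cluster needed).
# The point: load_model() runs once per partition, not once per image.
# In PySpark this would look roughly like:
#   sc.binaryFiles("hdfs:///images").mapPartitions(classify_partition)

def load_model():
    # stand-in for an expensive model load (e.g. an OpenCV or DNN model)
    return lambda image_bytes: "cat" if len(image_bytes) % 2 == 0 else "dog"

def classify_partition(records):
    model = load_model()              # once per partition
    for path, image_bytes in records:
        yield path, model(image_bytes)

def partitioned(records, n):
    # naive splitter standing in for Spark's partitioning of the input
    chunk = max(1, len(records) // n)
    for i in range(0, len(records), chunk):
        yield records[i:i + chunk]

images = [("img%03d.jpg" % i, b"x" * i) for i in range(6)]
results = [r for part in partitioned(images, 2)
           for r in classify_partition(part)]
print(results)
```

The same structure carries over to a real cluster: only `partitioned` and the driver loop get replaced by Spark, while `classify_partition` stays essentially unchanged.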