intel-analytics / analytics-zoo

Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
https://analytics-zoo.readthedocs.io/
Apache License 2.0

Streaming example release test #1084

Closed qiuxin2012 closed 5 years ago

qiuxin2012 commented 5 years ago
  1. ${SPARK_HOME}/bin/spark-submit-with-zoo.sh \
    --master ${MASTER} \
    --driver-memory 2g \
    --executor-memory 5g \
    --class com.intel.analytics.zoo.examples.streaming.objectdetection.ImagePathWriter \
    --streamingPath ${streamingPath} --imageSourcePath ${imageSourcePath}

    This command doesn't work.

  2. StreamingObjectDetection sometimes fails to detect new files in the local streamingPath. In my test, 0.txt was detected, but 1.txt was skipped.

  3. Too many warnings.

    2019-05-21 14:14:02 WARN  ImageSetToSample$:66 - The ImageFeature doesn't contain targetKey label, ignoring it
    2019-05-21 14:14:02 INFO  StreamingObjectDetection$:99 - Read image file file:/home/xin/datasets/coco/val2017-small/000000438907.jpg
    2019-05-21 14:14:02 WARN  ImageSetToSample$:66 - The ImageFeature doesn't contain targetKey label, ignoring it
    2019-05-21 14:14:04 INFO  StreamingObjectDetection$:99 - Read image file file:/home/xin/datasets/coco/val2017-small/000000147518.jpg
    2019-05-21 14:14:04 WARN  ImageSetToSample$:66 - The ImageFeature doesn't contain targetKey label, ignoring it
    2019-05-21 14:14:04 INFO  StreamingObjectDetection$:99 - Read image file file:/home/xin/datasets/coco/val2017-small/000000438226.jpg
    2019-05-21 14:14:04 WARN  ImageSetToSample$:66 - The ImageFeature doesn't contain targetKey label, ignoring it

    When ImageSetToSample is created, targetKeys is set by default. In a prediction pipeline, targetKeys should be set to an empty array.

    def apply[T: ClassTag](inputKeys: Array[String] = Array(ImageFeature.imageTensor),
            targetKeys: Array[String] = Array(ImageFeature.label),
            sampleKey: String = ImageFeature.sample)
            (implicit ev: TensorNumeric[T]): ImageSetToSample[T] =
    new ImageSetToSample(inputKeys, targetKeys, sampleKey)

  4. Typo in the class path in https://github.com/intel-analytics/analytics-zoo/blob/master/zoo/src/main/scala/com/intel/analytics/zoo/examples/streaming/objectdetection/README.md#better-performance-with-inference-model

    To enable this feature, simply replace --class com.intel.analytics.zoo.examples.streaming.StreamingObjectDetection.TextClassification \ with --class com.intel.analytics.zoo.examples.streaming.textclassification.StreamingInferenceObjectDetection \ in Step 1.
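
For item 3, a minimal sketch of the suggested change, assuming the apply signature quoted in that item and that passing an empty targetKeys array skips the label lookup that produces the warning. The object name PredictionToSample is hypothetical, and the snippet needs Analytics Zoo and BigDL on the classpath; it has not been run against the example.

```scala
import com.intel.analytics.bigdl.numeric.NumericFloat // implicit TensorNumeric[Float]
import com.intel.analytics.zoo.feature.image.ImageSetToSample

// Hypothetical holder object, used only to keep the sketch compilable.
object PredictionToSample {
  // Empty targetKeys: no label key is looked up on each ImageFeature.
  // inputKeys and sampleKey keep the defaults from the quoted signature.
  val toSample: ImageSetToSample[Float] =
    ImageSetToSample[Float](targetKeys = Array.empty[String])
}
```

The prediction pipeline would then use this transformer in place of the one built with default arguments.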

qiuxin2012 commented 5 years ago

In https://github.com/intel-analytics/analytics-zoo/tree/master/zoo/src/main/scala/com/intel/analytics/zoo/examples/streaming/textclassification

Datasets and pre-trained models

    Pre-trained model & word index: Save trained text classification model and word index in Text Classification.

There are no pre-trained models in Text Classification, so I had to train a new model myself.

qiuxin2012 commented 5 years ago

In https://github.com/intel-analytics/analytics-zoo/tree/master/zoo/src/main/scala/com/intel/analytics/zoo/examples/streaming/textclassification

    TERMINAL 2: Start StreamingTextClassification

MASTER=...
embeddingPath=... // glove path. Local file system/HDFS/Amazon S3 are supported
model=... // model path. Local file system/HDFS/Amazon S3 are supported
indexPath=... // word index path. Local file system/HDFS/Amazon S3 are supported
${ANALYTICS_ZOO_HOME}/bin/spark-shell-with-zoo.sh \
    --master ${MASTER} \
    --driver-memory 2g \
    --executor-memory 5g \
    --class com.intel.analytics.zoo.examples.streaming.textclassification.TextClassification \
    --model ${model} --indexPath ${indexPath}
  1. com.intel.analytics.zoo.examples.streaming.textclassification.TextClassification is not found; it should be com.intel.analytics.zoo.examples.streaming.textclassification.StreamingTextClassification.
  2. The line embeddingPath=... // glove path. Local file system/HDFS/Amazon S3 are supported is never used in the command.
  3. --port should be added to the command.

Better Performance with Inference Model

Inference Model is a thread-safe package in Analytics Zoo aiming to provide high level APIs to speed-up development.

To enable this feature, simply replace --class com.intel.analytics.zoo.examples.streaming.textclassification.TextClassification \ with --class com.intel.analytics.zoo.examples.streaming.textclassification.StreamingInferenceTextClassification \ in Step 2.

com.intel.analytics.zoo.examples.streaming.textclassification.TextClassification should be com.intel.analytics.zoo.examples.streaming.textclassification.StreamingTextClassification.

qiuxin2012 commented 5 years ago

Result from StreamingTextClassification:

0.033276346
0.10306183
0.06050616
0.05932404
0.05837263
0.05016891
0.09914416
0.06442031
0.035995074
0.036540367
0.029530527
0.04040195
0.059150815
0.03524854
0.051602174
0.026166044
0.036532912
0.039056625
0.038727943
0.04277275
[com.intel.analytics.bigdl.tensor.DenseTensor of size 20]
0.04813395
0.051264394
0.054150704
0.051601455
0.050325967
0.05357264
0.05045379
0.050900944
0.050478492
0.05023378
0.046649188
0.04874656
0.052243967
0.045277126
0.050665606
0.046622712
0.048631713
0.051473033
0.049481325
0.049092576
[com.intel.analytics.bigdl.tensor.DenseTensor of size 20]
0.033276346
0.10306183
0.06050616
0.05932404
0.05837263
0.05016891
0.09914416
0.06442031
0.035995074
0.036540367
0.029530527
0.04040195
0.059150815
0.03524854
0.051602174
0.026166044
0.036532912
0.039056625
0.038727943
0.04277275
[com.intel.analytics.bigdl.tensor.DenseTensor of size 20]

I'm very confused. What do these values mean?
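
One possible reading, not confirmed anywhere in this thread: each printed tensor holds 20 non-negative values that sum to roughly 1, which is consistent with one probability per class for a 20-class text classifier. Under that assumption, the predicted class is the position of the largest value. The sketch below uses only the Scala standard library; the object and method names are hypothetical, and the numbers are the first tensor copied from the output above.

```scala
object InterpretScores {
  // First tensor printed by StreamingTextClassification, copied verbatim.
  val scores: Array[Float] = Array(
    0.033276346f, 0.10306183f, 0.06050616f, 0.05932404f, 0.05837263f,
    0.05016891f, 0.09914416f, 0.06442031f, 0.035995074f, 0.036540367f,
    0.029530527f, 0.04040195f, 0.059150815f, 0.03524854f, 0.051602174f,
    0.026166044f, 0.036532912f, 0.039056625f, 0.038727943f, 0.04277275f
  )

  // Zero-based index of the largest score; the array must be non-empty.
  def argmax(xs: Array[Float]): Int =
    xs.indices.reduceLeft((a, b) => if (xs(b) > xs(a)) b else a)

  def main(args: Array[String]): Unit = {
    // Sum in Double to limit rounding error; a value near 1 supports
    // the "class probabilities" reading, it does not prove it.
    val total = scores.map(_.toDouble).sum
    val best = argmax(scores)
    println(s"sum of scores = $total")
    println(s"largest score = ${scores(best)} at zero-based index $best")
  }
}
```

Whether that index maps to a zero-based or one-based class label depends on how the model and its labels were set up, which the thread does not say.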

qiuxin2012 commented 5 years ago

Result from StreamingSampleTextClassification:

2019-05-21 15:53:06 INFO  BlockManager:54 - Found block input-0-1558425184600 locally
2019-05-21 15:53:06 INFO  InferenceSupportive$:45 - model predict for activity time elapsed [0 s, 5 ms].
2019-05-21 15:53:06 INFO  InferenceSupportive$:45 - model predict for activity time elapsed [0 s, 5 ms].
0.033276346
0.10306183
0.06050616
0.05932404
0.05837263
0.05016891
0.09914416
0.06442031
0.035995074
0.036540367
0.029530527
0.04040195
0.059150815
0.03524854
0.051602174
0.026166044
0.036532912
0.039056625
0.038727943
0.04277275
0.041413687
0.04843641
0.060645767
0.048481476
0.062567264
0.08714326
0.0685151
0.04891999
0.05269838
0.02815096
0.03820772
0.033223692
0.048962936
0.058189925
0.073198654
0.022876032
0.025489531
0.057833593
0.05284152
0.042204056
2019-05-21 15:53:06 INFO  Executor:54 - 1 block locks were not released by TID = 9:
[input-0-1558425183400]
2019-05-21 15:53:06 INFO  Executor:54 - 1 block locks were not released by TID = 10:
[input-0-1558425184600]

or

2019-05-21 15:53:09 INFO  BlockManager:54 - Found block input-0-1558425186000 locally
2019-05-21 15:53:09 INFO  InferenceSupportive$:45 - model predict for activity time elapsed [0 s, 4 ms].
0.026483743
0.056345176
0.057438545
0.041527513
0.04797267
0.055835813
0.082895175
0.07543767
0.0665393
0.028521677
0.029409204
0.034537796
0.0659566
0.05149896
0.07210249
0.016194232
0.04589744
0.051051278
0.05428142
0.040073294
2019-05-21 15:53:09 INFO  Executor:54 - 1 block locks were not released by TID = 11:
[input-0-1558425186000]

qiyuangong commented 5 years ago

Hi @dding3, the warning message comes from https://github.com/intel-analytics/analytics-zoo/blob/master/zoo/src/main/scala/com/intel/analytics/zoo/feature/image/ImageSetToSample.scala#L66

Can we remove this warning?

dding3 commented 5 years ago

This warning tells users that their data doesn't contain a label, which is important during training. Since it's only a warning message, I think it can be ignored?

qiyuangong commented 5 years ago

Fixed. Thanks, @qiuxin2012 and @dding3.