This pull request has one change in streamDMJob.scala:
Change:
batchInterval is now set via a command-line parameter (an argument of the Spark program).
The list of arguments is processed as follows:
++ Check whether the first argument (args(0)) is an integer. If it is, and it is > 0 and < Int.MaxValue, use it as the batchInterval, then drop the first element of args.
++ Pass the new argument list (args without the first element) to the Task for processing.
batchInterval is now specified in milliseconds (previously it was in seconds).
The default value of batchInterval is 1000 ms.
Reason: the user can set batchInterval from the arguments. Because batchInterval often needs tuning for a Spark Streaming program to perform well, it should not be hardcoded.
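The parsing logic described above can be sketched as follows. This is a minimal, illustrative sketch, not the actual code in streamDMJob.scala; the object and method names (BatchIntervalParser, parseBatchInterval) and the exact fallback behavior for non-positive integers are assumptions based on the test cases below.

```scala
// Sketch of the argument handling: try to read args(0) as a positive
// integer batch interval (in ms); otherwise fall back to the default.
object BatchIntervalParser {
  val defaultInterval = 1000 // default batchInterval in ms

  // Returns (batchInterval in ms, remaining arguments to pass to the Task)
  def parseBatchInterval(args: Array[String]): (Int, Array[String]) =
    args.headOption.flatMap(a => scala.util.Try(a.toInt).toOption) match {
      // Valid positive interval: use it and drop args(0).
      case Some(n) if n > 0 => (n, args.drop(1))
      // An integer but not positive (e.g. -300): drop it, use the default.
      case Some(_)          => (defaultInterval, args.drop(1))
      // Not an integer (or absent, or overflows Int): keep args, use the default.
      case None             => (defaultInterval, args)
    }
}
```

Note that values above Int.MaxValue fail the toInt parse and therefore also fall back to the default.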
Test: to verify this, you can run the following tests:
./spark.sh "200 EvaluatePrequential -l (trees.HoeffdingTree -l 0 -t 0.05 -g 200 -o) -s (FileReader -f ../data/electNormNew.arff -k 4000 -d 10)" 1> result.res 2> log.log
This sets batchInterval to 200 ms.
./spark.sh "-300 EvaluatePrequential -l (trees.HoeffdingTree -l 0 -t 0.05 -g 200 -o) -s (FileReader -f ../data/electNormNew.arff -k 4000 -d 10)" 1> result.res 2> log.log
This falls back to the default batchInterval of 1000 ms, since -300 is not a positive integer.
./spark.sh "EvaluatePrequential -l (trees.HoeffdingTree -l 0 -t 0.05 -g 200 -o) -s (FileReader -f ../data/electNormNew.arff -k 4000 -d 10)" 1> result.res 2> log.log
This also uses the default batchInterval of 1000 ms, since no interval argument is given.