microsoftarchive / iot-journey

a guidance project for implementing an IoT solution on Azure

Inconsistent results in cold storage event processors #247

Closed francischeung closed 9 years ago

francischeung commented 9 years ago

I'm seeing different data when comparing the output of the Long-term Storage Stream Analytics job with the output of our custom .NET event processor.

francischeung commented 9 years ago

Repro:

  1. Run ScenarioSimulator for a fixed amount of time and then quit.
  2. Run the “ToBlob” Stream Analytics job.
  3. Run the Cold Storage Event Processor.
  4. Use a Hive query to count the results: total and by device.

Result: The Hive query reports a different number of events for the blob files generated by the Stream Analytics job than for those generated by the Cold Storage Event Processor.

hveiras commented 9 years ago

I ran both processors and got the same result in both cases. Because of #314 I was not able to run the query using the PowerShell verification scripts, so I ran it using the HDInsight Query Console instead.

Here are the queries I used:

DROP TABLE iotjourneyhivetable2;

CREATE EXTERNAL TABLE IF NOT EXISTS iotjourneyhivetable2 (json string)
LOCATION "wasb://blobs-processor@[YourStorageAccountName].blob.core.windows.net/fabrikam";

SELECT get_json_object(iotjourneyhivetable2.json, '$.Payload.DeviceId'), count(*)
FROM iotjourneyhivetable2
GROUP BY get_json_object(iotjourneyhivetable2.json, '$.Payload.DeviceId');

And

DROP TABLE iotjourneyhivetable1;

CREATE EXTERNAL TABLE IF NOT EXISTS iotjourneyhivetable1 (json string)
LOCATION "wasb://blobs-asa@[YourStorageAccountName].blob.core.windows.net/";

SET fs.azure.io.read.tolerate.concurrent.append=true;

SELECT get_json_object(iotjourneyhivetable1.json, '$.DeviceId'), count(*)
FROM iotjourneyhivetable1
GROUP BY get_json_object(iotjourneyhivetable1.json, '$.DeviceId');

Note: I had to temporarily make the blobs-asa and blobs-processor containers public before running the queries through the Query Console.
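
If the Hive cluster is not available, the same per-device comparison can be approximated locally once the blob files have been downloaded. The following is only a minimal sketch, not part of the project's verification scripts. It assumes each blob file holds one JSON event per line (matching the single `json string` column above), that the device id sits at `$.DeviceId` in the Stream Analytics output and at `$.Payload.DeviceId` in the event processor output as in the two queries, and the helper names `count_by_device` and `diff_counts` are hypothetical.

```python
import json
from collections import Counter


def count_by_device(lines, path):
    """Count events per device.

    `lines` is an iterable of JSON strings, one event per line (assumption).
    `path` is the list of keys leading to the device id, for example
    ["DeviceId"] or ["Payload", "DeviceId"].
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        value = json.loads(line)
        for key in path:
            value = value[key]
        counts[value] += 1
    return counts


def diff_counts(a, b):
    """Return {device: (count_a, count_b)} for devices whose counts differ."""
    devices = sorted(set(a) | set(b))
    return {d: (a.get(d, 0), b.get(d, 0)) for d in devices if a.get(d, 0) != b.get(d, 0)}


if __name__ == "__main__":
    # Hypothetical local file names for the downloaded blobs.
    with open("asa-output.json") as f:
        asa = count_by_device(f, ["DeviceId"])
    with open("processor-output.json") as f:
        processor = count_by_device(f, ["Payload", "DeviceId"])

    print("total (ASA):", sum(asa.values()))
    print("total (processor):", sum(processor.values()))
    print("devices with different counts:", diff_counts(asa, processor))
```

An empty result from `diff_counts` would correspond to both processors producing the same per-device counts, which is what the two Hive queries are meant to confirm.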