solliancenet / MCW-Azure-Synapse-Analytics


Draft workshop reviews #1

Status: Open. DawnmarieDesJardins opened this issue 4 years ago

DawnmarieDesJardins commented 4 years ago

Please leave review comments for our authors by adding to the conversation below.

Pedro-Martinez commented 4 years ago

Things I would like the audience to understand as differentiators against our competitors:

  1. ExpressRoute.
  2. Native Azure Active Directory. Show how easy it is to add AAD to Synapse; competitors have to configure it separately, while we have it built in.
  3. Compliance/certifications. I understand this is not a healthcare business case, but we have certifications our competitors don't have, such as HITRUST.
  4. SQL audits, read logs, and our rich set of tools.

DataSnowman commented 4 years ago

I am not sure the ONNX scenario is very realistic. Open Neural Network Exchange is more for deep learning and for exchanging models, say, between TensorFlow running on GPUs and CoreML on an iPhone. I would use Azure Databricks or AML with GPU clusters to demonstrate deep learning; I don't see Synapse here.

How about a standard ML algorithm (regression or classification) instead, deploying the Python model to run in the ML Services in SQL DW? No exchange format is needed since it is all Python. Even better, add this scenario and use AML to connect to an ADLS datastore that holds files manipulated by Synapse: create an AML dataset, run automated ML on it, and deploy the model to AKS. Then call that model using AI Insights in a Power BI workflow to score some data. No customers I know of have flocked to ML models running in SQL Server of any SKU.
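
A minimal sketch of that suggested flow with the Azure Machine Learning SDK (v1). The workspace config, the datastore name (`adls_datastore`), the file path, the compute target, and the label column are placeholders, and the AKS deployment and Power BI scoring steps are left out:

```python
# Sketch only; assumes azureml-sdk v1, an attached ADLS Gen2 datastore named
# "adls_datastore", a compute cluster "cpu-cluster", and a "Churned" label column.
from azureml.core import Workspace, Dataset, Experiment
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()

# Point an AML tabular dataset at files Synapse wrote to the data lake.
datastore = ws.datastores["adls_datastore"]
training_data = Dataset.Tabular.from_delimited_files(
    path=(datastore, "gold/customers/*.csv")
)

# Run automated ML (classification here; regression works the same way).
automl_config = AutoMLConfig(
    task="classification",
    training_data=training_data,
    label_column_name="Churned",
    primary_metric="AUC_weighted",
    compute_target="cpu-cluster",
)

run = Experiment(ws, "synapse-automl-demo").submit(automl_config)
run.wait_for_completion(show_output=True)
best_run, fitted_model = run.get_output()  # best model, ready to register and deploy to AKS
```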

DataSnowman commented 4 years ago

For the part below, I expected folder structures that partition the data by date or on some other basis, not just the standard Delta Lake bronze, silver, gold.

> What storage service would you recommend they use and how would you recommend they structure the folders so they can manage the data at the various levels of refinement?

> They should use Azure Data Lake Storage (ADLS) Gen2 (Azure Storage with a hierarchical namespace). In ADLS, it is a best practice to have a dedicated storage account for production and a separate storage account for dev and test workloads, which ensures that dev or test workloads never interfere with production. One common folder structure is to organize the data in separate folders by degree of refinement: for example, a bronze folder contains the raw data, silver contains the cleaned, prepared, and integrated data, and gold contains data ready to support analytics, which might include final refinements such as pre-computed aggregates.
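
To make the folder guidance concrete, here is a sketch of one possible layout created with the `azure-storage-file-datalake` SDK, combining the bronze/silver/gold zones with the date partitioning suggested above. The account URL, container name, and source name are placeholders:

```python
# Sketch only: the account, container, and source names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    "https://contosodatalakeprod.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
filesystem = service.get_file_system_client("analytics")

# One directory tree per refinement zone, partitioned by source system and ingest date,
# e.g. bronze/sales/year=2020/month=01/day=15/.
for zone in ("bronze", "silver", "gold"):
    filesystem.create_directory(f"{zone}/sales/year=2020/month=01/day=15")
```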

DataSnowman commented 4 years ago

Is there going to be code provided in these workshops as well, i.e., the stuff that makes the diagrams in the markdown work? Perhaps this is the Hands-on Lab section.