uakbr / whispe2.0

Apache License 2.0

Transcribe audio #3

Closed uakbr closed 2 years ago

uakbr commented 2 years ago

URL

https://www.youtube.com/watch?v=Pe_cC8q7I_A

github-actions[bot] commented 2 years ago

Language: english

Transcription: Hello, everyone. Welcome to AWS Tutorials. In AWS Tutorials, we provide workshops and exercises to learn about AWS services, and these are published on our website, aws-dozo.com. Today, I'm going to show one demo: how to use Microsoft Power BI Desktop to create a report with an AWS data lake.

Let's first understand the building blocks required for this scenario. First, you need to have an AWS data lake configured. I have already set one up for this demo. If you are interested in how to create a data lake inside AWS, I have created a separate workshop for that, and I am providing its URL in the description box below. You can simply follow the instructions of that workshop to create a data lake of your own. Second, you need the Amazon Athena ODBC driver deployed on the machine where you want to create the report. Again, I am going to provide a URL for the page where you can download this driver. Third, you need Microsoft Power BI Desktop deployed on the machine where you want to create the report. Once you have these three things in place, you can go and do the same thing yourself, but I'm going to demonstrate it here. So let's move on to the demonstration.

First of all, this is the page where you can download the Amazon Athena ODBC driver. It is available for various operating systems, with documentation. I am providing a link to this page in the description box below, so you can download the driver for your operating system. I'm assuming the Athena ODBC driver is already deployed, and you can actually verify that. Search for ODBC, open the ODBC Data Source Administrator, go to System DSN, and click Add.
If you see the Simba Athena ODBC Driver listed here, that confirms the Athena ODBC driver is deployed on your machine. OK, let's keep it here for now. We'll use it later.

Let's go and see how we have created our data lake. Here is our data lake, where we have created a database called simple database. Yeah, it's simple. This database has two tables in the data lake: one is called sales, and the second is customer. These tables store data in CSV format in S3 locations. Each of these tables grants access permission to the user account I'm using for this demo. For instance, if I select the table and ask to see the permissions on it, you can see that my user account has Select permission on this table. Similarly, I have Select permission on the customer table as well. If this is not making any sense to you at all, then you probably have not worked with a data lake before. In that case, you can use my data lake workshop URL to create a data lake of your own. It's pretty straightforward; simply follow the workshop I have created.

So now the Athena ODBC driver is deployed on my machine, my data lake is ready, and I have permissions on the tables. Let's see how I can use this whole configuration to create a report inside Power BI Desktop. The first step is to go to the ODBC Data Source Administrator, where we want to create a system DSN. Click Add. You see the Simba Athena ODBC Driver here. Select that one, and it will ask for the data source name. Let's say it is Dozo data source. Keep it simple. OK. Region: this is Ireland, so I will say eu-west-1. These are the schema and workgroup for Athena. Now, Athena uses an S3 location for its query output, and I have created an S3 bucket that I can use for the Athena output.
I have already created this ODBC-script bucket, and I'm simply putting that configuration here. Once I have set the name, the output location, and the region where my data lake is, the next step is to set up the authentication. There are several choices for authentication. I'm going to show you three of them, and I will use one.

The first one is ADFS. This is very important, because many people have single sign-on set up with their Microsoft Active Directory. In that case, you want to go for the ADFS choice. You provide your username, password, and your IdP host URL here, and then you can use your Windows authentication to access the data lake. At the data lake level, when you are using single sign-on with an Active Directory user, that user assumes a role inside AWS, and the role it assumes should have permission on the tables in the data lake. My user account, as I showed you, has Select permission on the table. Similarly, the role that your user assumes in AWS should have permission on the tables: Select permission if you want to select. I'm not going to use this here because I don't have single sign-on configured. If you are really interested in understanding how single sign-on works, please let me know. I'm happy to create another demo or workshop for that.

The second choice I want to discuss is called instance profile. Suppose you are running Power BI Desktop on an EC2 instance, and you want to use the role of the EC2 instance to access the data lake. Then you go for this choice. Again, the role you have assigned to the EC2 instance should have permission on the tables in the data lake. Only then can you access it. Sometimes people go for this kind of choice.
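The authentication choices in this walkthrough (ADFS, instance profile, and IAM credentials) differ mainly in what you have to type into the DSN dialog. As a rough sketch, the Python below records the inputs the video mentions for each choice and reports which ones are still missing. The field labels such as `idp_host` and `access_key` are hypothetical names chosen for illustration; they are not the driver's actual setting names.

```python
# Inputs each authentication choice needs, as described in the walkthrough.
# The field labels are hypothetical and only used for this illustration.
REQUIRED_FIELDS = {
    "ADFS": ("username", "password", "idp_host"),
    "Instance Profile": (),  # the EC2 instance role is used, nothing to type
    "IAM Credentials": ("access_key", "secret_key"),
}


def missing_fields(auth_type, provided):
    """Return the required field labels that are absent or empty."""
    if auth_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown authentication type: {auth_type}")
    return [name for name in REQUIRED_FIELDS[auth_type] if not provided.get(name)]
```

Whichever choice is used, the identity behind it (the Active Directory role, the EC2 instance role, or the IAM user) still needs Select permission on the data lake tables.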
The third choice I want to discuss, and the one I'm going to use for this demo, is called IAM credentials. With IAM credentials, you provide your access key and secret key to authenticate. You must have seen my username, which has access to the data lake tables. I created security credentials for that user ID, and that is what I'm going to use here to authenticate. Again, I'm doing it this way because I don't have single sign-on and I'm not using an EC2 instance. I'm using my personal laptop. But you can choose the method you want depending on your scenario.

So let me configure the credentials here. This is the access key for the user account that has access to the data lake, and here goes the secret key for that user account. Then I can say OK. It's always a good idea to test it, to see that everything is configured properly. Click on the Test button, and the test is successful. That means that from this machine, using these credentials, I am able to access AWS. Sorry, not the data lake: we still have not gone and hit the data lake yet, but I do have access to AWS. So let's say OK here, and then simply say OK as well. Now your DSN has been configured. It's a one-time job.

Now we can go to Power BI. Here is my Power BI Desktop. I say I want to get data, choose More, and click on Other. I go for ODBC and connect. Now it is showing me various options, and you can see that my Dozo data source is listed here as a DSN. I simply select that and say OK. It asks me for a login and password; it will do that the first time. I don't need this because I'm using my DSN-based authentication, so I can simply say use my default and connect.
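The DSN settings entered above (region, S3 output location, and IAM credentials) can also be written as a DSN-less ODBC connection string. The Python below is a minimal sketch that only assembles such a string. The key names (`AwsRegion`, `S3OutputLocation`, `AuthenticationType`, `UID`, `PWD`) are given from memory of the Simba Athena ODBC driver documentation and should be treated as assumptions to verify against the guide for your driver version. The bucket path and credential values are placeholders, not the ones used in the video.

```python
def build_athena_connection_string(region, s3_output, access_key, secret_key,
                                   driver="Simba Athena ODBC Driver"):
    """Assemble a DSN-less ODBC connection string for IAM credential auth.

    The key names are assumptions to check against the driver documentation.
    """
    parts = {
        "Driver": "{" + driver + "}",
        "AwsRegion": region,                      # e.g. eu-west-1 for Ireland
        "S3OutputLocation": s3_output,            # where Athena writes query output
        "AuthenticationType": "IAM Credentials",
        "UID": access_key,                        # IAM access key
        "PWD": secret_key,                        # IAM secret key
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Placeholder values for illustration only.
conn_str = build_athena_connection_string(
    "eu-west-1", "s3://example-bucket/output/", "EXAMPLEACCESSKEY", "example-secret"
)
```

A string like this could be passed to an ODBC client such as `pyodbc.connect(conn_str)`. In practice the keys would come from environment variables or a credentials store rather than being written into source code.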
Now I have my connection set up. You can see it is showing that I am using Dozo data source. If I go further, I can see my sample database and its tables, and these are all coming from the data lake. I am interested in, for instance, the sales table at this point. From here onwards, it is the usual, pretty straightforward Power BI experience. I simply select the table and say OK, let's load this data. I might want to make my window a little bigger, but let it load first. OK, it's taking some time. Fair enough.

So now I have my data connected. You can see it showing my sales table and the fields from the sales table. From here onwards, it's pretty straightforward. For instance, let me create a pie chart. I can drag and drop the pie chart here. I'm not really a reporting guy, guys; I understand data lakes more. But let me see: can I see my sales by product line? Here is my product line and here is my sales, and you can see the sales by product line. And if you want to add a filter, for instance, can I add a status filter here? Yes, you can. Can I look at On Hold? Yes, you can filter on On Hold. It looks like there is nothing on hold; it's all shipped. Can I see that? Yes, you can do that.

So this was pretty much to show you how you can use Power BI Desktop with the Athena ODBC driver to connect to an AWS data lake and create reports. That's all for today, guys. If you like the video, please click on the like button. Please subscribe to our channel, and please do visit our website, aws-dozo, where we provide free AWS workshops and tutorials to learn about the web services. Thank you very much. Have a nice day. Bye bye.
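The pie chart in the demo is a sum of sales grouped by product line, optionally filtered by order status. The Python below sketches that aggregation outside Power BI. The column names (`productline`, `status`, `sales`) and the sample rows are made up for illustration; the real column names and values in the sales table are not given in the video.

```python
from collections import defaultdict


def sales_by_product_line(rows, status=None):
    """Sum sales per product line, keeping only rows with the given status."""
    totals = defaultdict(float)
    for row in rows:
        if status is not None and row["status"] != status:
            continue  # mimic the status filter on the report
        totals[row["productline"]] += float(row["sales"])
    return dict(totals)


# Made-up rows shaped like CSV records, for illustration only.
rows = [
    {"productline": "Classic Cars", "status": "Shipped", "sales": "100.0"},
    {"productline": "Classic Cars", "status": "Shipped", "sales": "50.0"},
    {"productline": "Motorcycles", "status": "Shipped", "sales": "75.5"},
]

shipped_totals = sales_by_product_line(rows, status="Shipped")
on_hold_totals = sales_by_product_line(rows, status="On Hold")
```

With these sample rows, filtering on On Hold yields an empty result, which matches what the demo shows when every order is shipped.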
