Task 4. Export the network traffic to BigQuery to further analyze the logs
Create an export sink
On the Google Cloud console title bar, type Logs explorer in the Search field, then select Logs explorer from Search Results.
Under the RESOURCE TYPE, click Subnetwork. In the Query results pane, entries for all available subnetworks appear.
Under the LOG NAME filter, click compute.googleapis.com/vpc_flows. In the Query results pane, only the VPC flow log entries are shown.
Select Actions > Create Sink.
For the Sink name, type bq_vpcflows, and click NEXT.
In the Select sink service drop-down list, select BigQuery dataset.
In the Select BigQuery dataset drop-down list, select Create new BigQuery dataset.
For Dataset ID, enter bq_vpcflows, and click CREATE DATASET.
Click NEXT twice.
Click CREATE SINK.
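The console steps above can also be approximated from Cloud Shell. The following is a minimal sketch that only assembles and prints the commands for review rather than running them. It assumes the dataset lives in the current project, reads the project ID from GOOGLE_CLOUD_PROJECT (falling back to the hypothetical placeholder my-project), and reuses the bq_vpcflows names from this task. Unlike the console flow, a sink created from the command line may still need its writer identity granted write access to the dataset.

```shell
# Sketch: command-line counterpart of the "Create an export sink" steps.
# PROJECT_ID falls back to a placeholder; replace it with your own project.
PROJECT_ID="${GOOGLE_CLOUD_PROJECT:-my-project}"
SINK_NAME="bq_vpcflows"
DATASET_ID="bq_vpcflows"

# Same selection as in Logs Explorer: Subnetwork resources, vpc_flows log.
LOG_FILTER="resource.type=\"gce_subnetwork\" AND logName=\"projects/${PROJECT_ID}/logs/compute.googleapis.com%2Fvpc_flows\""

# BigQuery sink destinations use this URI form.
SINK_DESTINATION="bigquery.googleapis.com/projects/${PROJECT_ID}/datasets/${DATASET_ID}"

# Print the two commands so they can be reviewed and pasted into Cloud Shell.
printf 'bq mk --dataset %s\n' "${PROJECT_ID}:${DATASET_ID}"
printf "gcloud logging sinks create %s %s --log-filter='%s'\n" \
  "$SINK_NAME" "$SINK_DESTINATION" "$LOG_FILTER"
```

Printing first makes it easy to confirm that the filter matches what Logs Explorer showed before anything is created.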
Generate log traffic for BigQuery
Now that the network traffic logs are being exported to BigQuery, generate more traffic by accessing the web-server several times. From Cloud Shell, you can do this by running curl against the web-server's IP address.
In the Navigation menu, select Compute Engine > VM instances.
Note the External IP address for the web-server instance. It will be referred to as EXTERNAL_IP.
Click Activate Cloud Shell (Activate Cloud Shell icon).
If prompted, click Continue.
Store the EXTERNAL_IP in an environment variable in Cloud Shell by appending the address you noted after the equals sign:
export MY_SERVER=
Access the web-server 50 times from Cloud Shell:
for ((i=1;i<=50;i++)); do curl $MY_SERVER; done
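If you want to confirm that the requests actually reached the web-server, a variant of the loop can count successful responses. This is a minimal sketch, not part of the lab: probe_server is a helper name introduced here for illustration, and it assumes the web-server answers with HTTP 200.

```shell
# probe_server URL [COUNT]
# Requests URL COUNT times (default 50) and reports how many responses
# carried HTTP status 200. Fails early if URL is empty.
probe_server() {
  url=$1
  count=${2:-50}
  if [ -z "$url" ]; then
    echo "probe_server: empty URL (is MY_SERVER set?)" >&2
    return 1
  fi
  ok=0
  i=1
  while [ "$i" -le "$count" ]; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code
    status=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    if [ "$status" = "200" ]; then
      ok=$((ok + 1))
    fi
    i=$((i + 1))
  done
  echo "$ok/$count requests returned HTTP 200"
}

# Usage in Cloud Shell (not run here):
#   probe_server "$MY_SERVER" 50
```

The empty-URL check catches the case where MY_SERVER was exported without a value, which would otherwise make every curl call fail without generating any traffic.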
Visualize the VPC flow logs in BigQuery
In the Cloud Console, in the Navigation menu, click BigQuery.
If prompted, click Done.
In the left pane, expand the bq_vpcflows dataset to reveal the table. You might have to first expand the Project ID to reveal the dataset.
Click on the name of the table. It should start with compute_googleapis.
Click on the Details tab.
Copy the Table ID value under Table info.
Note: If you do not see the bq_vpcflows dataset or if it does not expand, wait and refresh the page.
Click the + icon to open a new BigQuery Editor tab.
Add the following to the BigQuery Editor and replace your_table_id with TABLE_ID while retaining the backticks (`) on both sides:
#standardSQL
SELECT
jsonPayload.src_vpc.vpc_name,
SUM(CAST(jsonPayload.bytes_sent AS INT64)) AS bytes,
jsonPayload.src_vpc.subnetwork_name,
jsonPayload.connection.src_ip,
jsonPayload.connection.src_port,
jsonPayload.connection.dest_ip,
jsonPayload.connection.dest_port,
jsonPayload.connection.protocol
FROM
`your_table_id`
GROUP BY
jsonPayload.src_vpc.vpc_name,
jsonPayload.src_vpc.subnetwork_name,
jsonPayload.connection.src_ip,
jsonPayload.connection.src_port,
jsonPayload.connection.dest_ip,
jsonPayload.connection.dest_port,
jsonPayload.connection.protocol
ORDER BY
bytes DESC
LIMIT
15
Click Run.
Note: If you get an error, ensure that you did not remove the #standardSQL part of the query. If it still fails, ensure that the TABLE_ID did not include the Project ID.
Which columns does the results table contain? All of the following:
Sum of bytes sent
Source IP address and port
Subnet name
Destination IP address and port
VPC name
Protocol
Analyze the VPC flow logs in BigQuery
The previous query gave you the same information that you saw in the Cloud Console. Now, you will change the query to identify the top IP addresses that have exchanged traffic with your web-server.
Create a new query in the BigQuery Editor with the following and replace your_table_id with TABLE_ID while retaining the backticks (`) on both sides:
#standardSQL
SELECT
jsonPayload.connection.src_ip,
jsonPayload.connection.dest_ip,
SUM(CAST(jsonPayload.bytes_sent AS INT64)) AS bytes,
jsonPayload.connection.dest_port,
jsonPayload.connection.protocol
FROM
`your_table_id`
WHERE jsonPayload.reporter = 'DEST'
GROUP BY
jsonPayload.connection.src_ip,
jsonPayload.connection.dest_ip,
jsonPayload.connection.dest_port,
jsonPayload.connection.protocol
ORDER BY
bytes DESC
LIMIT
15
Click Run.
Note: The results table now has a row for each source IP and is sorted in descending order by the number of bytes sent to the web-server. The top result should reflect your Cloud Shell IP address.
Note: Unless you accessed the web-server after creating the export sink, you will not see your machine's IP address in the table.
You can generate more traffic to the web-server from multiple sources and query the table again to determine the bytes sent to the server.
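When you re-run the query repeatedly, it can be convenient to do so from Cloud Shell instead of the editor. The sketch below only builds and prints the query text; the default TABLE_ID value is a hypothetical placeholder for the Table ID you copied from the Details tab. The quoted here-document matters because the shell would otherwise treat the backticks (`) around the table ID as command substitution.

```shell
# TABLE_ID default is a placeholder; use the Table ID copied earlier.
TABLE_ID="${TABLE_ID:-bq_vpcflows.your_table_name}"

# The quoted delimiter ('SQL') stops the shell from interpreting the
# backticks inside the query text.
build_query() {
  cat <<'SQL'
#standardSQL
SELECT
  jsonPayload.connection.src_ip,
  jsonPayload.connection.dest_ip,
  SUM(CAST(jsonPayload.bytes_sent AS INT64)) AS bytes,
  jsonPayload.connection.dest_port,
  jsonPayload.connection.protocol
FROM
  `your_table_id`
WHERE jsonPayload.reporter = 'DEST'
GROUP BY
  jsonPayload.connection.src_ip,
  jsonPayload.connection.dest_ip,
  jsonPayload.connection.dest_port,
  jsonPayload.connection.protocol
ORDER BY
  bytes DESC
LIMIT
  15
SQL
}

# Substitute the real table ID while keeping the surrounding backticks.
QUERY=$(build_query | sed "s|your_table_id|${TABLE_ID}|")
printf '%s\n' "$QUERY"

# To run it with the bq command-line tool (not executed here):
#   bq query --use_legacy_sql=false "$QUERY"
```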
Click Check my progress to verify the objective: Export the network traffic to BigQuery to further analyze the logs.
Task 5. Add VPC flow log aggregation
In this task, you explore the new VPC flow log volume reduction settings. Not every packet is captured into its own log record. However, even with sampling, the volume of log records can be quite large.
You can balance your traffic visibility and storage cost needs by adjusting specific aspects of logs collection, which you will explore in this section.
Setting up aggregation
In the Console, navigate to the Navigation menu (Navigation menu icon) and select VPC network > VPC networks.
Click vpc-net.
In the Subnets tab, click vpc-subnet:
(Image: VPC subnets in Subnets tab)
Click Edit > Advanced Settings to expose the following fields:
(Image: Flow log settings additional fields)
The purpose of each field is explained below.
Aggregation time interval: Sampled packets for a time interval are aggregated into a single log entry. This time interval can be 5 sec (default), 30 sec, 1 min, 5 min, 10 min, or 15 min.
Metadata annotations: By default, flow log entries are annotated with metadata information, such as the names of the source and destination VMs or the geographic region of external sources and destinations. This metadata annotation can be turned off to save storage space.
Log entry sampling: Before being written to the database, log entries can be sampled to reduce their number. By default, the log entry volume is scaled by 0.50 (50%), which means that half of the entries are kept. You can set this anywhere from 1.0 (100%, all log entries are kept) to 0.0 (0%, no log entries are kept).
Set the Aggregation Interval to 30 seconds.
Set the Secondary sampling rate (the log entry sampling described above) to 25%.
Click Save. You should see the following message:
(Image: Estimated logs generated per day notification)
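The same settings can likely be applied from Cloud Shell. The sketch below assembles the gcloud command and prints it for review instead of running it. It assumes flow logs are already enabled on vpc-subnet, that the console's Secondary sampling rate corresponds to the --logging-flow-sampling flag, and that REGION (a placeholder here) is set to the subnet's region.

```shell
# Sketch: command-line counterpart of the console settings above.
# REGION default is a placeholder; set it to the region of vpc-subnet.
SUBNET="vpc-subnet"
REGION="${REGION:-your-region}"
AGGREGATION_INTERVAL="interval-30-sec"   # 30-second aggregation interval
FLOW_SAMPLING="0.25"                     # keep 25% of log entries

UPDATE_CMD="gcloud compute networks subnets update ${SUBNET} --region=${REGION} --logging-aggregation-interval=${AGGREGATION_INTERVAL} --logging-flow-sampling=${FLOW_SAMPLING}"

# Print the command for review; paste it into Cloud Shell to apply it.
printf '%s\n' "$UPDATE_CMD"
```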
Setting the aggregation interval to 30 seconds can reduce the size of your flow logs by up to 83% compared to the default aggregation interval of 5 seconds. How you configure flow log aggregation can significantly affect both your traffic visibility and your storage costs.
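The 83% figure is consistent with a simple upper bound: a flow that stays active the whole time produces one log entry per aggregation interval, so moving from 5 seconds to 30 seconds turns six entries into one. A minimal sketch of that arithmetic follows, assuming a continuously active flow (short-lived flows will see a smaller reduction). The helper name reduction_pct is introduced here for illustration only.

```shell
# reduction_pct BASE_SECONDS NEW_SECONDS
# Upper-bound reduction in log entries when the aggregation interval grows
# from BASE_SECONDS to NEW_SECONDS, for a flow that is active the whole time.
reduction_pct() {
  awk -v base_s="$1" -v new_s="$2" \
    'BEGIN { printf "%.1f\n", (1 - base_s / new_s) * 100 }'
}

reduction_pct 5 30   # prints 83.3
```

Calling the function with other supported intervals, such as 5 and 60, gives the corresponding upper bound for a 1-minute interval.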