rushikeshnakhate / AI-Module2


Complete Week1 #1

Open rushikeshnakhate opened 4 years ago

rushikeshnakhate commented 4 years ago

Graphs from Data

Rather than constructing a graph from a dictionary, you will often start from a dataset stored in a data file, which is more practical for graphs of the size that arise in real applications. Here, you can experiment with an example dataset from the SNAP social networks collection at https://snap.stanford.edu/data/index.html#socnets

Here is the description of the data from the source:

This dataset consists of 'circles' (or 'friends lists') from Facebook. Facebook data was collected from survey participants using this Facebook app. The dataset includes node features (profiles), circles, and ego networks.

Facebook data has been anonymized by replacing the Facebook-internal ids for each user with a new value. Also, while feature vectors from this dataset have been provided, the interpretation of those features has been obscured. For instance, where the original dataset may have contained a feature "political=Democratic Party", the new data would simply contain "political=anonymized feature 1". Thus, using the anonymized data it is possible to determine whether two users have the same political affiliations, but not what their individual political affiliations represent.

The data can be read with the Pandas function read_csv, which produces a Pandas DataFrame. Each row of the DataFrame facebook corresponds to an edge in the graph (i.e., an edge connecting the node in column A to the node in column B). The DataFrame facebook is therefore in a suitable form to pass to the networkx function from_pandas_edgelist to create a Graph object.
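A minimal sketch of the read-and-convert step described above. The column names A and B follow the text; the in-memory sample stands in for the real edge-list file, whose path, separator, and header are not given here and are therefore assumptions.

```python
import io

import networkx as nx
import pandas as pd

# Stand-in for the edge-list file: a tiny made-up sample, NOT the real
# Facebook data. Replace with the actual file path when available.
sample_edges = io.StringIO("A,B\n0,1\n0,2\n1,2\n2,3\n")

# Each row of the DataFrame is one edge: node in column A <-> node in column B.
facebook = pd.read_csv(sample_edges)

# Build an undirected Graph from the edge list.
G = nx.from_pandas_edgelist(facebook, source="A", target="B")

print(G.number_of_nodes(), G.number_of_edges())
```

If the real file is whitespace-separated or has no header row, read_csv would need matching sep, header, and names arguments.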

rushikeshnakhate commented 4 years ago

A Graph to represent airport connections

Below, you will use a dataset containing information about airline flights between airports. In the next two questions, you will import the data into a networkx Graph object and identify paths in the graph. To begin, you need to examine the structure of the data.

This is what the data file air_routes.csv looks like:

,source,dest,count
0,AAE,ALG,1
1,AAE,CDG,1
⋮
37593,ZYI,XMN,1
37594,ZYL,DAC,4

There are slightly more than 37,500 rows, with the row number, departure (source) airport, arrival (dest) airport, and number of flights (count) as columns. The airports are represented using three-letter IATA codes.

Run the cell below to load the data into a Pandas DataFrame and display the first few rows.
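A minimal sketch of such a cell, assuming the file layout shown above. The in-memory sample reuses only the rows displayed in the listing so the snippet runs without air_routes.csv; the variable names routes and routes_graph are illustrative, not taken from the original notebook.

```python
import io

import networkx as nx
import pandas as pd

# Stand-in for air_routes.csv containing only the rows shown in the listing.
# With the real file, pass its path to read_csv instead of this buffer.
sample_csv = io.StringIO(
    ",source,dest,count\n"
    "0,AAE,ALG,1\n"
    "1,AAE,CDG,1\n"
    "37593,ZYI,XMN,1\n"
    "37594,ZYL,DAC,4\n"
)

# The unnamed first column holds the row number, so use it as the index.
routes = pd.read_csv(sample_csv, index_col=0)
print(routes.head())

# Build a Graph whose edges carry the number of flights as an attribute.
routes_graph = nx.from_pandas_edgelist(
    routes, source="source", target="dest", edge_attr="count"
)
```

Passing edge_attr="count" keeps the flight count on each edge, which is optional for the path questions that follow.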

rushikeshnakhate commented 4 years ago

Finding Paths in a Graph

Once you have a graph representation of connected airports, you can use built-in algorithms in networkx to determine paths from one airport to another. For example, the airports located in Albany, New York and San Francisco, California are coded as ALB and SFO respectively. You can determine the shortest paths between the two airports using the nx.shortest_paths.all_shortest_paths function.

Your task here is to obtain all the shortest paths from ALB to SFO (there is more than one).

- Use all_shortest_paths from the submodule networkx.shortest_paths to compute all the paths required. Assign the result to ans_6a after converting it to a list (the result returned by all_shortest_paths is a Python generator to permit lazy evaluation).
- Assign the length of the shortest path to the identifier ans_6b. Notice that each path is a list of nodes including the initial and terminal node. The path length corresponds to the number of edges in the path.
- Assign the number of shortest paths to the identifier ans_6c.

Notice that this graph model does not include geographical distances between airports. As such, the use of the adjective "shortest" is somewhat counter-intuitive (especially if you have any sense of the relative geographical locations of these airports). In this example, "the shortest path" refers to "the path involving the fewest connecting flights" (i.e., edges) irrespective of geographical distance. If precise coordinates for each airport had been included with the input data, you could have used the geographical distance as an edge attribute for each edge (in which case, the "shortest path" could be computed in the more conventional sense).
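The steps above can be sketched on a small hand-built graph. The edges below are made up for illustration and are not taken from air_routes.csv, so the values computed here will differ from the answers for the real data. The distance attribute is likewise hypothetical (arbitrary units) and only shows how a weighted query would look if such an attribute existed.

```python
import networkx as nx

# Made-up sample connections for illustration; NOT the real route data.
G = nx.Graph()
G.add_edges_from([
    ("ALB", "ORD"), ("ORD", "SFO"),
    ("ALB", "EWR"), ("EWR", "SFO"),
    ("ALB", "BOS"), ("BOS", "JFK"), ("JFK", "SFO"),
])

# all_shortest_paths returns a generator, so convert it to a list.
ans_6a = list(nx.all_shortest_paths(G, source="ALB", target="SFO"))

# Each path lists its nodes, so the number of edges is one less.
ans_6b = len(ans_6a[0]) - 1

# Number of distinct shortest paths.
ans_6c = len(ans_6a)

print(ans_6a, ans_6b, ans_6c)

# Hypothetical edge weights in arbitrary units, only to show a weighted query.
hypothetical_distance = {
    ("ALB", "ORD"): 3, ("ORD", "SFO"): 7,
    ("ALB", "EWR"): 1, ("EWR", "SFO"): 10,
    ("ALB", "BOS"): 1, ("BOS", "JFK"): 1, ("JFK", "SFO"): 9,
}
nx.set_edge_attributes(G, hypothetical_distance, name="distance")

# With weight=..., "shortest" means smallest total distance, not fewest edges.
weighted_path = nx.shortest_path(G, "ALB", "SFO", weight="distance")
print(weighted_path)
```

On the real data, the same three assignments would be applied to the Graph built from air_routes.csv rather than to this sample graph.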