jeff1evesque / ist-687

Syracuse IST687 final project with Jesse Warren (team member)

Create basic Language vs page views visualization(s) #29

Closed jeff1evesque closed 6 years ago

jeff1evesque commented 6 years ago

We will create visualization(s) of page views by the Language column, within our basic.R.
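As a rough sketch of what such a visualization could look like, the snippet below sums one month of page views per language and defines a bar chart helper. The toy `df`, its values, and the names `views_by_language` and `plot_views_by_language` are assumptions for illustration; only the column names `Article`, `Language`, and `2015.07` come from this issue.

```r
# Toy stand-in for the project's `df`; the rows and values are invented.
df <- data.frame(
  Article   = c("Main_Page", "Hauptseite", "Accueil", "R_(programming_language)"),
  Language  = c("en", "de", "fr", "en"),
  `2015.07` = c(1200, 800, 600, 300),
  check.names = FALSE,        # keep the "2015.07" column name as-is
  stringsAsFactors = FALSE
)

# Sum the monthly page views within each language (named numeric vector).
views_by_language <- sapply(split(df[["2015.07"]], df$Language), sum)

# Order languages from most to least viewed.
views_by_language <- sort(views_by_language, decreasing = TRUE)

# Hypothetical helper: draw the totals as a base R bar chart.
plot_views_by_language <- function(totals) {
  barplot(
    totals,
    xlab = "Language",
    ylab = "Page views (2015.07)",
    main = "Page views by language"
  )
}
```

Calling `plot_views_by_language(views_by_language)` draws the chart. The same aggregation could be repeated per month column if a time series view is wanted.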

jeff1evesque commented 6 years ago

We need to remove every row in our df dataframe whose Language column has the value www. The logic for this can be reused from our munge_ist687.R, and it will remove the following entries:

                                                                              Article Language 2015.07
19615                                                                             API      www   15471
19616                                                            API:Account_creation      www    1377
19617                                                                   API:Allimages      www     744
19618                                                                    API:Allpages      www     831
19619                                                                   API:Backlinks      www     563
19622                                                                API:Data_formats      www    2852
19623                                                                   API:Etiquette      www    1081
19624                                                                         API:FAQ      www    2223
19625                                                                       API:Login      www    2287
19626                                                                      API:Logout      www     772
19627                                                                API:Main_page/de      www    1163
19628                                                                API:Main_page/ja      www     926
19629                                                                API:Main_page/zh      www     663
19630                                                                        API:Meta      www    1509
19631                                                                        API:Move      www    1043
19632                                                       API:Parameter_information      www    1051
19633                                                                       API:Query      www    7880
19634                                                           API:Quick_start_guide      www     299
19635                                                                      API:Random      www     498
19636                                                                   API:Revisions      www     582
19637                                                                    API:Rollback      www     726
19638                                                                      API:Search      www    1767
19639                                                        API:Search_and_discovery      www    1315
19640                                                                      API:Tokens      www     869
19641                                                                    API:Tutorial      www    3782
19642                                                                       API:Users      www     481
19644                                                             Alternative_parsers      www    3650
19645                                                                       Analytics      www    1309
19647                                                            Apache_configuration      www    1457
19649                                                        Beta_Features/Hovercards      www    1127
19650                                                                  Bug_management      www    2498
19651                                                         Category:All_extensions      www    3887
19653                                                             Category:Extensions      www   15775
19654                                                          Category:Extensions/de      www    1099
19656                                                 Category:Extensions_by_category      www    2482
19658                                                      Category:Hidden_categories      www     208
19659                                                           Category:Installation      www     609
19660                                                 Category:MediaWiki_Introduction      www     737
19662                                               Category:Skins_based_on_Bootstrap      www     410
19663                                                Category:Special_page_extensions      www     613
19664                                                      Category:Stable_extensions      www     884
19665                                                     Category:WYSIWYG_extensions      www     915
19666                                                                          Citoid      www    1518
19667                                       Comparison_of_extensions_in_distributions      www     705
19668                                                          Continuous_integration      www    2482

Note: additional columns exist for the rows above; they were omitted because the corresponding time series data is not relevant to the requirement of removing these rows.
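A minimal sketch of the row removal, assuming `df` has the `Language` column shown above. This is not the actual logic from munge_ist687.R, which is not reproduced in this issue. The two www rows reuse values from the listing; the other rows are invented.

```r
# Toy stand-in for `df`: two www rows from the listing above,
# plus two invented rows that should survive the filter.
df <- data.frame(
  Article   = c("API", "API:Login", "Main_Page", "Hauptseite"),
  Language  = c("www", "www", "en", "de"),
  `2015.07` = c(15471, 2287, 1200, 800),
  check.names = FALSE,        # keep the "2015.07" column name as-is
  stringsAsFactors = FALSE
)

# Keep only rows whose Language is not "www".
# %in% returns FALSE for NA, so rows with a missing Language are kept
# rather than turned into all-NA rows as `!=` indexing would do.
df <- df[!(df$Language %in% "www"), ]
```

After the filter, only the en and de rows remain, with their original row names.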

jeff1evesque commented 6 years ago

Based on the following snippet, we'll also remove all rows whose Language column has the value commons:

Wikimedia Commons (or simply Commons) is an online repository of free-use images, sounds, and other media files.[1] It is a project of the Wikimedia Foundation.
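Since both www and commons rows are to be dropped, one possible approach is to filter against a single vector of excluded values. This is a sketch under assumptions: the toy `df`, its values, and the name `excluded_languages` are hypothetical and do not come from the project code.

```r
# Toy stand-in for `df`; the rows and values are invented.
df <- data.frame(
  Article   = c("API", "File:Example.jpg", "Main_Page", "Accueil"),
  Language  = c("www", "commons", "en", "fr"),
  `2015.07` = c(100, 200, 300, 400),
  check.names = FALSE,        # keep the "2015.07" column name as-is
  stringsAsFactors = FALSE
)

# Language values that do not denote an actual language edition.
excluded_languages <- c("www", "commons")

# Drop every row whose Language is in the excluded set.
df <- df[!(df$Language %in% excluded_languages), ]
```

Keeping the excluded values in one vector makes it easy to add further non-language entries later without touching the filter itself.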