hope-data-science / akc

Automatic knowledge classification based on keyword co-occurrence network
https://hope-data-science.github.io/akc/

Suggestions from Richard #2

Open hope-data-science opened 4 years ago

hope-data-science commented 4 years ago

My recommendations are provided as follows:

  1. I am not sure what the utility of the "dict" argument is in the keyword_extract function. I believe this argument obscures more than it illuminates about the function itself. Since the akc package is designed to handle a corpus either with or without previously encoded keywords, a better choice would be to replace the "dict" argument with one that addresses stopwords. An argument such as "swList" or something similar could remove the "filler" words common in English, such as prepositions and articles (a, and, the, etc.). A general stopword list can be found in the tm R package, so perhaps a wrapper around that function could be used in developing this argument. Without the ability to remove stopwords within the function itself, stopword removal becomes a separate task whenever keywords are not previously encoded.
  2. Without a deeper knowledge of R, the datasets created by the keyword_group function are virtually impossible to break apart and review separately. I recommend developing an additional function to complement the akc package that would convert both the node and the edge lists generated by keyword_group into a separate list object. Possible names for this function could be getkwGrpData or something similar.
  3. It would be nice if the co-occurrence network visualization contained a small text label that showed the total number of word occurrences contained within each group. A text label that read "TO = x" or "Total Word Occurrences = x" where x defines the number of keyword occurrences per group would provide additional insight into each co-occurrence network group.
  4. I recommend adding an argument to the keyword_group function that lets the user choose which section of the keyword frequency distribution is used to generate the table graph object. While the highest-frequency keywords may be the most commonly selected data for a plot, it may also be valuable to know which keywords are the least used or occur with low frequency. The same applies to keywords whose frequency is neither the highest nor the lowest but somewhere in the middle of the range. Such an argument would give the user much more control over which part of the keyword frequency distribution to plot and present. Options such as freq = "h", freq = "m", or freq = "l" would add value to the akc package, with high frequency as the default.
  5. While the co-occurrence network plot is beautifully designed, it is only a single plot. If a Word Cloud plot could be introduced into the package with the same kind of simplicity of deployment as the co-occurrence network plot, it would add value to the akc package.
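
To make item 4 concrete, here is a minimal sketch of what selecting a section of the frequency distribution could look like. It is written in Python purely as a language-neutral illustration and is not akc code: the helper name `select_band`, its `top` parameter, and the exact definition of the "middle" band are assumptions, and only the "h"/"m"/"l" codes come from the suggestion above.

```python
from collections import Counter


def select_band(keywords, freq="h", top=3):
    """Return `top` (keyword, count) pairs from the high, middle, or low
    part of the frequency ranking. Hypothetical helper, not part of akc."""
    # Rank by descending count; break ties alphabetically so the result
    # is deterministic.
    ranked = sorted(Counter(keywords).items(), key=lambda kv: (-kv[1], kv[0]))
    top = min(top, len(ranked))
    if freq == "h":
        start = 0                          # most frequent keywords
    elif freq == "l":
        start = len(ranked) - top          # least frequent keywords
    elif freq == "m":
        start = (len(ranked) - top) // 2   # window centred on the ranking
    else:
        raise ValueError("freq must be 'h', 'm', or 'l'")
    return ranked[start:start + top]


keywords = ["a"] * 5 + ["b"] * 4 + ["c"] * 3 + ["d"] * 2 + ["e"]
print(select_band(keywords, freq="h", top=2))  # [('a', 5), ('b', 4)]
print(select_band(keywords, freq="m", top=1))  # [('c', 3)]
print(select_band(keywords, freq="l", top=2))  # [('d', 2), ('e', 1)]
```

Defaulting to freq = "h" matches the proposal that the high-frequency model remain the default behaviour.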
hope-data-science commented 4 years ago
  1. Adding a "stopword" parameter is useful. I have already added it, and it will be included in the next version. As for the dict parameter, it can be set to NULL and the function will still run. I designed it because we need to cope with huge corpora and extract only the relevant information. We usually use a professional dictionary and make sure the extracted content is in that dictionary. Stopwords specify words to exclude, whereas a dictionary ensures that any word outside it is filtered out.
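
The difference between the two filters can be sketched as follows. This is a language-neutral illustration in Python, not akc's implementation: the helper `filter_keywords` and its parameter names are hypothetical. A stopword list acts as a blacklist, while a dictionary acts as a whitelist.

```python
def filter_keywords(tokens, stopwords=None, dictionary=None):
    """Drop tokens found in `stopwords`; if `dictionary` is given, keep only
    tokens that appear in it. Hypothetical helper, not akc's implementation."""
    stopwords = set(stopwords or ())
    # Blacklist step: remove anything listed as a stopword.
    kept = [t for t in tokens if t not in stopwords]
    if dictionary is not None:
        # Whitelist step: anything outside the dictionary is filtered out.
        allowed = set(dictionary)
        kept = [t for t in kept if t in allowed]
    return kept


tokens = ["the", "network", "of", "keyword", "analysis"]
print(filter_keywords(tokens, stopwords={"the", "of"}))
# ['network', 'keyword', 'analysis']
print(filter_keywords(tokens, dictionary={"network", "keyword"}))
# ['network', 'keyword']
```

Passing neither argument returns the tokens unchanged, which mirrors the point that the dictionary is optional.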
hope-data-science commented 4 years ago
  2. You can get the data by simply applying as.data.frame or as_tibble to the output object. I'll write another tutorial to explain this better.
hope-data-science commented 4 years ago
  5. Added keyword_cloud; it will be included in the next version.
hope-data-science commented 4 years ago

3 & 4. Use keyword_network for flexible network visualization; it will be included in the next release.