sailuh / kaiaulu

An R package for mining software repositories
http://itm0.shidler.hawaii.edu/kaiaulu
Mozilla Public License 2.0

i #206 Adds Text Module #207

Closed carlosparadis closed 1 year ago

carlosparadis commented 1 year ago

This pull request is for ongoing development and requirement refinement of this module. Once the idea and usage mature a bit more, it will be merged to master.

codecov[bot] commented 1 year ago

Codecov Report

Merging #207 (b55b871) into master (a241585) will decrease coverage by 0.27%. The diff coverage is 0.00%.

:exclamation: Current head b55b871 differs from pull request most recent head 30b54df. Consider uploading reports for the commit 30b54df to get more accurate results

@@            Coverage Diff            @@
##           master    #207      +/-   ##
=========================================
- Coverage    9.13%   8.86%   -0.27%     
=========================================
  Files          16      17       +1     
  Lines        2179    2245      +66     
=========================================
  Hits          199     199              
- Misses       1980    2046      +66     
| Impacted Files | Coverage Δ |
|---|---|
| R/parser.R | 8.67% <0.00%> (-0.39%) :arrow_down: |
| R/text.R | 0.00% <0.00%> (ø) |

lh-zhan commented 1 year ago

A few things I noticed while attempting to run vignettes/src_text_showcase.Rmd:

  1. Seems like the default depends.yml used in this notebook is not included in this repo.
  2. I incorporated the srcml section into thrift.yml to test things out, then ran into a No Module Found error after installing Spiral. The cause was that I had both a virtual environment and a local Python installed, and RStudio confused one for the other. I resolved it by adding use_python("/opt/homebrew/bin/python3") before importing spiral.
  3. I ran into Error: AttributeError: module 'collections' has no attribute 'Iterable' on line https://github.com/sailuh/kaiaulu/blob/0452a48a866ad28d134500b4a7687db14bacdef5/vignettes/src_text_showcase.Rmd#L135. I looked up the error and it seems to be a Python version issue, but I had no luck resolving it in RStudio. Have you run into a similar issue before?

Thanks!

carlosparadis commented 1 year ago

Oops. Here's the depends.yml; I will commit it to this branch later. I had to paste it below since GitHub won't allow me to attach a .yml.

I find it a bit strange that you ran into the use_python issue. Did you set RStudio to the right version of Python using:

https://github.com/sailuh/kaiaulu/blob/0452a48a866ad28d134500b4a7687db14bacdef5/vignettes/src_text_showcase.Rmd#L23

And it still complained? What Python version are you using? I took a quick look and it seems the error may occur because one of these libraries relies on a deprecated API.

[Screenshot: Screen Shot 2023-05-08 at 11 18 14 AM]

Could you try this older Python version and set the path to it via RStudio?
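For context on why an older interpreter can help here: the `collections.Iterable` alias was deprecated in favor of `collections.abc.Iterable` and removed in Python 3.10, so any library that still refers to the old name raises exactly this AttributeError on newer interpreters. The sketch below is illustrative only. The function names are hypothetical, and `restore_legacy_alias` is a common workaround, not the fix used in this thread.

```python
import collections
import collections.abc
import sys


def legacy_alias_available() -> bool:
    """Return True if `collections.Iterable` still resolves on this interpreter."""
    return hasattr(collections, "Iterable")


def is_iterable(obj) -> bool:
    """Version-independent check using the supported location of the ABC."""
    return isinstance(obj, collections.abc.Iterable)


def restore_legacy_alias() -> None:
    """Workaround only: re-create the removed alias for old third-party code."""
    if not hasattr(collections, "Iterable"):
        collections.Iterable = collections.abc.Iterable


if __name__ == "__main__":
    print("Python", sys.version_info[:3])
    print("collections.Iterable available:", legacy_alias_available())
    print("is_iterable([1, 2]):", is_iterable([1, 2]))
```

Calling `restore_legacy_alias()` before importing the affected library would mask the problem on a newer interpreter, but switching to an interpreter the library supports avoids patching the standard library at all.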

Lastly, when you have a chance, have a look at this issue: https://github.com/sailuh/ArchMech/issues/2 for a list of projects that are known to use GoF patterns. You will likely want to create conf files for those projects to test the keywords, since they are known to contain the patterns and there is supplemental material identifying them with another approach. That way, if you can't detect them, we know the keywords may be lacking, and/or you can inspect the patterns detected by the other method to see what other keywords would make sense.
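A rough sketch of the keyword check described above, assuming identifiers have already been extracted from the source code. The topic names and keywords mirror the `analysis.topics` section of the depends.yml in this comment. Everything else is hypothetical: the regex splitter is only a stand-in for a real identifier splitter such as Spiral, and the matching rule (a keyword must equal one token of the identifier, case-insensitively) is an assumption, not the notebook's actual logic.

```python
import re
from typing import Dict, Iterable, List

# Mirrors the `analysis.topics` section of the pasted configuration.
TOPICS: Dict[str, List[str]] = {
    "topic_1": ["Parser", "Lexer"],
    "topic_2": ["Python", "Ruby"],
}


def split_identifier(identifier: str) -> List[str]:
    """Naive camelCase / snake_case splitter; returns lowercase tokens."""
    tokens: List[str] = []
    for part in re.split(r"[_\W]+", identifier):
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [token.lower() for token in tokens]


def match_topics(identifiers: Iterable[str],
                 topics: Dict[str, List[str]] = TOPICS) -> Dict[str, List[str]]:
    """Map each topic to the identifiers (in input order) that contain one of its keywords."""
    hits: Dict[str, List[str]] = {topic: [] for topic in topics}
    for identifier in identifiers:
        tokens = set(split_identifier(identifier))
        for topic, keywords in topics.items():
            if any(keyword.lower() in tokens for keyword in keywords):
                hits[topic].append(identifier)
    return hits


if __name__ == "__main__":
    sample = ["JavaParserFactory", "ruby_lexer", "DependsCommand", "PythonHandler"]
    for topic, matched in match_topics(sample).items():
        print(topic, matched)
```

A topic that comes back empty for a project known to contain the corresponding pattern would suggest that the keyword list for that topic needs to be extended.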

# -*- yaml -*-
# https://github.com/sailuh/kaiaulu
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
# notice and this notice are preserved.  This file is offered as-is,
# without any warranty.

# Project Configuration File #
#
# To perform analysis on open source projects, you need to manually
# collect some information from the project's website. As there is
# no standardized website format, this file serves to distill
# important data source information so it can be reused by others
# and understood by Kaiaulu.
#
# Please check https://github.com/sailuh/kaiaulu/tree/master/conf to
# see if a project configuration file already exists. Otherwise, we
# would appreciate if you share your curated file with us by sending a
# Pull Request: https://github.com/sailuh/kaiaulu/pulls
#
# Note, you do NOT need to specify this entire file to conduct analysis.
# Each R Notebook uses a different portion of this file. To know what
# information is used, see the project configuration file section at
# the start of each R Notebook.
#
# Please comment unused parameters instead of deleting them for clarity.
# If you have questions, please open a discussion:
# https://github.com/sailuh/kaiaulu/discussions

project:
  website: https://github.com/multilang-depends/depends
  #openhub: https://www.openhub.net/p/apache_portable_runtime

version_control:
  # Where is the git log located locally?
  # This is the path to the .git of the project repository you are analyzing.
  # The .git is hidden, so you can see it using `ls -a`
  log: ../../rawdata/git_repo/depends/.git
  # Where was the git log downloaded from?
  log_url: https://github.com/multilang-depends/depends
  # List of branches used for analysis
  branch:
    - master

#mailing_list:
  # Where is the mbox located locally?
#  mbox: ../../rawdata/mbox/apr-dev_2012_2019.mbox
  # What is the domain of the chosen mailing list archive?
#  domain: http://mail-archives.apache.org/mod_mbox
  # Which lists of the domain will be used?
#  list_key:
#    - apr-dev

#issue_tracker:
#  jira:
    # Obtained from the project's JIRA URL
#    domain: https://issues.apache.org/jira
    #project_key: HELIX
    # Download using `download_jira_data.Rmd`
    #issues: ../../rawdata/issue_tracker/helix_issues.json
    #issue_comments: ../../rawdata/issue_tracker/helix_issue_comments.json
#  github:
    # Obtained from the project's GitHub URL
#    owner: apache
#    repo: apr
    # Download using `download_github_comments.Rmd`
#    replies: ../../rawdata/github/apr/

#vulnerabilities:
  # Folder path with nvd cve feeds (e.g. nvdcve-1.1-2018.json)
  # Download at: https://nvd.nist.gov/vuln/data-feeds
  #nvd_feed: rawdata/nvdfeed

# Commit message CVE or Issue Regular Expression (regex)
# See project's commit message for examples to create the regex
#commit_message_id_regex:
#  issue_id: \#[0-9]+
  #cve_id: ?

filter:
  keep_filepaths_ending_with:
    - cpp
    - c
    - h
    - java
    - js
    - py
    - cc
  remove_filepaths_containing:
    - test
    - java_code_examples

# Third Party Tools Configuration #
#
# See Kaiaulu's README.md for details on how to setup these tools.
tool:
  # Depends allows parsing file-file static dependencies.
  depends:
    # accepts one language at a time: cpp, java, ruby, python, pom
    # You can obtain this information on OpenHub or in the right pane of the project's GitHub page.
    code_language: java
    # Specify which types of Dependencies to keep - see the Depends tool README.md for details.
    keep_dependencies_type:
      - Cast
      - Call
      - Import
      - Return
      - Set
      - Use
      - Implement
      - ImplLink
      - Extend
      - Create
      - Throw
      - Parameter
      - Contain
  dv8:
    # The project folder path to store various intermediate
    # files for DV8 Analysis
    # The folder name will be used in the file names.
    folder_path: ../../analysis/dv8/depends
    # the architectural flaws thresholds that should be used
    architectural_flaws:
      cliqueDepends:
        - call
        - use
      crossingCochange: 2
      crossingFanIn: 4
      crossingFanOut: 4
      mvCochange: 2
      uiCochange: 2
      uihDepends:
        - call
        - use
      uihInheritance:
        - extend
        - implement
        - public
        - private
        - virtual
      uiHistoryImpact: 10
      uiStructImpact: 0.01
  # Uctags allows finer file-file dependency parsing (e.g. functions, classes, structs)
  uctags:
    # See https://github.com/sailuh/kaiaulu/wiki/Universal-Ctags for details
    # What types of file-file dependencies should be considered? If all
    # dependencies are specified, Kaiaulu will use all of them if available.
    keep_lines_type:
      c:
        - f # function definition
      cpp:
        - c # classes
        - f # function definition
      java:
        - c # classes
        - m # methods
      python:
        - c # classes
        - f # functions
      r:
        - f # functions
  # srcML allows parsing source code as text (e.g. identifiers)
  srcml:
    # The file path to where you wish to store the srcml output of the project
    srcml_path: ../../analysis/depends/srcml_depends.xml

# Analysis Configuration #
analysis:
  # A list of topic and keywords (see src_text_showcase.Rmd).
  topics:
    topic_1:
      - Parser
      - Lexer
    topic_2:
      - Python
      - Ruby
  # You can specify the intervals in 2 ways: window, or enumeration
#  window:
    # If using gitlog, use start_commit and end_commit. Timestamp is inferred from gitlog
#    start_commit: 9eae9e96f15e1f216162810cef4271a439a74223
#    end_commit: f8f9ec1f249dd552065aa37c983bed4d4d869bb0
    # Use datetime only if no gitlog is used in the analysis.
    #start_datetime: 2013-05-01 00:00:00
    #end_datetime: 2013-11-01 00:00:00
#    size_days: 90
#  enumeration:
     # If using gitlog, specify the commits. Timestamp is inferred from gitlog
#    commit:
#      - 9eae9e96f15e1f216162810cef4271a439a74223
#      - f1d2d568776b3708dd6a3077376e2331f9268b04
#      - c33a2ce74c84f0d435bfa2dd8953d132ebf7a77a
     # Use datetime only if no gitlog is used in the analysis.
#    datetime:
#      - 2013-05-01 00:00:00
#      - 2013-08-01 00:00:00
#      - 2013-11-01 00:00:00
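To illustrate what the `filter` section above is meant to express, here is a minimal sketch that applies the same two lists to a handful of made-up file paths. The matching semantics are assumptions: extensions are compared against the text after the last dot, and the removal entries are treated as case-sensitive substrings of the whole path. Kaiaulu's own R implementation may differ, and the function name and sample paths are hypothetical.

```python
import os
from typing import Dict, Iterable, List

# Mirrors the `filter` section of the pasted configuration.
FILTER: Dict[str, List[str]] = {
    "keep_filepaths_ending_with": ["cpp", "c", "h", "java", "js", "py", "cc"],
    "remove_filepaths_containing": ["test", "java_code_examples"],
}


def apply_filter(filepaths: Iterable[str],
                 config: Dict[str, List[str]] = FILTER) -> List[str]:
    """Keep paths with an allowed extension that contain no excluded substring."""
    keep_extensions = set(config["keep_filepaths_ending_with"])
    remove_substrings = config["remove_filepaths_containing"]
    kept: List[str] = []
    for path in filepaths:
        extension = os.path.splitext(path)[1].lstrip(".")
        if extension not in keep_extensions:
            continue  # not a source file type of interest
        if any(substring in path for substring in remove_substrings):
            continue  # e.g. test code or bundled examples
        kept.append(path)
    return kept


if __name__ == "__main__":
    sample_paths = [
        "src/main/java/depends/Main.java",
        "src/test/java/depends/MainTest.java",
        "README.md",
        "src/main/resources/java_code_examples/A.java",
        "scripts/build.py",
    ]
    print(apply_filter(sample_paths))
```

Under these assumptions, a substring such as `test` also removes any path that merely contains those letters (for example a directory named `latest`), which is worth keeping in mind when choosing the removal entries.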

lh-zhan commented 1 year ago

Thanks for looking into the issue. I checked the Python interpreter in R and it's the same 3.11.3 as yours, but the error persisted. Then, as you suggested, I installed an older version of Python in my virtualenv and that eventually solved the problem :)
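For anyone hitting the same interpreter confusion, a small diagnostic along these lines can show which Python is actually in use and whether it can see the module. This is a sketch only: the function name is hypothetical, and the module name `spiral` is assumed from the import mentioned earlier in this thread.

```python
import importlib.util
import sys
from typing import Dict, Union


def interpreter_report(module_name: str = "spiral") -> Dict[str, Union[str, bool]]:
    """Summarize the running interpreter and whether a top-level module is importable."""
    return {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # True when running inside a venv-style virtual environment.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # find_spec returns None for a top-level module that cannot be found.
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

Running it both from a terminal and from the R session (reticulate's `py_config()` reports similar information on the R side) makes a mismatch between the two interpreters easy to spot.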

I'll proceed and run the rest of the notebook now. Will let you know if something else comes up.

And thank you for the depends.yml