Open jconn0 opened 6 months ago
APICAD currently uses the pretrained model "UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_XLMR_BASE" which supports both semantic role labeling and semantic dependency parsing.
Compared with semantic dependency parsing, semantic role labeling (SRL) is better at identifying actions such as conditional statements and causal relations. This fits our use case better, since we are trying to identify an API's actions and outputs in order to detect errors.
The model supports two different types of SRL: BIO (beginning, inside, outside) tagging and ranking.
BIO SRL is better at extracting role-containing phrases from structured texts with clearly distinguished roles.
Rank SRL is better at handling ambiguity since it considers the broader context within a sentence.
Assuming we are using well-written, clearly structured API documentation, BIO SRL should be more effective.
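To make the BIO scheme concrete, here is a minimal sketch of how a BIO tag sequence maps to role-containing phrases. It is not HanLP's decoder or APICAD's code; the function name `bio_to_spans`, the example sentence, and the hand-written tags are all assumptions for illustration, not real model output.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (role, phrase) pairs.

    "B-X" begins a span with role X, "I-X" continues the current X span,
    and "O" marks a token outside any span. A stray "I-X" with no open
    X span is treated leniently as the start of a new span.
    """
    spans = []
    start, role = None, None
    for i, tag in enumerate(tags):
        # Continue the open span only when the I- tag matches its role.
        if tag.startswith("I-") and role == tag[2:]:
            continue
        # Any other tag closes the span that is currently open.
        if role is not None:
            spans.append((role, " ".join(tokens[start:i])))
            start, role = None, None
        # B- (or a stray I-) opens a new span; "O" opens nothing.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, role = i, tag[2:]
    if role is not None:
        spans.append((role, " ".join(tokens[start:])))
    return spans


# Hypothetical, hand-tagged sentence in the style of API documentation.
tokens = ["The", "function", "returns", "NULL", "if", "allocation", "fails"]
tags = ["B-ARG0", "I-ARG0", "B-V", "B-ARG1",
        "B-ARGM-ADV", "I-ARGM-ADV", "I-ARGM-ADV"]

for role, phrase in bio_to_spans(tokens, tags):
    print(f"{role}: {phrase}")
```

With clearly separated roles like these, each phrase falls into exactly one span, which is the situation where BIO tagging is expected to work well.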
Attempting to build the model on the provided file for Curl produces bitcode-related errors that prevent the evaluation from running correctly. The evaluation runs fine for the project case files, which are provided pre-built.
We'll use the basic example containing the pre-built bitcode files to get around this.
To reanalyze the documents with the changes to the HanLP model, run the following (specifying the target prevents it from running on all three projects):
    apicad doc-analyze --target glibc
After this, to perform the analysis, run:
    apicad analyze --target SSL_get_ex_data EVP_MD_CTX_new
To run the detection with document specifications, run:
    apicad detect --enable-doc
My computer was unable to handle running the document analysis. The process would just output "killed" to the terminal, and watching top in a second terminal confirmed that the machine was running out of resources.
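A rough way to confirm this kind of failure from a script is to check how the child process exited. The sketch below assumes Linux or another POSIX system; the helper name `run_and_diagnose` is hypothetical and not part of APICAD. It relies on `subprocess` reporting a negative return code when a child is terminated by a signal, with SIGKILL being what the kernel's out-of-memory killer typically sends.

```python
import signal
import subprocess
import sys


def run_and_diagnose(cmd):
    """Run cmd and return (returncode, human-readable description)."""
    proc = subprocess.run(cmd)
    code = proc.returncode
    if code < 0:
        # On POSIX, a negative return code -N means "terminated by signal N".
        sig = signal.Signals(-code)
        if sig == signal.SIGKILL:
            return code, "terminated by SIGKILL (often the kernel OOM killer)"
        return code, f"terminated by {sig.name}"
    if code == 0:
        return code, "exited normally"
    return code, f"exited with status {code}"


if __name__ == "__main__":
    # Stand-in child that kills itself, so the sketch runs without APICAD.
    # In practice cmd could be, e.g., ["apicad", "doc-analyze", "--target", "glibc"].
    demo = [sys.executable, "-c",
            "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    print(run_and_diagnose(demo)[1])
```

A SIGKILL alone does not prove an out-of-memory condition, so it is still worth checking memory usage in top or the kernel log as described above.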
I've reinstalled everything on a GitHub Codespace, which has enough resources to run the full pipeline.
Add semantic parsing to provide more context to the document parsing.
This could be done using HanLP, the existing NLP library, or by converting the project to a different library.