Open gaylem opened 5 months ago
Hi @gaylem.
Please don't forget to add the proper labels to this issue. Currently, the labels for the following are missing:
NOTE: Please ignore this comment if you do not have 'write' access to this directory.
To add a label, take a look at Github's documentation here.
Also, don't forget to remove the "missing labels" afterwards. To remove a label, the process is similar to adding a label, but you select a currently added label to remove it.
After the proper labels are added, the merge team will review the issue and add a "Ready for Prioritization" label once it is ready for prioritization.
Additional Resources:
FYI, I'm going to chat with @roslynwythe about this soon to confirm if I did it correctly.
@gaylem Thank you for writing this issue and for proposing some promising solutions to enable us to exclude liquid and yaml code from the CodeQL scan.
@gaylem @ExperimentsInHonesty There is another factor we need to consider when selecting an approach for this ER; that is, we are planning to ask developers to run CodeQL locally using the "CodeQL for VS Code" extension. I have been holding off on writing an issue to configure and document use of "CodeQL for VS Code" extension, until we had a solution to the problem of excluding liquid and yaml code. Ideally in this ER we will select an approach that can be utilized in the extension as well as in the GitHub workflow. According to the documentation Setting up CodeQL in Visual Studio there are options for such customization, but the setup is a bit complex.
I would like to hear your thoughts on this idea:
Hey @roslynwythe, I'm good with removing the third option. I don't know much about CodeQL for VS Code but if it's agreed that we want that in here as well, I'll be happy to investigate and add it.
@gaylem thank you for putting so much work into this issue. It is written very clearly! Here are a few notes for you and @roslynwythe
I am open to being wrong about the need to break these into smaller issues, and if that is the case, please explain in a comment.
Hi @ExperimentsInHonesty, thank you for your feedback!
@roslynwythe and I met to discuss this issue today. Here's where we landed:
As for the epic (#6378), Roslyn wants to discuss it in a meeting to see what you think.
At any rate, this seems like this is a good chance to consolidate things better. I'm still getting a feel for how you organize things so I'm open to guidance and would love to discuss more in a meeting as well if there's time.
Thanks again! 😄
@roslynwythe
At this point, I think that https://github.com/hackforla/website/issues/5005 can be closed, do you agree?
I reworked https://github.com/hackforla/website/issues/5005 so it addresses CodeQL deployment in general and mention of specific issues only appears in https://github.com/hackforla/website/issues/5159. So I could use https://github.com/hackforla/website/issues/5005 to address all the other issues, and this one can be closed.
There seems like there is content in this issue now that does not exist in the other issues. So I think this just has to wait until we can have a three-way meeting with @gaylem you and me.
Hi @aidanwsimmons, thank you for taking up this issue! Hfla appreciates you :)
Do let fellow developers know about your:- i. Availability: (When are you available to work on the issue/answer questions other programmers might have about your issue?) ii. ETA: (When do you expect this issue to be completed?)
You're awesome!
P.S. - You may not take up another issue until this issue gets merged (or closed). Thanks again :)
Availability: weekday afternoons ETA: 5/9/24
@aidanwsimmons I noticed the closed PR. How are things going on this issue. Do you need some help...
Please provide update
p.s., I added a ready for product
label to this issue, so I can check back in with you.
Update- I have been unable to implement a query file for the codeql scanning process that can exclude YAML front matter and Liquid code blocks. Unassigning myself from the issue as I am unfortunately too busy this week to dedicate the proper time to this issue. Im attaching a list of other possible methods to resolve the issue(though i have not assessed each of their viability.
Selective Scanning: Instead of trying to scan the entire file as JavaScript, you can selectively scan only the JavaScript portions. You can achieve this by programmatically separating JavaScript code blocks from Liquid or YAML blocks.
Escaping: You can use escaping mechanisms to prevent the Liquid or YAML code from being interpreted as part of the JavaScript syntax. For example, you might wrap Liquid or YAML code blocks within comments or strings to ensure they are ignored by the JavaScript parser.
Preprocessing: Before scanning the files with CodeQL, you can preprocess them to temporarily remove or comment out the Liquid or YAML code. Once the scanning is complete, you can restore the removed sections.
Custom Parsing: Develop a custom parser that can handle both JavaScript and Liquid/YAML syntax within the same file. This approach might be more complex but allows for a more seamless integration of different code types.
Plugin/Extension: Check if there are any existing plugins or extensions for CodeQL or other static analysis tools that support mixed-language parsing or can be extended to support it.
@aidanwsimmons thank you for providing a list of possible solutions. I did explore the "Escaping" solution, and I saw that some tools such as ESLint do support excluding portions of code via specailly formatted JS comments, but I couldn't find anything in CodeQL documentation to indicate support for such a solution. Are you suggesting that in codeql-scan-job.yml
, we could define such a specially formatted comment? Regarding the "Selective Scanning" solution you listed, are you suggesting that we break up the javascript into modules and exclude the modules that contain YAML/liquid code from scanning in codeql-scan-job.yml
?
@roslynwythe if a programmatic solution seems elusive, could we just dismiss the alerts manually?
@roslynwythe if a programmatic solution seems elusive, could we just dismiss the alerts manually?
@gaylem It is possible to dismiss the alerts as false positives, however based on the error message "Could not process some files due to syntax errors", I'm concerned that CodeQL might stop scanning the file when it encounters non-JS code, and so we could be missing some test coverage. We've seen in some HTML files, that CodeQL will create an alert only on one instance of problematic markup and then seem to ignore a later instance of similar markup. But this is just my supposition, not the result of testing. We could create an issue for someone to test this out.
@roslynwythe I did a little testing out of curiosity:
Liquid Code:
I dismissed alert 63 (Liquid code): https://github.com/gaylem/hackforla-website/security/code-scanning/63
Then added more liquid code to the file: https://github.com/gaylem/hackforla-website/blob/codeql-test/assets/js/utility/vrms-events.js
I ran the scan, and it didn’t throw an alert for the new liquid code
Then I added an unused global variable and it threw this error and didn’t highlight the liquid code it caught before: https://github.com/gaylem/hackforla-website/security/code-scanning/65
YAML Front Matter:
I dismissed alert 13 (YAML front matter): https://github.com/gaylem/hackforla-website/security/code-scanning/13
Then I added an unused global variable to this file: https://github.com/gaylem/hackforla-website/edit/codeql-test/assets/js/hamburger-nav.js
I ran the codeql workflow a couple of times and it threw these alerts, but they weren’t for the variable I added. They were the same YAML front matter code.:
Conclusion: So it seems like Liquid code will be ignored and the file will still be scanned for other kinds of errors, but YAML front matter alerts will be recreated if the original alert is dismissed. In both cases, it does appear that after dismissal, the files are still being scanned.
It would probably be a good idea if one or two people tested this out, too.
Thank you @gaylem for running these tests. This is very useful! I'm not able to view the links into your repository (404 error), but based on your explanation I have a question on point 3 under YAML Front Matter": If CodeQL failed to throw alerts for the unused variable in hamburger-nav.js
, doesn't that indicate that it is not scanning the full code?
@roslynwythe Step three of the YAML Front Matter section did throw errors, but only on the front matter alert code that I had previously dismissed. So it did scan the file, though I'm not sure why it didn't pick up the variable. Maybe the scan stopped at the front-matter when it was detected? Here's what I added (test2):
---
title: hamburger nav file
---
const test2 = "test"
// Retrieve the SVG element
const svgElement = document.querySelector('#burgerImage svg');
@roslynwythe Here are images of all the alerts since you can't see them:
@gaylem I repeated the YAML test in my repo and my results were the same. After dismissing the initial CodeQL alert on the frontmatter in hamburger-nav.js
, I found:
hamburger-nav.js
and hamburger-nav.js
was listed under "Could not process some files due to syntax errors"so to me it sounds like CodeQL is not able to successfully scan code past the "error" that occurs with the frontmatter, and we shouldn't dismiss those alerts.
I did have a suggestion based on a ChatGPT conversation. ChatGPT suggests that when listing queries, a "filter" type query should be listed first, before the query that scans for security and quality issues. So I would suggest switching the order in Option 1:
- name: Initialize CodeQL
uses: github/codeql-action/init@v3
with:
languages: ${{ matrix.language }}
queries: path/to/exclude-patterns.ql, security-and-quality
When specifying queries in a CodeQL workflow using the syntax `queries: [security-and-quality, ignoreYamlAndLiquid.ql]`, the order in which you list the queries does matter. Here's the reason: ### Execution Order - **Order of Queries**: CodeQL processes the queries in the order they are listed. This means that if you have a query that modifies or filters out certain results (like `ignoreYamlAndLiquid.ql`), it should generally be listed first. This ensures that subsequent queries (like those from `security-and-quality`) operate on the already filtered set of code. - **Chaining Effects**: If `ignoreYamlAndLiquid.ql` is designed to filter out certain lines of code before running standard security and quality checks, it should be executed before `security-and-quality`. This ensures the filtering takes place first, and the standard checks are performed only on the relevant code. ### Correct Order Given this, the correct order would be to list `ignoreYamlAndLiquid.ql` first, followed by `security-and-quality`: ```yaml queries: - ignoreYamlAndLiquid.ql - security-and-quality ``` ### Example Workflow Configuration Here’s an example of how you might specify this in a CodeQL workflow file (e.g., `.github/workflows/codeql-analysis.yml`): ```yaml name: "CodeQL" on: push: branches: [ main ] pull_request: # The branches below must be a subset of the branches above branches: [ main ] schedule: - cron: '0 0 * * 0' jobs: analyze: name: Analyze runs-on: ubuntu-latest steps: - name: Checkout repository uses: actions/checkout@v2 - name: Initialize CodeQL uses: github/codeql-action/init@v1 with: languages: javascript queries: | ./path/to/ignoreYamlAndLiquid.ql security-and-quality - name: Autobuild uses: github/codeql-action/autobuild@v1 - name: Perform CodeQL Analysis uses: github/codeql-action/analyze@v1 ``` In this configuration: 1. **Checkout repository**: Checks out the code repository. 2. **Initialize CodeQL**: Initializes the CodeQL analysis with the custom query `ignoreYamlAndLiquid.ql` followed by the standard `security-and-quality` queries. 3. **Autobuild**: Builds the project (necessary for CodeQL to understand the code structure). 4. **Perform CodeQL Analysis**: Runs the CodeQL analysis using the specified queries. By ensuring that `ignoreYamlAndLiquid.ql` runs before `security-and-quality`, you ensure that your custom filtering logic is applied before the standard security and quality checks are performed.
To summarize the activity so far on this issue:
Due to the recent refactoring of CodeQL.yml, references to "CodeQL.yml" in this issue have been changed to "codeql-scan-job.yml"
Hi @luisitocanlas, thank you for taking up this issue! Hfla appreciates you :)
Do let fellow developers know about your:- i. Availability: (When are you available to work on the issue/answer questions other programmers might have about your issue?) ii. ETA: (When do you expect this issue to be completed?)
You're awesome!
P.S. - You may not take up another issue until this issue gets merged (or closed). Thanks again :)
Updated Availability: 6/11 ~ 6/28 New ETA: EOD 6/28
Current state:
Blocker: Figuring out a way to confirm that CodeQL was able to scan the files listed above without scanning the YAML front-matter or Liquid code and without throwing the error.
What I've done so far for the blocker:
Progress update: Below are the solutions that I've tried and the errors that kept flagging. I would have to say that I did not find a solution and would be moving this issue back to the Prioritized Backlog
.
Solution # 1:
Solution # 2:
Error 1: When the first solution was implemented, this error keeps popping up. When the second solution was implemented this error didn't pop up.
Error 2: For both solutions, the alerts are persistent.
@ExperimentsInHonesty @t-will-gillis Here is a summary of our options:
Hi @duojet2ez, thank you for taking up this issue! Hfla appreciates you :)
Do let fellow developers know about your:- i. Availability: (When are you available to work on the issue/answer questions other programmers might have about your issue?) ii. ETA: (When do you expect this issue to be completed?)
You're awesome!
P.S. - You may not take up another issue until this issue gets merged (or closed). Thanks again :)
I look forward to tackling this issue! It seems to be a challenge but also presents a good learning opportunity.
This week 8/26 I have time Mon - Thurs and then will be away labor day weekend and back the following week. My eta on this is not known at this point. I plan on reviewing the previous work on this/looking at comments.. after that I may have a better estimate.
Hey @duojet2ez We talked about this issue in our Monday meeting... If you look at @roslynwythe 's note above, we would like to focus most of your efforts on the last one:
we can try to modify the workflow such that CodeQL analyzes code after the Jekyll build. This article discusses such a pattern https://some-natalie.dev/blog/codeql-container-builds/ but it involves using a container job, and I'm not sure if we are comfortable with those.
gotcha!
@duojet2ez
Please add update using the below template (even if you have a pull request). Afterwards, remove the 'To Update !' label and add the 'Status: Updated' label.
If you need help, be sure to either: 1) place your issue in the Questions/In Review
column of the Project Board and ask for help at your next meeting, 2) put a "Status: Help Wanted" label on your issue and pull request, or 3) put up a request for assistance on the #hfla-site channel. Please note that including your questions in the issue comments- along with screenshots, if applicable- will help us to help you. Here and here are examples of well-formed questions.
You are receiving this comment because your last comment was before Tuesday, September 3, 2024 at 12:04 AM PST.
I'll be at the meeting tomorrow to discuss this issue. I would like to edit the codeql-scan-job.yml file and test but not entirely sure how to do this
Overview
Many of our Javascript and HTML code files cannot be scanned by CodeQL as-is because they contain non-JS Liquid code
{% ... %}
or YAML front matter--- ... ---
, which cause syntax errors. We need to try and resolve these errors without removing all non-JS code.Details
The error message "Could not process some files due to syntax errors" indicates that these "syntax errors" may prevent CodeQL from scanning the files below (see issue #5234 for details).
hamburger-nav.js
: YAML front-matter with a titletoolkit.js
: 1 line of Liquid, empty YAML front-matterwins.js
: 2 lines (Liquid), empty YAML front-matterproject.js
: 2 lines (Liquid), empty YAML front-matterabout.js
: for loop (Liquid), empty YAML front-mattercurrent-project.js
: 2 lines + for loop (Liquid), empty YAML front-matterScreenshot: CodeQL error message
![CodeQL error message 1](https://github.com/hackforla/website/assets/106627338/a02cc3a2-6219-48d0-b375-984f5dd0a7ac)Simply deleting the Liquid lines would break the site (and CodeQL raised those errors accordingly in testing), so an alternative, holistic solution is required.
Action Items
Testing
.codeql-scan-job.yml
workflow.Resources/Instructions
Possible Solutions
Here are two possible solutions (in order of preference) to this problem. Please use your best judgment, these are only recommendations.
Option 1
This approach is preferred because it is
Define a new CodeQL query file that excludes Liquid and YAML patterns within JavaScript files.
Create a file named exclude-patterns.ql
Then modify codeql-scan-job.yml file to use the new query file for analysis. Update the queries section in the Initialize CodeQL step to include the new query file:
Option 2
Exclude liquid code and YAML front matter patterns from the CodeQL analysis within `codeql-scan-job.yml`
``` # On codeql-scan-job.yml file: - name: Perform CodeQL Analysis uses: github/codeql-action/analyze@v3 with: languages: javascript queries: security-and-quality # Exclude Liquid and YAML code within JavaScript files exclude: | path: "**/*.{js,html}" patterns: - pattern: | // Start of Liquid code {% if variable %} // Liquid code here {% endif %} // End of Liquid code - pattern: | // Start of YAML front matter --- # YAML front matter here --- // End of YAML front matter ```