hackforla / website

Hack for LA's website
https://www.hackforla.org
GNU General Public License v2.0

ER: Potential XSS Vulnerability in wins.js #5654

Open jaasonw opened 1 year ago

jaasonw commented 1 year ago

Emergent Requirement - Problem

I originally submitted this as a security advisory, but @roslynwythe said there isn't a way to convert it to an issue, so I'm reposting it here.

Original text:

Summary

The usage of innerHTML to dynamically modify DOM elements in wins.js may introduce a vulnerability to Cross-Site Scripting (XSS) attacks if wins-data.json is compromised, not properly sanitized, or not adequately vetted.

Details

The vulnerability exists on the following lines of code where DOM elements are dynamically modified using external data.

https://github.com/hackforla/website/blob/gh-pages/assets/js/wins.js#L339
https://github.com/hackforla/website/blob/gh-pages/assets/js/wins.js#L340
https://github.com/hackforla/website/blob/gh-pages/assets/js/wins.js#L498
https://github.com/hackforla/website/blob/gh-pages/assets/js/wins.js#L501
https://github.com/hackforla/website/blob/gh-pages/assets/js/wins.js#L504

To address this vulnerability, avoid innerHTML and prefer safer DOM manipulation methods like createElement and textContent.

Additionally, the unformatted nature of wins-data.json can make it difficult to spot changes in the file with a git diff. Consider reformatting the JSON file to be more human-readable and version control-friendly.
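One way to make the file diff-friendly is to re-serialize it with indentation. The snippet below is only a sketch: the sample data and its field names are hypothetical stand-ins, not the real wins-data.json schema, and `prettyPrintJson` is an illustrative helper.

```javascript
// Hypothetical sample standing in for the contents of wins-data.json;
// the field names are illustrative, not the real schema.
const minified =
  '[{"name":"Ada","win":"Shipped my first PR"},{"name":"Lin","win":"Led a team meeting"}]';

// Re-serialize with two-space indentation so every field lands on its own line.
// The parsed data is unchanged; only the whitespace differs.
function prettyPrintJson(jsonText) {
  return JSON.stringify(JSON.parse(jsonText), null, 2) + "\n";
}

const formatted = prettyPrintJson(minified);
console.log(formatted);
```

With one field per line, a change to a single entry shows up as a one-line change in a git diff instead of a change to the whole file.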

PoC

The vulnerability audit states that "User input strings remain strings and escape injection through decodeURIComponent()". However, in my testing this is not the case: strings parsed through decodeURIComponent() are still susceptible to XSS. Proof of concept: https://codepen.io/jaasonw/pen/GRPBzGJ
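The point can be illustrated without a browser. The snippet below is a minimal sketch and not a copy of the linked CodePen: the payload is a hypothetical example, and `escapeHtml` is an illustrative helper that does not exist in wins.js.

```javascript
// Hypothetical payload: a percent-encoded <img> tag with an onerror handler.
const encoded = "%3Cimg%20src%3Dx%20onerror%3Dalert(1)%3E";

// decodeURIComponent only reverses percent-encoding; it does not neutralize markup.
const decoded = decodeURIComponent(encoded);
console.log(decoded); // <img src=x onerror=alert(1)>

// Illustrative helper: escape the characters that are significant in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml(decoded)); // &lt;img src=x onerror=alert(1)&gt;
```

If the decoded string were assigned to innerHTML, the browser would parse it as an element and the onerror handler could run. The escaped form would be displayed as literal text.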

Impact

Potential defacement of the website or visitors being subject to phishing attacks.

Issue you discovered this emergent requirement in

Date discovered

10/1/2023

Did you have to do something temporarily

Who was involved

@

What happens if this is not addressed

Potential defacement of the website or visitors being subject to phishing attacks.

Resources

https://developer.mozilla.org/en-US/docs/Web/API/Document/createElement
https://developer.mozilla.org/en-US/docs/Web/API/Node/textContent
https://medium.com/front-end-weekly/javascript-innerhtml-innertext-and-textcontent-b75ec895cbe3
https://jekyllrb.com/docs/datafiles/

Recommended Action Items

Potential solutions [draft]

option 1

Update the code that does DOM manipulation: instead of using innerHTML, use safer DOM manipulation methods like createElement and textContent.
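As a rough sketch of what option 1 could look like, the function below builds a card with createElement and textContent. Everything named here is an assumption for illustration: `buildWinCard`, the class names, and the `name`/`overview` fields are hypothetical and do not mirror the actual wins.js code.

```javascript
// Sketch only: buildWinCard, the class names, and the `name`/`overview`
// fields are hypothetical and do not mirror the real wins.js.
function buildWinCard(doc, win) {
  const card = doc.createElement("div");
  card.className = "wins-card";

  const name = doc.createElement("span");
  name.className = "wins-card-name";
  // textContent treats the value as plain text, so markup in the data is not parsed.
  name.textContent = win.name;

  const overview = doc.createElement("p");
  overview.className = "wins-card-overview";
  overview.textContent = win.overview;

  card.append(name, overview);
  return card;
}

// In a browser, pass the global document; this is skipped where no DOM exists.
if (typeof document !== "undefined") {
  const card = buildWinCard(document, {
    name: "<img src=x onerror=alert(1)>",
    overview: "Shown literally, not executed.",
  });
  document.body.append(card);
}
```

Taking the document as a parameter is a design choice made for the sketch so the function can be exercised with a stand-in object in tests. In page code it would simply receive `document`.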

option 2

Alternatively, I noticed a lot of the data on the website is loaded in on page load with javascript rather than at build time. Seeing as we already use Jekyll, I suggest using its built-in functionality with the strip_html filter to load data into the HTML at build time statically, as this has benefits to both page load times and accessibility benefits to visitors to the site that have javascript disabled

github-actions[bot] commented 1 year ago

Hi @jaasonw.

Please don't forget to add the proper labels to this issue. Currently, the labels for the following are missing: Complexity, Role, Feature

NOTE: Please ignore the adding proper labels comment if you do not have 'write' access to this directory.

To add a label, take a look at Github's documentation here.

Also, don't forget to remove the "missing labels" afterwards. To remove a label, the process is similar to adding a label, but you select a currently added label to remove it.

After the proper labels are added, the merge team will review the issue and add a "Ready for Prioritization" label once it is ready for prioritization.


roslynwythe commented 1 year ago

Finally, regarding your suggestion to load wins data at build time rather than on page load, I will discuss that with Bonnie to see if she is interested in pursuing that. Thanks again for your contributions; we are very happy that you are on the website team!

roslynwythe commented 1 year ago

@jaasonw Regarding your comment

Alternatively, I noticed a lot of the data on the website is loaded in on page load with javascript rather than at build time. Seeing as we already use Jekyll, I suggest using its built-in functionality with the strip_html filter to load data into the HTML at build time statically, as this has benefits to both page load times and accessibility benefits to visitors to the site that have javascript disabled

I wonder if you could indicate which pages load data on page load. I checked wins.js and project.js, and both load data at build time using a Liquid assign tag.

jaasonw commented 1 year ago

@roslynwythe Sorry, I should clarify that by "load data" I meant the actual creation of DOM elements based on the data. Currently, data is retrieved when the visitor loads the page, and client-side JavaScript is used to generate the DOM elements. This becomes apparent when visiting the website with JavaScript disabled. In that case, the following pages fail to render properly:

The events page does not display meeting times

Projects on the project page do not render

Wins on the wins page do not render

roslynwythe commented 1 year ago

@jaasonw Thanks for the explanation.

We are open to the idea of using safer DOM manipulation methods like createElement and textContent in place of innerHTML. We should start with a relatively simple page, perhaps Events. We have been planning to update the code in the Events page, because meeting data is being retrieved in JS via an API call to VRMS instead of using _data/external/vrms-data.json, which is obviously inefficient. So one option would be to write an issue for making both changes: using Jekyll to assign data from _data/external/vrms-data.json and also avoiding use of innerHTML in favor of the safer alternatives you mention. Or we could make those two separate issues.

But it sounds like, alternatively, we could completely eliminate the JS that generates the DOM and instead use Jekyll to build the complete HTML at build time. And I assume that using strip_html would eliminate any risk of injection if the JSON data source were corrupted?

Could you offer pros and cons for that alternative? Or should we create an issue to explore those pros and cons?

We really appreciate your contributions and look forward to working with you further on these issues.

ExperimentsInHonesty commented 9 months ago

Make an issue to do option 1 so that we reduce vulnerability immediately and silence the warning.

Make an epic that looks into the refactoring option and identifies all the places on the site that it would be required.

github-actions[bot] commented 9 months ago

Hi @freaky4wrld, thank you for taking up this issue! Hfla appreciates you :)

Do let fellow developers know about your:
i. Availability: (When are you available to work on the issue/answer questions other programmers might have about your issue?)
ii. ETA: (When do you expect this issue to be completed?)

You're awesome!

P.S. - You may not take up another issue until this issue gets merged (or closed). Thanks again :)

freaky4wrld commented 9 months ago

ExperimentsInHonesty commented 9 months ago

@roslynwythe This ER is only partially resolved; we still need to do:

Make an epic that looks into the refactoring option and identifies all the places on the site that it would be required.

roslynwythe commented 9 months ago

ExperimentsInHonesty commented 9 months ago

@freaky4wrld I am still confused. It looks like these issues are both addressing the change of using createElement and textContent instead of innerHTML.

And what I was saying was missing was the issue or epic to do option 2 after option 1 was complete.

Alternatively, I noticed a lot of the data on the website is loaded in on page load with javascript rather than at build time. Seeing as we already use Jekyll, I suggest using its built-in functionality with the strip_html filter to load data into the HTML at build time statically, as this has benefits to both page load times and accessibility benefits to visitors to the site that have javascript disabled

Basically, option 1 is the fast, temporary fix; option 2 is the long-term solution.

So can you make the issues or epics for option 2, and we can put a dependency on them of the two above issues being complete? Or is there something else I don't understand?

freaky4wrld commented 9 months ago

@ExperimentsInHonesty Issue #6303 is the issue for the fast, temporary fix that we require for the ER.

Yes, we would be making those issues.

roslynwythe commented 8 months ago

@freaky4wrld - Bonnie is suggesting that in addition to refactoring the use of innerHTML where needed, we also write an epic to explore option 2:

Alternatively, I noticed a lot of the data on the website is loaded in on page load with javascript rather than at build time. Seeing as we already use Jekyll, I suggest using its built-in functionality with the strip_html filter to load data into the HTML at build time statically, as this has benefits to both page load times and accessibility benefits to visitors to the site that have javascript disabled