Open rtpt-alexanderneumann opened 2 months ago
At least some of the code in question was added in https://github.com/salesagility/SuiteCRM/pull/6883 by @g-fantini. Can you maybe chime in? :)
The following patch disables cleaning HTML from text fields (which is the right thing to do in my professional opinion), it does not solve removing script
tags yet:
diff --git a/data/SugarBean.php b/data/SugarBean.php
index 4f9ca9f25..d6b346f86 100755
--- a/data/SugarBean.php
+++ b/data/SugarBean.php
@@ -2522,11 +2522,6 @@ class SugarBean
if (isset($def['type']) && ($def['type'] == 'html' || $def['type'] == 'longhtml')) {
$this->$key = purify_html($this->$key);
- } elseif (
- (strpos((string) $type, 'char') !== false || strpos((string) $type, 'text') !== false || $type == 'enum') &&
- !empty($this->$key)
- ) {
- $this->$key = purify_html($this->$key);
}
}
}
Shall I submit a PR?
If you search for some of my many rants regarding the over-zealous clean-ups in SuiteCRM you can get an idea of the depths of this problem....
https://github.com/salesagility/SuiteCRM/pull/9127
Just for laughs, I found this in my notes from a few years ago, when trying to work around one of these problems. This is a description of a code flow within SuiteCRM:
include/entryPoint.php:98 clean_incoming_data changes HTML tags in $_REQUEST, saves originals in RAW_REQUEST
Later in Save, handleAttachmentsProcessImages is called in modules/EmailTemplates/EmailTemplateFormBase.php:171
which calls processImages, which uses from_html (from db_utils?) to reverse HMTL tags to examine img links etc, but doesn't save it unless there are images... (incongruent).
At the end of that, calls save from EmailTemplate, which calls parent::save from SugarBean
Which calls cleanBean, overriden in modules/EmailTemplates/EmailTemplate.php:855
This encodes vars, calls parent::cleanBean, puts back variables de-encoded...
cleanBean in SugarBean iterates all fields, has "if" for some types, applies this jewel:
$this->$key = htmlentities(SugarCleaner::cleanHtml($this->$key, true));
this calls HtmlSanitizer.php, cleanHtml
which starts by running html_entity_decode but does not use ENT_QUOTES, so misses ' (')
if ($removeHtml === true) $clean_html = [$sugarCleaner]$purifier->purify($dirty_html_decoded);
which throws away data
Save, goes into DBManager, uses from_html on each field, which uses html_entity_decode(with ENTQUOTES) and str_ireplace
I know that sounds depressing, there are just too many layers in there, and some major errors were made years ago... now people are afraid to revert them. But in practice we can often solve the issues just by overriding a save function and putting back RAW values.
What we should be doing is simply:
I feared that it may be a lot deeper, but was too hesitant to look into it (not to stare into the abyss too much). I agree on most points, except this one:
escape stuff that is dangerous to the DB (SQL delimiters etc) right before going into the DB
Use Prepared Statements instead :)
Oh wow, I just found out that I (with a different account) reported the exakt same behavior back in 2017 https://github.com/salesagility/SuiteCRM/issues/4446
I agree with the Prepared Statements, but that would be a very big change to apply everywhere in SuiteCRM code. The new v8 doesn't have any of these issues, I guess the way forward is just to progressively deprecate all of the old stuff. The new stuff naturally leads to much better practices (e.g. Symfony Doctrine)
The new v8 doesn't have any of these issues
Hm, I'm not sure I understand you correctly. The v8 demo instance also just removes everything between <
and >
in descriptions of calls (<script>
tags, email addresses).
You're right, I guess I overstated things (a lot). The technologies in v8 are better and potentially provide a cleaner architecture and better code. But there are still many bits of v7 in the code flow and so these bugs still arise.
If the app would call mysqli_real_escape_string
on the the user provided text while saving into the database, and before outputting to the page, first process the text with htmlspecialchars()
, then the exact user-generated text would display on the page correctly, and this issue bug would be solved.
Issue
We frequently discuss HTML and/or JavaScript issues with clients. We copy emails as text into the body of a call. I've recently upgraded to 7.14.3 (from a much older version) and I've discovered that SuiteCRM will mangle the text saved into the description of a Call (likely elsewhere as well):
<
and>
is removed, e.g. the text<script>fetch('/x')</script>
is silently removedmailto:
link is inserted in an odd way: the original text<foo@example.com>
is replaced with<<a href="mailto:foo@example.com">foo@example.com</a>>
, which is rendered in verbatim so the mailto link cannot be clicked (screenshots below)Expected Behavior
I expect that text that I enter into the description field of a call (and everywhere else, too) is saved into the database as I entered it and it is displayed again (as text) with removing any parts.
Actual Behavior
mailto:
link (screenshots below)Possible Fix
htmlentities()
)Steps to Reproduce
Log into SuiteCRM demo instance (version 7.14.3 at the time of writing)
Edit a Call, enter the following text as the description:
Is this the right way?
From: Subject: How to call fetch?
Is this the right way?
From: foo foo@example.com Subject: How to call fetch?
Is this the right way?
From: foo <foo@example.com>
From: foo <foo@example.com>