owasp-modsecurity / ModSecurity

ModSecurity is an open source, cross platform web application firewall (WAF) engine for Apache, IIS and Nginx. It has a robust event-based programming language which provides protection from a range of attacks against web applications and allows for HTTP traffic monitoring, logging and real-time analysis.
https://www.modsecurity.org
Apache License 2.0

Timezone is not included in the time_stamp field of audit log in JSON format #2340

Open · jatgh opened 4 years ago

jatgh commented 4 years ago

Description

The timezone is not included in the time_stamp field of the audit log if SecAuditLogFormat is set to JSON.

If SecAuditLogFormat is set to Native, the time and date are written to the audit log like the following:

---immbqR4e---A--
[16/Jun/2020:11:24:03 +0300]

Keeping all other settings the same but setting SecAuditLogFormat to JSON, the time and date are written to the audit log like the following:

"time_stamp":"Tue Jun 16 11:24:03 2020"

Note that this is not UTC; it's the server's local time, written without any timezone indicator.

Steps to reproduce the behavior

  1. Configure SecAuditLogFormat to JSON (a minimal config sketch follows this list)
  2. Make sure the 'A' section is enabled in the SecAuditLogParts parameter, e.g.: SecAuditLogParts ABJFHZ
  3. Restart nginx to apply new settings
  4. Perform any request to the web server that would get a new record put into audit log
  5. Check the audit log, specifically time_stamp field in the latest record
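
For reference, a minimal sketch of the directives involved (the log path is illustrative, not taken from the report):

SecAuditEngine On
SecAuditLogFormat JSON
SecAuditLogParts ABJFHZ
SecAuditLog /var/log/modsec_audit.log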

Expected behavior

The time_stamp field should include the timezone (as the Native format's 'A' section does), or at least be unambiguous across servers in different timezones.

Rationale

Imagine ModSecurity audit logs being shipped to ELK or another log management system from multiple servers, including some in regions with daylight saving time. There is then no reliable way to parse the time_stamp field, since different servers may sit in different timezones and, for some of them, the offset is not even constant.

Server

Nginx (ModSecurity v3)

martinhsv commented 4 years ago

Hi @yesjustyet ,

I can reproduce your findings.

This may have been a simple oversight when the v3 version of the JSON logging was written, but it's also possible that it was an intentional decision for some reason. I will follow up.

One alternative that might potentially be useful to you is to make use of the TIME_EPOCH variable (e.g. an extra rule with an action like msg:'EpochTime %{TIME_EPOCH}').
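
As a sketch, such a rule might look like the following (the rule id is an arbitrary placeholder):

SecAction "id:1000001,phase:1,pass,log,msg:'EpochTime: %{TIME_EPOCH}'"

Note that, as discussed further down the thread, an unconditional rule like this marks every transaction as relevant under SecAuditEngine RelevantOnly.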

If it was a simple oversight, it will probably just get fixed.

On the chance that it was intentional, however, would use of TIME_EPOCH meet your needs (in general and not just for the short term)?

Thanks for raising this issue and for any feedback you may have.

jatgh commented 4 years ago

Hi @martinhsv , thank you for the quick and useful response! I was able to follow your advice by configuring the following rule: SecAction "phase:1,log,msg:'EpochTime: %{TIME_EPOCH}'" and then parsing this message in ELK. It's not as convenient to get it from the messages array as it would be from the top-level time_stamp field, but it's possible (at least in ELK). I understand that changing the current implementation (by adding a timezone suffix to time_stamp) might cause trouble for setups out there that do parse this field in its current format (without timezone), so your concerns are entirely understandable.
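
For context, a heavily trimmed sketch of where such a message ends up in the JSON audit log (the field layout is approximated from ModSecurity v3 output and the epoch value is illustrative; real records contain many more fields):

{"transaction": {
    "time_stamp": "Tue Jun 16 11:24:03 2020",
    "messages": [
        {"message": "EpochTime: 1592295843", "details": {}}
    ]
}}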

jatgh commented 3 years ago

Hi! Any news on the issue? I have to say the suggested workaround is not much of a savior if you use the SecAuditEngine RelevantOnly setting. That's because creating a rule to add an EpochTime field (like any rule that fires unconditionally) causes every request to be considered relevant (i.e. to have triggered a rule). Please correct me if I'm wrong here.

martinhsv commented 3 years ago

Hi @yesjustyet ,

No, I'm afraid there's no update.

I agree that the TIME_EPOCH workaround is not going to be convenient for all deployments.

If your ruleset is small, you could potentially add the extra output to each pre-existing message. Alternatively, if you're using something like CRS's anomaly scoring method, you could add the TIME_EPOCH output to the phase:5 rule that actually makes the decision to deny; a sketch of that option follows.
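
A sketch of the second option, assuming CRS-style anomaly scoring (the rule id and threshold are placeholders, not actual CRS rules):

SecRule TX:ANOMALY_SCORE "@ge 5" \
    "id:1000002,phase:5,pass,log,msg:'Inbound anomaly score %{TX.ANOMALY_SCORE}, EpochTime: %{TIME_EPOCH}'"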

But, yes, in at least some setups this is likely to be a burdensome workaround.

If I think of any other suggestions in the near term, I'll mention them. In the meantime, I'll add this item to our 3.1.1 list.

tomsommer commented 1 year ago

The format should be ISO8601, please - the current format is not easy to parse

martinhsv commented 1 year ago

Hi @tomsommer ,

In order to consider your request, it would probably be helpful to more fully describe what you mean.

For example, what do you mean by "the current format is not easy to parse"? Are you referring to the timestamp information that is already present in ModSecurity v3, irrespective of the missing time zone?

In general, knowing which portions of 'ISO8601' you believe to be relevant would be useful, along with examples of the 'current' output vs. what you 'expected'.

ensemblebd commented 1 year ago

For me, it's hard to parse because it doesn't follow any known valid standard format I can find. Here's a datestamp I just got that broke my pipeline: Mon May  1 00:36:47 2023. Notice the two spaces between "May" and "1". Up until about 30 minutes ago, at midnight, the following format parser was working: E MMM d H:m:s y (as per Java). I.e. it will start working again in 10 days, when there are two digits to fill the day slot and hence one space instead of two.

Now I have to figure out how to "hack" Elasticsearch into parsing dates with two spaces between the parts dynamically/self-conditionally. Regexp-replace preprocessing, I guess...
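
For what it's worth, a minimal C++ sketch (outside the thread's Java/Elasticsearch pipeline) suggesting the stamp is parseable without preprocessing: whitespace in a std::get_time format matches runs of whitespace in the input, so the padded day is consumed (worth verifying on your standard library):

#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>

int main() {
    // The problematic stamp: ctime pads single-digit days, giving two spaces.
    std::istringstream in("Mon May  1 00:36:47 2023");
    std::tm tm = {};
    in >> std::get_time(&tm, "%a %b %d %H:%M:%S %Y");
    if (in.fail()) {
        std::cerr << "parse failed\n";
        return 1;
    }
    char buf[32];
    std::strftime(buf, sizeof(buf), "%Y-%m-%dT%H:%M:%S", &tm);
    std::cout << buf << '\n';  // prints 2023-05-01T00:36:47 (no tz to recover)
    return 0;
}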

I'll second the above: a standard ISO8601 timestamp would make a major difference for programmatic log analysis, and for me specifically in CPU overhead, given the number of logs I'm dealing with per second. A SecAuditLogDateFormat directive would be a nice idea, albeit easier said than done, I'm sure.

Is there an official name/standard/format definition for the date format currently employed in the audit logs?

tomsommer commented 1 year ago

Exactly what @ensemblebd said

martinhsv commented 1 year ago

I see.

That's a fair bit different from this issue (about the lack of timezone) and probably warrants being in its own, separate issue.

Would one of you like to go ahead and create one?

ensemblebd commented 1 year ago

The JSON format uses: https://github.com/SpiderLabs/ModSecurity/blob/2121938c5192f6224c18e5a3c9df83f2745b90dd/src/transaction.cc#L1675 https://github.com/SpiderLabs/ModSecurity/blob/2121938c5192f6224c18e5a3c9df83f2745b90dd/src/transaction.cc#L1694


Whereas the old audit format uses: https://github.com/SpiderLabs/ModSecurity/blob/2121938c5192f6224c18e5a3c9df83f2745b90dd/src/transaction.cc#L1568 https://github.com/SpiderLabs/ModSecurity/blob/2121938c5192f6224c18e5a3c9df83f2745b90dd/src/transaction.cc#L1571


Humbly requesting that the JSON format use exactly the same timestamp as the old one. I see no valid logical reason why std::ctime (https://en.cppreference.com/w/cpp/chrono/c/ctime) is being used here. But changing it would break existing users who may be expecting the current JSON date format, so that means we need a switch.

So perhaps:

std::string ts;

if (m_rules->m_auditLog->m_dateFormatJson == "default") {
    // "default" keeps the current behavior via ctime semantics:
    // https://en.cppreference.com/w/cpp/chrono/c/ctime
    // Converts the time since epoch to local calendar time, then to text, as if
    // by std::asctime(std::localtime(time)): Www Mmm dd hh:mm:ss yyyy
    // (fixed layout, no localization).
    ts = utils::string::ascTime(&m_timeStamp);
} else {
    struct tm timeinfo;
    char tstr[300];

    memset(tstr, '\0', sizeof(tstr));
    localtime_r(&this->m_timeStamp, &timeinfo);

    // e.g. SecAuditLogJsonDateFormat "%d/%b/%Y:%H:%M:%S %z"
    strftime(tstr, sizeof(tstr) - 1, m_rules->m_auditLog->m_dateFormatJson.c_str(), &timeinfo);
    ts = tstr;
}

where m_dateFormatJson is defined in /src/headers/audit_log.h:

// line # 189
std::string m_dateFormatJson;
// line # 161
bool setDateFormat(const std::basic_string<char>& the_format);

/src/audit_log/audit_log.cc

// line # 59
m_dateFormatJson("default"),

// line # 138
bool AuditLog::setDateFormat(const std::basic_string<char>& the_format) {
    this->m_dateFormatJson = the_format;
    return true;
}

Then the seclang parser: /src/parser/seclang-parser.yy #781 (and I suppose the generated .cc needs all the line numbers transposed)

/* SecAuditLogJsonDateFormat */
    | CONFIG_DIR_AUDIT_LOG_DATE_FORMAT
      {
        std::string the_format($1);
        driver.m_auditLog->setDateFormat(the_format);
      }

And a handful of other parser changes that are beyond my knowledge level: the symbol definition for CONFIG_DIR_AUDIT_LOG_DATE_FORMAT, parserSanitizer, etc.

Then we could all simply do this to customize it to our use case:

SecAuditLogJsonDateFormat "%d/%b/%Y:%H:%M:%S %z"

In fact, the variables could be renamed to drop the term "json", and the legacy audit log date format could be retrofitted to use them as well, giving us complete control over logged date formats.

martinhsv commented 1 year ago

Yes, that's right. The existing functionality uses std::ctime to produce the textual representation. Whether that function's output is "any known standard format" may be a matter of interpretation.

And, yes, a loss of compatibility for existing users would be one of my concerns. One possible solution to the tz omission alone might be to leave the existing time_stamp item untouched and add a new one (time_stamp_tz?). In that case, it wouldn't be natural to treat your concern as part of the same issue.
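
To illustrate, the two fields side by side might look like this (time_stamp_tz is just the name floated above, and its value format is assumed to mirror the Native 'A' section; nothing like this is implemented):

"time_stamp":"Tue Jun 16 11:24:03 2020",
"time_stamp_tz":"16/Jun/2020:11:24:03 +0300"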

On the other hand, if a new configuration item is introduced in the way you suggest, then it probably would be appropriate to keep everything encapsulated in this single issue (while amending the title).

ensemblebd commented 1 year ago

Respectfully, I disagree with your position on the matter. And I'm definitely not going to argue semantic interpretations or topic titles.

Ctime's format isn't modifiable. So if I'm tangential to the original topic, if I'm "unnatural" to the topic, then how exactly are we going to add the original poster's timezone output if not precisely by my proposed solution?

Look, there's a job to be done; either we work together and do it, or we don't. I'll let you decide how you want to handle it. If deleting my comments is a more viable pathway for your solution (which is what, exactly?), go for it.