wso2 / product-is

Welcome to the WSO2 Identity Server source code! For info on working with the WSO2 Identity Server repository and contributing code, click the link below.
http://wso2.github.io/
Apache License 2.0

Common Criteria Gap - FAU_STG_EXT.1 External Audit Trail Storage #16835

Closed: RushanNanayakkara closed this issue 11 months ago

RushanNanayakkara commented 1 year ago

Describe the issue: To be eligible for the Common Criteria certification, the Identity Server must satisfy the certification requirement FAU_STG_EXT.1 External Audit Trail Storage. The evaluation activity for this requirement states the following.

The evaluator shall also make the connection to the external audit storage unavailable, perform audited events on the TOE, re-establish the connection, and observe that the external audit trail storage is synchronized with the local storage. Similar to the testing for FAU_GEN.1, this testing can be done in conjunction with the exercise of other functionality. Finally, since the requirement specifically calls for the audit records to be transmitted over the trusted channel established by FTP_ITC.1, verification of that requirement is sufficient to demonstrate this part of this one.

Current behaviour: The current implementation buffers logs in an in-memory queue, so log batches can be lost once the queue limit is reached. The existing design also does not adequately handle remote server failures, which is inconsistent with the requirement above.

Expected behaviour: The implementation should ensure that the locally stored logs are synchronised with the remote server logs after a remote server failure and subsequent recovery.


Related issues:

https://github.com/wso2/product-is/issues/16697

RushanNanayakkara commented 1 year ago

Update Summary - 3rd of Oct 2023

Completed flows / Current progress:

  1. Research for a possible solution using log4j2 for the requirement. ✅
  2. Draft implementation. ✅

Details

Findings of the research

  1. Implement an appender for log4j2
    • Appender uses a queue to buffer the logs before sending them to the server.
    • Periodically send the logs to the server from the queue.
    • If the server fails, the queue is retained and a retry is triggered after the configured time period.
    • If the server failure is prolonged, the memory usage of the queue and the rate at which logs are generated will be taken into account to decide whether to store the logs in a temporary file, which is then sent to the server once it is available.
    • The upload rate to the server has to be higher than the rate at which logs are generated. (This ensures that the logs queued during the failure period are eventually dequeued.)

    Appender choice:
    • An asynchronous appender (e.g. AsyncAppender) is needed so that logging does not block.
    • By default, AsyncAppender uses an ArrayBlockingQueue for buffering, which is prone to lock contention in multi-threaded environments. See "Performance of AsyncAppenders": https://logging.apache.org/log4j/2.x/performance.html#asyncLogging
    • In that case, lock-free Async Loggers should be used instead: https://logging.apache.org/log4j/2.x/manual/async.html
    • If the queue cannot keep up with the logging rate, log4j2.AsyncQueueFullPolicy can be configured to decide which log events are preserved.
  2. Improve the current implementation of SecuredHttpAppender with a file based queue.
    • Use a file-based queue in place of the current in-memory queue for the log events.
    • Change implementation to not drop any logs.
    • Improve the log publishing implementation with a fallback mechanism to deal with the remote server failures.
    • Implement a cleaning mechanism to remove the expired local temporary log files.
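
The "keep the queue and retry" behaviour listed in the findings above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SecuredHttpAppender code: `LogSender` and `RetainingLogBuffer` are hypothetical names, the buffer lives in memory rather than in a file, and the lock is held while sending to keep the example short. The point it shows is that a batch leaves the queue only after the remote server has accepted it, so a failed send drops nothing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical transport: returns normally on success, throws on failure. */
interface LogSender {
    void send(List<String> batch) throws Exception;
}

/** Buffers log lines and removes them only after a successful send. */
class RetainingLogBuffer {
    private final Deque<String> queue = new ArrayDeque<>();
    private final LogSender sender;
    private final int batchSize;

    RetainingLogBuffer(LogSender sender, int batchSize) {
        this.sender = sender;
        this.batchSize = batchSize;
    }

    synchronized void append(String logLine) {
        queue.addLast(logLine);
    }

    synchronized int pending() {
        return queue.size();
    }

    /**
     * Tries to publish one batch from the head of the queue.
     * Returns true if the batch was delivered or nothing was pending,
     * false if the send failed and the batch stays queued for a retry.
     */
    synchronized boolean flushOnce() {
        if (queue.isEmpty()) {
            return true;
        }
        // Copy (do not remove) up to batchSize lines from the head.
        List<String> batch = new ArrayList<>();
        for (String line : queue) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(line);
        }
        try {
            sender.send(batch);
        } catch (Exception e) {
            return false; // remote unavailable: keep everything queued
        }
        // Delivery confirmed: now it is safe to drop the sent lines.
        for (int i = 0; i < batch.size(); i++) {
            queue.removeFirst();
        }
        return true;
    }
}
```

A periodic task (for example one scheduled with `ScheduledExecutorService.scheduleWithFixedDelay`) could call `flushOnce()` at the configured retry interval. A file-based variant would follow the same peek, send, then remove order against the on-disk queue.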

Draft implementation description.

TODO

[1] https://github.com/OpenHFT/Chronicle-Queue
[2] https://github.com/wso2/carbon-commons/pull/477

RushanNanayakkara commented 1 year ago

Update Summary - 16th of Oct 2023

Completed flows / Current progress:

  1. Finalise the discussion on the solution architecture. ✅
  2. Take permission from the security team to introduce chronicle-queue dependency to the product. ✅
  3. Analyse the disk usage and decide on how to impose limitations while adhering to certificate guidelines. ✅
  4. Complete the implementation. 🚧

Details

Finalise the discussion on the solution architecture.

Permission taken from the security team to introduce the dependency to the product.

Complete the implementation.

TODO

RushanNanayakkara commented 1 year ago

Update Summary - 31st of Oct 2023

Update

Completed flows / Current progress:

  1. Implementation of the custom file queue 🚧

Todo

  1. Test the custom file queue performance.

RushanNanayakkara commented 11 months ago

Update Summary - 31st of Oct 2023

Update

Implemented design

High level architecture

[Figure: Persistent Queue High Level Architecture]

Disk and Memory Management

Memory Usage

Disk Usage

[Figure: PersistentQueueDataManagement]

QueueBlock File Structure

[Figure: Persistent Queue Bit Sequence Structure]

Thread Safety

Read operations are allowed for multiple threads. Write operations are synchronized.
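
One way to get the access pattern described above (concurrent reads, exclusive writes) is a read-write lock. The sketch below is an assumption about how this could look, not the code from the implementation, which may simply use `synchronized` on the write path. `GuardedLogStore` is a hypothetical class that stands in for the persistent queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical in-memory stand-in for the persistent queue. */
class GuardedLogStore {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> records = new ArrayList<>();

    /** Writers take the exclusive lock, so writes never interleave. */
    void write(String record) {
        lock.writeLock().lock();
        try {
            records.add(record);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Readers share the read lock, so several threads can read at once. */
    String read(int index) {
        lock.readLock().lock();
        try {
            return (index >= 0 && index < records.size()) ? records.get(index) : null;
        } finally {
            lock.readLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return records.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The read lock still excludes writers, so a reader never observes a half-applied write.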

Performance

Performance test done using audit logs generated by Asgardeo.

  • Enqueue rate: 13500+ operations per second
  • Dequeue rate: 8000+ operations per second
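
To relate these rates to the earlier finding that upload must outpace log generation, the following sketch estimates how long a backlog takes to drain after an outage. It assumes the measured dequeue rate is the limiting drain rate (in practice the network upload may be slower); the generation rate and outage length in the usage note are hypothetical inputs, not measurements from this issue. `BacklogEstimate` is a hypothetical helper.

```java
/** Hypothetical helper for back-of-the-envelope backlog estimates. */
class BacklogEstimate {

    /** Records accumulated locally while the remote server is unreachable. */
    static long backlogAfterOutage(double generationRatePerSec, double outageSeconds) {
        return Math.round(generationRatePerSec * outageSeconds);
    }

    /**
     * Seconds needed to clear the backlog once the server is back.
     * New logs keep arriving while draining, so the net rate is
     * drainRate - generationRate; if that is not positive, the
     * backlog never clears.
     */
    static double drainSeconds(long backlog, double generationRatePerSec, double drainRatePerSec) {
        if (drainRatePerSec <= generationRatePerSec) {
            return Double.POSITIVE_INFINITY;
        }
        return backlog / (drainRatePerSec - generationRatePerSec);
    }
}
```

For example, with a hypothetical 500 logs per second and a 600 second outage, `backlogAfterOutage(500, 600)` gives 300000 records, and `drainSeconds(300000, 500, 8000)` gives 40 seconds at the 8000 operations per second dequeue rate.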

Testing

The implementation was tested for the following scenarios:

Start up

General Case

Configurations

Event logs

RushanNanayakkara commented 11 months ago

Closing as completed.