log2timeline / plaso

Super timeline all the things
https://plaso.readthedocs.io
Apache License 2.0

Have "dynamic" output format comply with RFC 4180? #3884

Open · rgayon opened 2 years ago

rgayon commented 2 years ago

Description of problem:

Running psteal.py on browser history generates CSV output that is not valid per RFC 4180: double quotes can appear in the URL+title field without the field being quoted.

2021-09-07T06:36:45.000000+00:00,Last Visited Time,WEBHIST,Chrome History,<SOME URL> "shouldn't have quotes here" [count: 0] Visit from: Type: [LINK - User clicked a link] (URL not typed directly),sqlite/chrome_27_history,OS:/usr/local/google/home/romaing/Google/Chrome/User Data/Default/History,-
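For comparison, RFC 4180 (rule 5) does not allow double quotes inside an unquoted field. Python's standard `csv` module, with its default `QUOTE_MINIMAL` quoting, encloses such a field in double quotes and doubles the embedded quotes. A minimal sketch: the row is a shortened version of the sample line above (only the first five fields, with `<SOME URL>` kept as a placeholder), not actual plaso output.

```python
import csv
import io

# Shortened version of the sample event above; "<SOME URL>" is kept as a placeholder.
row = [
    "2021-09-07T06:36:45.000000+00:00",
    "Last Visited Time",
    "WEBHIST",
    "Chrome History",
    '<SOME URL> "shouldn\'t have quotes here" [count: 0]',
]

buffer = io.StringIO()
# QUOTE_MINIMAL (the default) only quotes fields that contain the delimiter,
# the quote character or a line break; embedded quotes are doubled.
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
writer.writerow(row)
line = buffer.getvalue()
print(line, end="")
```

Only the last field gets quoted; reading `line` back with `csv.reader` recovers the original field, embedded quotes included.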

Command line and arguments:

'Chrome' is a Chrome directory from a Windows system.

psteal.py --source Chrome -w chrome.csv

Source data:

Chrome browser history

Plaso version:

$ psteal.py --version
plaso - psteal version 20210606

Operating system Plaso is running on:

Linux

joachimmetz commented 2 years ago

https://datatracker.ietf.org/doc/html/rfc4180#page-2

While there are various specifications and implementations for the
   CSV format (for ex. [4], [5], [6] and [7]), there is no formal
   specification in existence, which allows for a wide variety of
   interpretations of CSV files.  This section documents the format that
   seems to be followed by most implementations:

joachimmetz commented 2 years ago

It looks like RFC 4180 also requires each row to end with CRLF (not just LF):

   1.  Each record is located on a separate line, delimited by a line
       break (CRLF).
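For reference, Python's `csv.writer` already ends every record with CRLF: its default `excel` dialect uses `lineterminator="\r\n"`. When writing to a file, the file should be opened with `newline=""` so the text layer does not rewrite the terminators (on Windows it would otherwise turn `\r\n` into `\r\r\n`). A minimal sketch; the column names, values and `timeline.csv` file name are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical column names and values, for illustration only.
rows = [
    ["datetime", "message"],
    ["2021-09-07T06:36:45.000000+00:00", 'say "hi"'],
]

with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, "timeline.csv")
    # newline="" stops the text layer from rewriting the "\r\n" terminators.
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        # The default "excel" dialect uses lineterminator="\r\n".
        csv.writer(csv_file).writerows(rows)
    # Read the raw bytes back to check the actual line endings.
    with open(path, "rb") as csv_file:
        raw = csv_file.read()

print(raw)
```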

joachimmetz commented 2 years ago

Spoke with @rgayon: the issue here is mainly about escaping double quotes, less about complying with the RFC.
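A formatter that builds lines itself, rather than through the `csv` module, can handle just the escaping without other RFC 4180 changes such as CRLF line endings. The core rules (RFC 4180 rules 6 and 7) are small: a field containing a double quote, the delimiter or a line break is enclosed in double quotes, and each embedded double quote is doubled. A minimal sketch; `escape_field` and `format_row` are hypothetical helper names, not plaso functions.

```python
from typing import Iterable


def escape_field(value: str, delimiter: str = ",") -> str:
    """Hypothetical helper: quotes a field per RFC 4180 rules 6 and 7 if needed."""
    # Rule 6: fields containing the delimiter, a double quote or a line break
    # must be enclosed in double quotes.
    needs_quotes = any(char in value for char in (delimiter, '"', "\r", "\n"))
    if not needs_quotes:
        return value
    # Rule 7: a double quote inside a quoted field is escaped by doubling it.
    return '"' + value.replace('"', '""') + '"'


def format_row(values: Iterable[str], delimiter: str = ",") -> str:
    """Hypothetical helper: joins escaped fields; the line terminator is left to the caller."""
    return delimiter.join(escape_field(value, delimiter) for value in values)


print(format_row(["WEBHIST", '<SOME URL> "shouldn\'t have quotes here"']))
```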