vinmittal / SecurityTrainingPub


Studying Blue Coat Logs from Syria #3

Open vinmittal opened 8 years ago

vinmittal commented 8 years ago

The following website has logs from Syria: http://bluesmote.com/

Please download a few dumps. There are GitHub projects that know how to parse these logs. The fields in the logs are listed in https://github.com/hellais/Syria-Blue-Coat-log-analysis/blob/master/parser.py

Our objectives here are threefold:

  1. We want to understand the types of fields that can appear in the proxy logs.
  2. We want to develop analytics and visualizations for these logs.
  3. We want to apply a machine learning algorithm to segment users into different categories depending on their behavior.

I would encourage you to download a few GB and study them. Python and Pig are suitable tools for this from a big-data point of view. Please ask questions on this forum.

sushantMoon commented 8 years ago

I am having a problem downloading those proxy logs because of a lack of seeders for the torrents. I tried looking elsewhere for them but failed to find any. Sir, can you suggest an alternative source for the logs, or another link?

vinmittal commented 8 years ago

Sure, here it is: http://project-bluesmote.s3-website-us-east-1.amazonaws.com/v1/blocks/ I have downloaded one of these bundles onto the Amazon VM that I mentioned in my email yesterday. Log in using PuTTY at ubuntu@52.10.203.39 with the key that was in that email. The file is in /mnt, or you can download it directly from the link above.

sushantMoon commented 8 years ago

I have rewritten the parser.py script, as the script at https://github.com/hellais/Syria-Blue-Coat-log-analysis/blob/master/parser.py had some errors.

Each line of the proxy logs from the link above is tab-separated and contains the following fields: Log Address, Log Number, date, time, time-taken, c-ip, cs-username, cs-auth-group, x-exception-id, sc-filter-result, cs-categories, cs-referer, sc-status, s-action, cs-method, rs-content-type, cs-uri-scheme, cs-host, cs-uri-port, cs-uri-path, cs-uri-query, cs-uri-extension, cs-user-agent, s-ip, sc-bytes, cs-bytes, x-virus-id
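As a rough illustration of the field list above, here is a minimal sketch of a line parser. It is not the rewritten parser.py mentioned in this comment. It assumes the lines are tab-separated as described here, that header or comment lines start with `#` (an assumption), and it spells the first two field names as `log_address` and `log_number` purely for this example.

```python
from typing import Dict, Iterable, Iterator

# Field order as listed in the comment above. The first two names are
# written as identifiers for this sketch; that spelling is an assumption.
FIELDS = [
    "log_address", "log_number", "date", "time", "time-taken", "c-ip",
    "cs-username", "cs-auth-group", "x-exception-id", "sc-filter-result",
    "cs-categories", "cs-referer", "sc-status", "s-action", "cs-method",
    "rs-content-type", "cs-uri-scheme", "cs-host", "cs-uri-port",
    "cs-uri-path", "cs-uri-query", "cs-uri-extension", "cs-user-agent",
    "s-ip", "sc-bytes", "cs-bytes", "x-virus-id",
]


def parse_lines(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per well-formed log line, keyed by field name."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and lines starting with '#' (assumed headers).
        if not line or line.startswith("#"):
            continue
        values = line.split("\t")
        # Skip lines whose column count does not match the field list.
        if len(values) != len(FIELDS):
            continue
        yield dict(zip(FIELDS, values))
```

To use it on a downloaded dump, something like `with open(path, encoding="utf-8", errors="replace") as fh: rows = parse_lines(fh)` should work, where `path` is wherever the file was saved.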

To understand the above fields better, would it be good to follow the site below? https://bto.bluecoat.com/webguides/cacheflow/3x/webguide/Content/CPL/Access-Log-Fields.htm

For the second task, should the analysis and visualization be similar to what hellais has done? http://hellais.github.com/syria-censorship
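One possible starting point for the second task, independent of how hellais approached it, is simple frequency counts over parsed records. The sketch below assumes each record is a dict keyed by the field names listed above; the `summarize` function and the sample values are hypothetical and only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize(records: Iterable[Dict[str, str]]) -> Tuple[Counter, Counter]:
    """Count requests per host and per filter result."""
    hosts: Counter = Counter()
    results: Counter = Counter()
    for rec in records:
        # Missing fields are counted under "-" rather than raising.
        hosts[rec.get("cs-host", "-")] += 1
        results[rec.get("sc-filter-result", "-")] += 1
    return hosts, results


# Made-up sample records, not taken from the real logs.
sample = [
    {"cs-host": "example.com", "sc-filter-result": "DENIED"},
    {"cs-host": "example.com", "sc-filter-result": "OBSERVED"},
    {"cs-host": "example.org", "sc-filter-result": "OBSERVED"},
]
hosts, results = summarize(sample)
print(hosts.most_common(2))
print(results.most_common(2))
```

The resulting counters can be fed to any plotting tool for bar charts of the most requested or most blocked hosts.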

sushantMoon commented 8 years ago

Where should the parser script be uploaded so that others can use it? (:heavy_check_mark:)

Also, for the third task, which attributes/fields should be used for mapping user behavior? If it is done by clustering users according to the sites they visit, won't it be similar to the second task?
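One way to think about the third task is to build a small feature vector per user and cluster on those vectors, rather than on sites alone. The sketch below is only an illustration under assumptions: it treats `c-ip` as the user identifier (which may not hold for these logs), treats the `sc-filter-result` value `DENIED` as "blocked", and the `user_features` function and the three features are hypothetical choices, not something agreed in this thread.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def user_features(
    records: Iterable[Dict[str, str]],
) -> Dict[str, Tuple[int, int, float]]:
    """Return {user: (request_count, distinct_hosts, denied_fraction)}."""
    counts = defaultdict(int)
    denied = defaultdict(int)
    hosts = defaultdict(set)
    for rec in records:
        user = rec["c-ip"]  # assumption: c-ip stands in for a user
        counts[user] += 1
        hosts[user].add(rec["cs-host"])
        if rec["sc-filter-result"] == "DENIED":
            denied[user] += 1
    return {
        user: (counts[user], len(hosts[user]), denied[user] / counts[user])
        for user in counts
    }


# Made-up sample records, not taken from the real logs.
sample = [
    {"c-ip": "10.0.0.1", "cs-host": "a.example", "sc-filter-result": "DENIED"},
    {"c-ip": "10.0.0.1", "cs-host": "b.example", "sc-filter-result": "OBSERVED"},
    {"c-ip": "10.0.0.2", "cs-host": "a.example", "sc-filter-result": "OBSERVED"},
]
features = user_features(sample)
```

These per-user tuples could then be passed to a clustering algorithm of choice, which groups users by behavior rather than grouping requests by site.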

vinmittal commented 8 years ago

Please put them here: https://github.com/vinmittal/SecurityTrainingPub/tree/master/socinvestigations/BlueCoatLogs Inside it there are src, test and log directories to hold project artifacts as required. Please grow them as required.