joshhighet / ransomwatch

the transparent ransomware claim tracker 🥷🏼🧅🖥️
https://ransomwatch.telemetry.ltd
The Unlicense

Remove parser for part1/2/3... and add only new posts from Clop #51

Closed maiqueg closed 1 year ago

maiqueg commented 1 year ago

The Clop parser currently tracks every new "part1/2/3..." post, but those posts announce file releases, not new ransomware attacks. I think it makes more sense to take the list of companies from the top of the page rather than from the posts about parts being published.

For example, the last entry from the current parser was added on 2023-03-16, but according to my other monitors this attack was already listed two days earlier, on 2023-03-14.

This issue is open to discussion, but monitoring new attacks instead of the files being published tends to make more sense (for all the other groups, only newly added victims are monitored, not newly added files).

Regex for the parser:

grep 'g-menu-item-title' source/clop-*.html --no-filename | sed -e s/'<span class="g-menu-item-title">'// -e s/"<\/span>"// -e 's/^ *//g' -e 's/[[:space:]]*$//' -e 's/^ARCHIVE[[:digit:]]$//' -e s/'^HOW TO DOWNLOAD?$'// -e 's/^ARCHIVE$//' -e 's/^HOME$//' -e '/^$/d'
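For readability, the same pipeline could be wrapped in a small shell function, one sed expression per line. This is only a sketch, not the project's actual parser: the function name `extract_victims` is hypothetical, and it assumes each menu entry sits on its own line inside a `<span class="g-menu-item-title">` element, as the command above implies. It differs from the original in two small ways: navigation entries are deleted directly rather than blanked and then removed, and `ARCHIVE` followed by any number of digits is dropped, not only a single digit.

```shell
#!/bin/sh
# Hypothetical helper: print victim names found in saved Clop index pages.
# Assumption: one <span class="g-menu-item-title">NAME</span> per line.
extract_victims() {
  # -h suppresses file names when several files are given
  grep -h 'g-menu-item-title' "$@" |
    sed -e 's/<span class="g-menu-item-title">//' \
        -e 's/<\/span>//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^ARCHIVE[[:digit:]]*$/d' \
        -e '/^HOW TO DOWNLOAD?$/d' \
        -e '/^HOME$/d' \
        -e '/^$/d'
  # The three named deletions drop navigation entries (ARCHIVE, ARCHIVE<n>,
  # HOW TO DOWNLOAD?, HOME); the last drops lines left empty after stripping.
}
```

It would be called the same way as the original command, for example `extract_victims source/clop-*.html`.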

There was already a similar issue, #18, and the parser above would fix /stats too.

joshhighet commented 1 year ago

Thanks heaps!

joshhighet commented 1 year ago

I've removed the previous part records for Clop from posts.json. This removed 1136 otherwise-duplicate entries, which should be reflected at the next run. Thanks again!