vichan-devel / vichan

Vichan is the most popular and widely used imageboard software in the world. It is a free, light-weight, fast, highly configurable and user-friendly imageboard software package.
https://vichan.info

Blacklist image hashes? #651

Open · frozenpandaman opened this issue 9 months ago

frozenpandaman commented 9 months ago

Sorry for all the questions/issues lately. Is there a recommended way to blacklist certain image hashes? Hoping this will cut down on bot spam that seemingly posts the same things over and over.

I'm guessing there would naively be a way to just hardcode it into some .php file (if the image hash is in_array(whatever1, whatever2), return without doing anything or serve an error message), but I'm not sure where the image upload logic happens.

crazy4cars69 commented 9 months ago

I think vichan has a feature that uses MD5 hashes in filters to stop certain images from being posted.

RealAngeleno commented 9 months ago

Yes. Do something like this:

    $config['filters'][] = array(
        'condition' => array(
            // Match posts whose file hash is on the blacklist.
            'custom' => function ($post) {
                $blacklist = array(
                    'your filehash',
                );
                return isset($post['filehash'])
                    && in_array($post['filehash'], $blacklist, true);
            },
        ),
        'action' => 'ban',
        'add_note' => true,
        'all_boards' => true,
        'expires' => 60 * 60 * 72, // Three days
        'reason' => 'Ban evasion.',
    );

Kuz hacked together a script that lets mods see the MD5 hashes, but he never made it public. For now, the best way to find a hash is to check the filehash column of the posts_[board] table.
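A small command-line script may be an easier way to get a hash for the blacklist than digging through the database. It is a hypothetical helper, not part of vichan. It assumes that for a single-file post vichan's `filehash` is the MD5 of the raw uploaded bytes, so running `md5_file()` on the original file should give the same value. Check one result against the filehash column before relying on it.

```php
<?php
// Hypothetical helper (not part of vichan): print the MD5 of each file passed on the
// command line, formatted for pasting into the 'filehash' blacklist filter.
// Assumption: vichan stores the MD5 of the raw upload bytes for single-file posts.

/** Return an array of path => MD5 hex digest, skipping unreadable files. */
function file_hashes(array $paths): array {
    $hashes = array();
    foreach ($paths as $path) {
        $hash = is_readable($path) ? md5_file($path) : false;
        if ($hash === false) {
            fwrite(STDERR, "Cannot read: $path\n");
            continue;
        }
        $hashes[$path] = $hash;
    }
    return $hashes;
}

// Run only when executed directly, e.g. `php hashes.php spam1.png spam2.jpg`.
if (PHP_SAPI === 'cli' && isset($argv) && realpath($argv[0]) === __FILE__) {
    foreach (file_hashes(array_slice($argv, 1)) as $path => $hash) {
        echo "'$hash', // $path\n";
    }
}
```

Each output line can be pasted straight into the `$blacklist` array of the filter.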

crazy4cars69 commented 9 months ago

Here is how to make the file hash visible to mods only:

Edit /templates/post/fileinfo.html

Paste the following before {% include "post/image_identification.html" %}:

        {% if post.mod|hasPermission(config.mod.show_file_hash) %}
            <br />
            <span>HASH: {{ post.filehash }}</span>
        {% endif %}

Make sure to also add this to /inc/config.php:

    // View file hash
    $config['mod']['show_file_hash'] = MOD;

frozenpandaman commented 8 months ago

@RealAngeleno @crazy4cars69 Thank you both so much! Really appreciated.

I'm not great with PHP myself, so being able to implement this was extremely helpful. If you have time in the future, let me know whether you could throw together some simple QuestyCaptcha code (i.e. just an input 'verification' field that checks the entered string against something in config.php).

frozenpandaman commented 8 months ago

Unfortunately this doesn't seem to be helping with spam issues. Spambots have the same image saved presumably at different compression levels or with different metadata or something, causing the blacklist for specific hashes to be ineffective.

crazy4cars69 commented 8 months ago

> Unfortunately this doesn't seem to be helping with spam issues. Spambots have the same image saved presumably at different compression levels or with different metadata or something, causing the blacklist for specific hashes to be ineffective.

Yeah, that will be a problem; either ban the IP or the IP range. Right now vichan has no effective file-hash spam detection beyond checking for duplicate files and the MD5 blacklist.
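A few lines of code show why an exact-hash blacklist fails against these bots. MD5 changes completely when even one byte of the file changes, as it may after a re-save with different metadata or compression. The byte strings below are made-up stand-ins for an image file, not real spam data.

```php
<?php
// Made-up stand-in for an image file's bytes; a real upload would come from disk.
$original = str_repeat("fake image bytes ", 64);
// The "same" image with one extra byte, as a metadata edit or re-save might add.
$tweaked = $original . "\x00";

$a = md5($original);
$b = md5($tweaked);
echo $a, "\n", $b, "\n";

// Count hex digits that happen to line up. For unrelated digests about 2 of 32
// match by chance, so an exact blacklist entry no longer matches at all.
$matching = 0;
for ($i = 0; $i < 32; $i++) {
    if ($a[$i] === $b[$i]) {
        $matching++;
    }
}
echo "matching hex digits: $matching / 32\n";
```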

RealAngeleno commented 8 months ago

I was asked a bit about questycaptcha from a few other people too. That'll be my top priority for now, along with the wiki.

RealAngeleno commented 8 months ago

Though I wouldn't know the best way to implement it, as there's many ways to do it.

crazy4cars69 commented 8 months ago

> Though I wouldn't know the best way to implement it, as there's many ways to do it.

Two versions: a JS version, added through additional_javascript, that displays the image in base64 and can be reloaded; and a non-JS version that uses an iframe with a reload button to refresh the captcha. Keep the QuestyCaptcha config files in inc/questycaptcha/ and the main QuestyCaptcha file in the root directory as captcha.php. Don't forget to add it to the report_captcha config as well.
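The simple version frozenpandaman asked for is a text field checked against answers stored in config. Its answer-checking core could look roughly like the sketch below. Everything here is hypothetical: the `$config['questycaptcha']['questions']` key and the `questy_*` functions do not exist in vichan, and the JS and iframe front ends described above are not covered.

```php
<?php
// Hypothetical config key (not in vichan): question text => accepted answers.
$config['questycaptcha']['questions'] = array(
    'What color is the sky on a clear day?' => array('blue'),
    'What is two plus three?' => array('5', 'five'),
);

/** Normalize answers so "  BLUE " and "blue" compare equal (ASCII case-folding only). */
function questy_normalize(string $answer): string {
    return strtolower(trim(preg_replace('/\s+/', ' ', $answer)));
}

/** Pick a random question; returns array(index, question text). */
function questy_pick(array $questions): array {
    $texts = array_keys($questions);
    $i = random_int(0, count($texts) - 1);
    return array($i, $texts[$i]);
}

/** Check a submitted answer against the accepted answers for question $i. */
function questy_check(array $questions, int $i, string $answer): bool {
    $texts = array_keys($questions);
    if (!isset($texts[$i])) {
        return false; // unknown question index
    }
    $accepted = array_map('questy_normalize', $questions[$texts[$i]]);
    return in_array(questy_normalize($answer), $accepted, true);
}
```

The chosen question has to be remembered between rendering the form and receiving the post. Keeping it server-side (for example in a session) is likely safer than a hidden form field. With a hidden field, a bot could always submit the index of a question it has already learned to answer.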

frozenpandaman commented 8 months ago

> I was asked a bit about questycaptcha from a few other people too. That'll be my top priority for now

@RealAngeleno AMAZING to hear, thank you!!!

Zankaria commented 6 months ago

A possible solution would be perceptual hashing. It's potentially more CPU-intensive, but perceptual hashes extracted from an image (its decoded pixels, not the file) have a small Hamming distance from hashes extracted from similar images.

Basically, you open the image you want to block, resize it to a fixed size, compute a perceptual hash of it and store that. Then, when a user tries to post a new image, you open it, resize it and hash it the same way. If the Hamming distance between the two hashes is below a given threshold, you classify the images as "similar enough" and reject the post. This doesn't protect much against cropping or rotation, but it handles resizing, metadata changes and recompression very well.
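The procedure above can be sketched in PHP with GD. The sketch uses a difference hash (dHash), which is one common perceptual hash and not necessarily the one Zankaria had in mind. The image is shrunk to 9×8 grayscale pixels, and each bit records whether a pixel is brighter than its right-hand neighbor. Two 64-bit hashes are then compared by Hamming distance. The function names and the default threshold of 10 bits are assumptions to tune, not established values.

```php
<?php
// Hypothetical helpers, not part of vichan. load_gray_grid() needs the GD extension;
// the hashing and comparison functions are plain PHP.

/** Decode an image, resize it to $w x $h and return grayscale values [row][col]. */
function load_gray_grid(string $path, int $w = 9, int $h = 8): array {
    $data = file_get_contents($path);
    $src = $data === false ? false : imagecreatefromstring($data);
    if ($src === false) {
        throw new RuntimeException("Cannot decode image: $path");
    }
    $dst = imagecreatetruecolor($w, $h);
    // Decoding ignores metadata, and shrinking averages away most recompression noise.
    imagecopyresampled($dst, $src, 0, 0, 0, 0, $w, $h, imagesx($src), imagesy($src));
    $grid = array();
    for ($y = 0; $y < $h; $y++) {
        for ($x = 0; $x < $w; $x++) {
            $rgb = imagecolorat($dst, $x, $y);
            $r = ($rgb >> 16) & 0xFF;
            $g = ($rgb >> 8) & 0xFF;
            $b = $rgb & 0xFF;
            $grid[$y][$x] = (int) round(0.299 * $r + 0.587 * $g + 0.114 * $b);
        }
    }
    return $grid;
}

/** dHash: one bit per pair of horizontally adjacent pixels, packed into hex. */
function dhash_from_grid(array $grid): string {
    $bits = '';
    foreach ($grid as $row) {
        for ($x = 0; $x < count($row) - 1; $x++) {
            $bits .= $row[$x] > $row[$x + 1] ? '1' : '0';
        }
    }
    $hex = '';
    foreach (str_split($bits, 4) as $nibble) { // assumes bit count is a multiple of 4
        $hex .= dechex(bindec($nibble));
    }
    return $hex;
}

/** Number of differing bits between two equal-length hex hashes. */
function hamming_hex(string $a, string $b): int {
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException('Hash lengths differ');
    }
    $distance = 0;
    for ($i = 0; $i < strlen($a); $i++) {
        $distance += substr_count(decbin(hexdec($a[$i]) ^ hexdec($b[$i])), '1');
    }
    return $distance;
}

/** "Similar enough" if the distance is below the threshold (10 of 64 bits is a guess). */
function is_similar(string $a, string $b, int $threshold = 10): bool {
    return hamming_hex($a, $b) < $threshold;
}
```

Blocking would then mean storing `dhash_from_grid(load_gray_grid($path))` for each known spam image, then rejecting any upload whose hash `is_similar()` to a stored one. As noted above, this tolerates resizing and recompression but not cropping or rotation.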

frozenpandaman commented 4 months ago

> I was asked a bit about questycaptcha from a few other people too. That'll be my top priority for now, along with the wiki.

Just wanted to ask if there might be any updates on this yet, @RealAngeleno? Happy to make a new issue to track it if it's a bit separate from this original topic. Thanks!

Black-Hand-Radio commented 2 weeks ago

> Unfortunately this doesn't seem to be helping with spam issues. Spambots have the same image saved presumably at different compression levels or with different metadata or something, causing the blacklist for specific hashes to be ineffective.

In my experience they just add a watermark at different locations. Since the images are always relatively high resolution and they place the watermark in the middle/bottom, you can create partial image hashes: for example, hash the decoded data of a certain region of the image and base your blacklist on that. This adds more processing, but there are several easier ways to block them before you even reach this point, for example referer detection or browser fingerprinting.
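A partial hash like this could be sketched as follows: hash the decoded pixel values of a fixed rectangle the watermark does not cover, and blacklist that instead of the whole-file MD5. The `region_hash()` name is hypothetical. Pixels are passed as an array of packed RGB integers (the values GD's `imagecolorat()` returns for truecolor images), so the logic can be shown without an image file. Because it is still an exact hash of pixel values, it survives metadata changes and lossless re-saves but not lossy recompression.

```php
<?php
// Hypothetical helper, not part of vichan.
// $pixels[row][col] holds packed RGB ints, e.g. from imagecolorat() on a truecolor GD image.

/** MD5 of the decoded pixels inside the rectangle ($x, $y, $w, $h). */
function region_hash(array $pixels, int $x, int $y, int $w, int $h): string {
    $data = '';
    for ($row = $y; $row < $y + $h; $row++) {
        for ($col = $x; $col < $x + $w; $col++) {
            if (!isset($pixels[$row][$col])) {
                throw new OutOfRangeException("Pixel ($col, $row) is outside the image");
            }
            // 4 bytes per pixel, big-endian, so the byte string is unambiguous.
            $data .= pack('N', $pixels[$row][$col]);
        }
    }
    return md5($data);
}
```

For a watermark in the bottom part of the image, one might hash the top half, e.g. `region_hash($pixels, 0, 0, $width, intdiv($height, 2))`. Since spam images vary in size, a real version would probably need to scale the rectangle to the image dimensions or resize the image first.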

However, I have no idea whether they still do this watermarking or have switched to a different method. Once you manage to block them, they turn up less and less often, so I simply haven't had enough data lately.

RealAngeleno commented 5 days ago

Yeah, vichan's built-in MD5 hashing doesn't work well for this. I use a (rather janky, though less janky than before) method that runs the imagehash library in Python to get perceptual hashes instead. It works wonders, but I haven't added it to vichan because it requires users to install yet another dependency and it can be slow. Someone from another imageboard gave me a PHP implementation, but from what I've seen it isn't very good at perceptual hashing and is also slow.