A quick reverse image lookup using the pHash (DCT) algorithm, designed specifically for CMX covers.
Powered by Meilisearch.
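For readers unfamiliar with DCT-based perceptual hashing, the following is a minimal pure-Python sketch of one common pHash variant. It is not the implementation used by this project: the function names, the 32x32 input size, the 8x8 low-frequency block, and the median threshold are all assumptions chosen for illustration. The input is assumed to be an image that has already been decoded, converted to grayscale, and resized to a square matrix.

```python
import math
import statistics


def dct_1d(values):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(matrix):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(col) for col in zip(*rows)]
    # Transpose back so the result is indexed as [row_freq][col_freq].
    return [list(row) for row in zip(*cols)]


def phash_bits(gray, hash_size=8):
    """Return hash_size * hash_size bits for a square grayscale matrix.

    Assumption: `gray` is already resized (e.g. to 32x32).
    """
    freq = dct_2d(gray)
    # Keep only the top-left (low-frequency) block of coefficients.
    low = [freq[r][c] for r in range(hash_size) for c in range(hash_size)]
    median = statistics.median(low)
    # Each bit records whether a coefficient lies above the median.
    return [1 if value > median else 0 for value in low]


def hamming(bits_a, bits_b):
    """Number of differing bits; small distances suggest similar images."""
    return sum(a != b for a, b in zip(bits_a, bits_b))
```

Two covers can then be compared by calling `hamming(phash_bits(a), phash_bits(b))` on their grayscale matrices; how the resulting bits are encoded and stored is up to the caller.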
cargo build --release --all
Build aufnehmen in release mode; hashing will be slow without optimization.
wasm-pack build ermitteln-wasm --release --target web
maturin build --release --strip --sdist --manifest-path ./ermitteln-python/Cargo.toml (optional)
npm install
npm run build, or npm run generate
node ./frontend/.output/server/index.mjs, or host the dist folder if using generate
Download the CMX covers into a folder. Each file is expected to be named after its CMX ID, like this: CmxID.jpg
Take the aufnehmen binary from target/release, then copy it to the parent directory of your CMX cover folder.
Add MEILI_URL=your_meilisearch_instance_url and MEILI_KEY=your_api_key_or_master_key to your environment.
Run ./aufnehmen ingest <directory>
The covers are ingested into the ermitteln-images index.
Currently, the cover blacklist for placeholders is hard-coded. If you discover a new placeholder, you can open a new issue and include the hash of the file.
To obtain the hash of the file, use the following command:
$ ./aufnehmen hash <input.jpg>
This command will provide you with both the pHash and SHA-256 hash of the image.
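If you want to cross-check the SHA-256 value independently, a digest can be computed with the Python standard library. This is a minimal sketch; the function name `sha256_of_file` is hypothetical, and the output encoding used by aufnehmen (hex or otherwise) is not stated here, so the hex digest below may need converting before comparison.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```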
Duplicates usually occur when publishers use the same cover for multiple chapters of their comics, so the hashing algorithm sees little or no difference between them.
The program will not delete duplicates for you; you will need to manually remove them.
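To help with manual cleanup, duplicates can be listed by grouping documents on their hash. The sketch below assumes the documents have already been exported from the index as dictionaries with the 'id' and 'hash' attributes; the function name and the sample values are hypothetical.

```python
from collections import defaultdict


def find_duplicates(documents):
    """Map each hash shared by several documents to the list of their ids."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc["hash"]].append(doc["id"])
    # Keep only hashes that occur more than once.
    return {h: ids for h, ids in groups.items() if len(ids) > 1}


# Hypothetical sample data, for illustration only.
sample = [
    {"id": 1, "hash": "AAAAAAA"},
    {"id": 2, "hash": "BBBBBBB"},
    {"id": 3, "hash": "AAAAAAA"},
]
duplicates = find_duplicates(sample)
```

This only reports exact hash collisions; near-duplicates with slightly different hashes would need a distance-based comparison instead.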
We recommend the following settings for the ermitteln-images index:
{
  // Display all attributes (only 'id' and 'hash' are available)
  "displayedAttributes": ["*"],
  // Search on all attributes
  "searchableAttributes": ["*"],
  // Allow filtering on ID for easy bisect search
  "filterableAttributes": ["id"],
  // Enable sorting on ID for use in the frontend
  "sortableAttributes": ["id"],
  // The following are typo-tolerance settings; only the necessary values are provided
  "typoTolerance": {
    "enabled": true,
    "minWordSizeForTypos": {
      // Since the hash is 7 characters long (base64 encoded),
      // we use the following typo tolerance
      "oneTypo": 7, // Allow a single-character typo for words of 7 or more characters
      "twoTypos": 8 // Hashes never reach 8 characters, so two-typo matching is effectively disabled
    },
    // Disable typo tolerance on ID for exact matching.
    "disableOnAttributes": ["id"]
  }
}
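The two minWordSizeForTypos values are minimum word lengths: a query word must be at least oneTypo characters long to match with one typo, and at least twoTypos characters long to match with two. The sketch below models only that threshold rule; the function name is hypothetical and it does not reproduce Meilisearch's actual matching.

```python
def allowed_typos(word, one_typo=7, two_typos=8):
    """Return how many typos a word of this length may contain."""
    if len(word) >= two_typos:
        return 2
    if len(word) >= one_typo:
        return 1
    return 0
```

With the recommended values, a 7-character hash tolerates exactly one differing character, which makes near-identical hashes match while shorter strings still require an exact match.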
Everything else can be set to defaults.
You can follow the Meilisearch tutorial on how to configure your index.
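As one possible way to apply these settings, the sketch below builds a request for the Meilisearch settings endpoint using only the Python standard library. It assumes a recent Meilisearch version where index settings are updated with PATCH on /indexes/{index_uid}/settings, and it reuses the MEILI_URL and MEILI_KEY values from your environment. The function names are hypothetical, and the comments from the JSON above are dropped because standard JSON does not allow them.

```python
import json
import urllib.request

SETTINGS = {
    "displayedAttributes": ["*"],
    "searchableAttributes": ["*"],
    "filterableAttributes": ["id"],
    "sortableAttributes": ["id"],
    "typoTolerance": {
        "enabled": True,
        "minWordSizeForTypos": {"oneTypo": 7, "twoTypos": 8},
        "disableOnAttributes": ["id"],
    },
}


def build_settings_request(base_url, api_key, index_uid="ermitteln-images"):
    """Build (but do not send) the settings update request."""
    url = f"{base_url.rstrip('/')}/indexes/{index_uid}/settings"
    return urllib.request.Request(
        url,
        data=json.dumps(SETTINGS).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def apply_settings(base_url, api_key):
    """Send the request and return the raw response body as text."""
    request = build_settings_request(base_url, api_key)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

For example, `apply_settings(os.environ["MEILI_URL"], os.environ["MEILI_KEY"])` (after `import os`) would submit the update; Meilisearch processes settings changes asynchronously, so the response describes a queued task rather than the final state.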
MIT License