Open VladimirMic opened 1 year ago
GitHub link: https://github.com/xsedmid/sisap23-laion-challenge-CRANBERRY/
Commit hash: [28e9941]
The parameters in the repository are set to search the 100K subset and are insufficient to reach 90 % recall. They are chosen to fit within the 7 GB of RAM available on GitHub.
For the organizers: please consider this submission final right after the final deadline, i.e. on July 16 (AoE).
Dear team DISA-CRANBERRY (@VladimirMic),
Thank you very much for your submission. We are now in the process of evaluating your solution.
Please be reminded that the short paper deadline is coming up on July 31st (AoE). See https://sisap-challenges.github.io/#reports for a short summary of the goals of this paper, and please follow the general submission guidelines at https://sisap.org/2023/guidelines.html (short research paper) when preparing your submission.
Please send the PDF of your submission via mail to the organizers:
Thanks again for your submission, and please reach out if you have any questions.
Team: DISA-CRANBERRY
Corresponding: xmic@fi.muni.cz
Tasks: Task A
Subsets and projections: 100M, 30M, 10M
Comments:
We focus mainly on the indexing of 100M 768-dimensional vectors.
The algorithm can search an arbitrary random subset of the 100M dataset, e.g. the provided 30M and 10M datasets. The average recall should always be around 90 %, and it is above 90 % on the three provided sets (100M, 30M, 10M).
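For readers unfamiliar with the metric, here is a minimal sketch of how average recall over a query set is commonly computed. It is not taken from the CRANBERRY code: the function names, the toy IDs, and the value of `k` are assumptions made for illustration only.

```python
def recall_at_k(result_ids, true_ids, k):
    """Fraction of the true k nearest neighbours found in the returned top-k."""
    return len(set(result_ids[:k]) & set(true_ids[:k])) / k


def average_recall(results, ground_truth, k):
    """Mean recall@k over all queries."""
    recalls = [recall_at_k(r, g, k) for r, g in zip(results, ground_truth)]
    return sum(recalls) / len(recalls)


# Toy data (hypothetical IDs): two queries, k = 5.
results = [[1, 2, 3, 4, 5], [10, 11, 12, 13, 99]]
ground_truth = [[1, 2, 3, 4, 5], [10, 11, 12, 13, 14]]

avg = average_recall(results, ground_truth, k=5)
print(f"average recall: {avg:.2f}")  # first query 5/5, second query 4/5
```

In this toy case the first query recovers all five true neighbours and the second recovers four, so the average is 0.90, i.e. exactly at the 90 % mark mentioned above.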
Datasets with 50M objects or fewer are loaded into main memory during the build and searched there. However, I did not test datasets of around 50M objects.
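The size rule above can be sketched as a simple check. This is only an illustration of the stated threshold, not the team's code: the constant and function names are hypothetical, and what happens to larger datasets is not described here, so the sketch does not model it.

```python
# Threshold taken from the description above: datasets with at most
# 50M objects are loaded into main memory. The names are hypothetical.
IN_MEMORY_LIMIT = 50_000_000


def loaded_into_main_memory(num_objects: int) -> bool:
    """Return True if a dataset of this size falls under the in-memory rule."""
    return num_objects <= IN_MEMORY_LIMIT


for size in (10_000_000, 30_000_000, 100_000_000):
    print(size, loaded_into_main_memory(size))
```

Under this rule the 10M and 30M subsets are searched in main memory, while the 100M dataset is not.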
We do not use any of the provided PCA projections even though our Python code downloads the PCA96 datasets - sorry for that.
If needed, Jan Sedmidubsky, who wrote the GHA part in Python, will be available from July 12. Vladimir is available at any time except July 11.
Members: Vladimir Mic, Jan Sedmidubsky, Pavel Zezula
GitHub link: https://github.com/xsedmid/sisap23-laion-challenge-CRANBERRY/