screwface106 opened this issue 7 months ago.
I am not sure I understand. Can you provide more details?
I believe the distance is calculated as the Euclidean distance: the square root of the sum of the squared differences between the query vector and the training (learned) vector. The question then becomes: which per-feature difference contributes most to the distance? Could an algorithm be added to determine this?
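The decomposition described above can be written per feature. For a query vector $q$ and a stored vector $x$ of dimension $n$, the squared distance is a sum of non-negative per-feature terms, so each term's share of the total is a natural measure of that feature's contribution. Because the square root is monotonic, it does not change the ranking. Note that some HNSW implementations (hnswlib's `l2` space, for example) report the squared distance directly, in which case the per-feature terms sum exactly to the reported value. The symbol $c_i$ is introduced here for illustration.

```latex
d(q, x) = \sqrt{\sum_{i=1}^{n} (q_i - x_i)^2}, \qquad
c_i = \frac{(q_i - x_i)^2}{\sum_{j=1}^{n} (q_j - x_j)^2}, \qquad
\sum_{i=1}^{n} c_i = 1 \quad (\text{for } q \neq x)
```

For example, with $q = (3, 4)$ and $x = (0, 0)$, $d = 5$ and the contributions are $c_1 = 9/25 = 0.36$ and $c_2 = 16/25 = 0.64$.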
The logic below ranks the squared differences between the components of the query vector and those of the learned vector, i.e. the individual terms of the Euclidean distance. If this logic were adapted and added to the HNSW algorithm, it could report how much each feature contributes to a point being detected as an outlier.
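#include <algorithm>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <stdexcept>
#include <vector>
// Function to calculate the per-feature squared differences between two vectors
std::vector<double> computeSquaredDifferences(const std::vector<double>& learnedVector,
                                              const std::vector<double>& queryVector) {
    if (learnedVector.size() != queryVector.size()) {
        throw std::invalid_argument("vectors must have the same dimension");
    }
    std::vector<double> squaredDifferences(learnedVector.size());
    for (std::size_t i = 0; i < learnedVector.size(); ++i) {
        double diff = queryVector[i] - learnedVector[i];
        squaredDifferences[i] = diff * diff;
    }
    return squaredDifferences;
}
// Function to rank features by their squared difference (largest contribution first)
std::vector<std::size_t> rankContributions(const std::vector<double>& squaredDifferences) {
    std::vector<std::size_t> indices(squaredDifferences.size());
    std::iota(indices.begin(), indices.end(), std::size_t{0});
    // Sort in descending order
    std::sort(indices.begin(), indices.end(), [&](std::size_t a, std::size_t b) {
        return squaredDifferences[a] > squaredDifferences[b];
    });
    return indices;
}
int main() {
    // Example vectors (placeholder values; the original example values were not preserved)
    std::vector<double> learnedVector = {1.0, 2.0, 3.0, 4.0};
    std::vector<double> queryVector = {1.5, 2.0, 6.0, 3.0};
    // Calculate squared differences
    auto squaredDifferences = computeSquaredDifferences(learnedVector, queryVector);
    // Get ranking of contributions
    auto rankings = rankContributions(squaredDifferences);
    // Output rankings
    for (auto idx : rankings) {
        std::cout << "Feature " << idx << " Contribution: " << squaredDifferences[idx] << std::endl;
    }
    return 0;
}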
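The ranking above compares the query against a single learned vector. To connect it to a search result, one option is to explain the distance to the query's nearest neighbour. This is a minimal sketch under stated assumptions: the hypothetical `findNearest` helper is a brute-force search standing in for an HNSW k = 1 query (no HNSW library API is used), and the hypothetical `contributionShares` divides each squared difference by the total squared distance so the shares sum to 1.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Squared L2 distance between two vectors of equal dimension.
double squaredL2(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Hypothetical stand-in for an HNSW k = 1 query: brute-force nearest neighbour.
std::size_t findNearest(const std::vector<std::vector<double>>& stored,
                        const std::vector<double>& query) {
    assert(!stored.empty());
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::infinity();
    for (std::size_t j = 0; j < stored.size(); ++j) {
        double dist = squaredL2(stored[j], query);
        if (dist < bestDist) {
            bestDist = dist;
            best = j;
        }
    }
    return best;
}

// Hypothetical helper: share of the squared distance contributed by each feature.
// Shares sum to 1, except when query == neighbour, where all shares are 0.
std::vector<double> contributionShares(const std::vector<double>& neighbour,
                                       const std::vector<double>& query) {
    assert(neighbour.size() == query.size());
    std::vector<double> shares(query.size(), 0.0);
    double total = squaredL2(neighbour, query);
    if (total == 0.0) return shares;
    for (std::size_t i = 0; i < query.size(); ++i) {
        double d = query[i] - neighbour[i];
        shares[i] = d * d / total;
    }
    return shares;
}
```

For stored vectors {0, 0} and {10, 10} and query {3, 4}, `findNearest` returns index 0 (squared distance 25 versus 85), and the shares are 0.36 and 0.64. Normalising to shares makes contributions comparable across queries whose overall distances differ.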
Would it be possible to add functionality that identifies which elements of the query vector contribute significantly to the distance?
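One way to make "contribute significantly" concrete is to return the smallest set of features whose squared differences together cover a chosen fraction of the total squared distance. The function `significantFeatures` and its `threshold` parameter are hypothetical, and the cumulative-share cut-off is only one possible rule; a fixed top-k or a per-feature share threshold would also work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the indices of the fewest features whose squared
// differences together account for at least `threshold` (in (0, 1]) of the total
// squared distance, ordered from largest to smallest contribution.
std::vector<std::size_t> significantFeatures(const std::vector<double>& squaredDifferences,
                                             double threshold) {
    assert(threshold > 0.0 && threshold <= 1.0);
    std::vector<std::size_t> order(squaredDifferences.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return squaredDifferences[a] > squaredDifferences[b];
    });
    double total = std::accumulate(squaredDifferences.begin(), squaredDifferences.end(), 0.0);
    std::vector<std::size_t> result;
    if (total == 0.0) return result;  // query equals the neighbour: nothing to explain
    double cumulative = 0.0;
    for (std::size_t idx : order) {
        result.push_back(idx);
        cumulative += squaredDifferences[idx];
        if (cumulative >= threshold * total) break;  // enough of the distance is covered
    }
    return result;
}
```

For squared differences {0.25, 0.0, 9.0, 1.0} and a threshold of 0.8, only feature 2 is returned, since it alone accounts for about 88% of the total (9.0 of 10.25); raising the threshold to 0.95 adds feature 3.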