nlw0 opened this issue 1 year ago
No, but it has many @inbounds annotations, i.e., if you access a database out of bounds, it will fail. Try running Julia with --check-bounds=yes to surface the error as a regular BoundsError instead of a segmentation fault.
On the other hand, we use Polyester.jl and StrideArraysCore.jl, which rely on several low-level tricks to achieve high performance. In particular, the @batch macro converts arrays into PtrArrays, and the latter uses tricks that can produce segmentation faults when indices are out of bounds.
If you think the failure is related to StrideArraysCore or Polyester, then you should define StrideArraysCore.boundscheck() = true (inside the running Julia process), as documented in https://github.com/JuliaSIMD/StrideArraysCore.jl
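For concreteness, this is how that switch could be flipped in the failing session (a sketch following the StrideArraysCore.jl README; re-run the failing workload afterwards in the same process):

```julia
using StrideArraysCore

# Redefining this method forces PtrArray accesses to be bounds checked,
# so an invalid access raises an error instead of segfaulting.
StrideArraysCore.boundscheck() = true
```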
Based on your trace, my guess is that you are accessing object identifiers beyond the bounds of the dataset... but it is hard to know without a minimal working example.
Thanks. From what I have seen now, it does look like an out-of-bounds access, but I'm merely running
searchbatch(ExhaustiveSearch(BinaryHammingDistance(), MatrixDatabase(db)), MatrixDatabase(da), 2)
And later I use the returned indices. It seems the out-of-bounds error happens at that moment, which would indicate invalid indices in the answer. On the other hand, that should only cause a regular out-of-bounds error from accessing a regular Julia array; there should be no relation to Polyester or the MatrixDatabase object anymore. I tried creating an MWE, but it runs fine; it only happens now and then in my application.
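For context, here is a self-contained version of that call with hypothetical stand-in data (the real shapes and contents differ):

```julia
using SimilaritySearch

# Hypothetical stand-ins: one object per column, packed binary codes.
db = rand(UInt64, 16, 100_000)  # dataset
da = rand(UInt64, 16, 1_000)    # queries

knns, dists = searchbatch(
    ExhaustiveSearch(BinaryHammingDistance(), MatrixDatabase(db)),
    MatrixDatabase(da),
    2,  # k nearest neighbors per query
)
```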
Since the failure is outside the Polyester domain, you can find the failing index by disabling the @inbounds macro (run Julia with --check-bounds=yes).
You can also check the output ranges with the extrema function on the knn indices returned by searchbatch.
If you see an unexpected 0, you may have hit a documented, but perhaps debatable, implementation decision: queries that retrieve fewer than $k$ neighbors pad the end of the column with zeros instead of failing or being filled with random results.
https://sadit.github.io/SimilaritySearch.jl/dev/api/#SimilaritySearch.searchbatch
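A small sketch combining both checks; here index, queries, k, and n (the dataset length) are assumed to be defined as in the snippet above:

```julia
knns, dists = searchbatch(index, queries, k)

lo, hi = extrema(knns)
hi <= n || @warn "identifier beyond the dataset bounds" hi n
lo >= 0 || @warn "negative identifier, likely junk/uninitialized memory" lo

# Zeros are the documented padding for queries that retrieved fewer than k results.
padded = count(iszero, knns)
padded > 0 && @info "$padded padded (zero) entries in knns"
```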
This should not happen in most workloads if you use ExhaustiveSearch, only if you ask for $k \geq n$, where $n$ is the length of the dataset. However, SearchGraph has more cases where it can fail to retrieve $k$ elements. For instance, this can happen when $k$ is large and the query is quite easy to solve. Another possible cause (again with SearchGraph) is when discrete distance functions don't allow routing and the underlying algorithm can't find suitable candidates to explore the graph.
Some possible workarounds you can apply, depending on your application:

- Use search instead of searchbatch: search solves a single query and uses a KnnResult structure that allows more control (see the sketch after this list).
- If you still fail to retrieve the k elements you ask for, then you need to manipulate the hyperparameters of SearchGraph; for example, add different hint strategies through the callbacks kwarg of the insertion and indexing functions. This lets you specify a small set of elements that is always revised, and its size should be similar to the number of k elements you expect to fetch.
- You can see a working example with binary Hamming distance in https://github.com/sisap-challenges/sisap23-laion-challenge-julia-example/blob/c89913792c6fe8aaa27cc3766a42ba34dccaf387/src/run.jl#L100
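A minimal sketch of the first workaround, assuming the KnnResult-based search API (exact signatures and accessors may differ between SimilaritySearch.jl versions):

```julia
using SimilaritySearch

dist = BinaryHammingDistance()
db = MatrixDatabase(rand(UInt64, 16, 10_000))    # hypothetical dataset
queries = MatrixDatabase(rand(UInt64, 16, 100))  # hypothetical queries
index = ExhaustiveSearch(dist, db)
k = 2

for i in eachindex(queries)
    res = KnnResult(k)  # holds at most k (id, distance) pairs
    search(index, queries[i], res)
    # Per-query control: you can detect under-filled results directly
    # instead of decoding zero padding from a batch matrix later.
    length(res) == k || @warn "query $i retrieved only $(length(res)) neighbors"
end
```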
I'm investigating this again. Unfortunately it's not easy to reproduce; I imagine it might be related to threads. I upgraded to the latest version, but it still happens. Here's the latest stacktrace I could capture:
[79918] signal (11.1): Segmentation fault
in expression starting at none:1
size at ./array.jl:150 [inlined]
axes at ./abstractarray.jl:98 [inlined]
to_indices at ./indices.jl:344 [inlined]
view at ./subarray.jl:176 [inlined]
getindex at /home/user/.julia/packages/SimilaritySearch/h5gkj/src/db/matrixdatabase.jl:22 [inlined]
solve_single_query at /home/user/.julia/packages/SimilaritySearch/h5gkj/src/SimilaritySearch.jl:185
macro expansion at /home/user/.julia/packages/SimilaritySearch/h5gkj/src/SimilaritySearch.jl:176 [inlined]
#124 at /home/user/.julia/packages/Polyester/qUpZt/src/closure.jl:279 [inlined]
BatchClosure at /home/user/.julia/packages/Polyester/qUpZt/src/batch.jl:12
unknown function (ip: 0x7f4f6534d2df)
_call at /home/user/.julia/packages/ThreadingUtilities/3z3g0/src/threadtasks.jl:11 [inlined]
ThreadTask at /home/user/.julia/packages/ThreadingUtilities/3z3g0/src/threadtasks.jl:29
unknown function (ip: 0x7f4f6536d5df)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
start_task at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/task.c:1092
Allocations: 360068756 (Pool: 357671085; Big: 2397671); GC: 1486
/home/user/.local/bin/julia: line 2: 79918 Segmentation fault (core dumped) /home/user/src/julia-1.9.0/bin/julia --threads=auto "$@"
My impression is that some value in the return matrix does not get initialized, so I get junk values; I've even seen negative ones. There are two situations: either searchbatch works but returns invalid output containing an index that is some weird number (probably read from uninitialized memory), or this invalid index shows up inside SimilaritySearch itself. So sometimes I get an error in my code, and sometimes in SimilaritySearch, more specifically at https://github.com/sadit/SimilaritySearch.jl/blob/aebe1ff6751df1c089ddfd8acab74c6ba86691e7/src/db/matrixdatabase.jl#L22 , called from https://github.com/sadit/SimilaritySearch.jl/blob/aebe1ff6751df1c089ddfd8acab74c6ba86691e7/src/SimilaritySearch.jl#L185 and https://github.com/sadit/SimilaritySearch.jl/blob/aebe1ff6751df1c089ddfd8acab74c6ba86691e7/src/SimilaritySearch.jl#L176
All I can imagine is that @batch minbatch=minbatch per=thread for i in eachindex(Q) could somehow be producing invalid values for i. But I'm not familiar enough with Polyester to determine whether that's possible.
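One way to test that suspicion in isolation is to run the same @batch pattern over a plain range and record every index it yields (a sketch; check_batch_indices is illustrative, not part of SimilaritySearch):

```julia
using Polyester

function check_batch_indices(n; minbatch=4)
    hits = zeros(Int, n)
    @batch minbatch=minbatch per=thread for i in 1:n
        if 1 <= i <= n
            hits[i] += 1  # each index should arrive exactly once
        else
            @show i       # an out-of-range index handed out by @batch
        end
    end
    @assert all(==(1), hits) "some index was skipped or repeated"
end

check_batch_indices(1_000_000)
```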
Can you try starting Julia with --check-bounds=yes?
I ran into a similar problem. I am using @threads to run multiple approximate nearest neighbor search tasks on distinct databases; within each database, @batch is used to speed up the work. I observed a segmentation fault in this use case. Using @batch instead of @threads for the outer loop makes the segmentation fault go away. I am curious why the latter resolves the issue.
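A hedged sketch of the two nesting patterns described above; run_single_search! is a stand-in for the per-database task. Polyester runs nested @batch loops serially, which may explain why using it at both levels avoids the crash, whereas mixing Base's scheduler with Polyester's apparently does not compose safely:

```julia
using Base.Threads, Polyester

# Hypothetical per-database task that itself uses @batch internally.
function run_single_search!(result, db)
    @batch per=thread for i in eachindex(db)
        result[i] = db[i] + 1  # placeholder work
    end
end

function search_all!(results, databases)
    # Reported to segfault: Base's @threads outside, @batch inside.
    # @threads for j in eachindex(databases)
    #     run_single_search!(results[j], databases[j])
    # end

    # Reported to work: Polyester's @batch at the outer level too; the
    # inner @batch then runs serially, so the schedulers don't conflict.
    @batch per=thread for j in eachindex(databases)
        run_single_search!(results[j], databases[j])
    end
end

databases = [rand(100) for _ in 1:8]
results = [similar(db) for db in databases]
search_all!(results, databases)
```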
Got a segfault today that seems related to this package. Does anyone know how this could even be possible? Are there any external binary dependencies in this package?