Closed JackTan25 closed 8 months ago
the log is here.
2024-02-27 10:22:39.430 UTC [45] LOG: server process (PID 101) was terminated by signal 11: Segmentation fault
2024-02-27 10:22:39.430 UTC [45] DETAIL: Failed process was running: select * from t1 order by a <-> ARRAY[0,0,0,1,8,7,3,2,5,0,0,3,5,7,11,31,13,0,0,0,0,29,106,107,13,0,0,0,1,61,70,42,0,0,0,0,1,23,28,16,63,4,0,0,0,6,83,81,117,86,25,15,17,50,84,117,31,23,18,35,97,117,49,24,68,27,0,0,0,4,29,71,81,47,13,10,32,87,117,117,45,76,40,22,60,70,41,9,7,21,29,39,53,21,4,1,55,72,3,0,0,0,0,9,65,117,73,37,28,23,17,34,11,11,27,61,64,25,4,0,42,13,1,1,1,14,10,6] limit 5;
2024-02-27 10:22:39.430 UTC [45] LOG: terminating any other active server processes
2024-02-27 10:22:39.430 UTC [72] WARNING: terminating connection because of crash of another server process
2024-02-27 10:22:39.430 UTC [72] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-02-27 10:22:39.430 UTC [72] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2024-02-27 10:22:39.432 UTC [103] FATAL: the database system is in recovery mode
2024-02-27 10:22:39.434 UTC [45] LOG: all server processes terminated; reinitializing
2024-02-27 10:22:39.478 UTC [104] LOG: database system was interrupted; last known up at 2024-02-27 10:12:27 UTC
2024-02-27 10:22:39.551 UTC [104] LOG: database system was not properly shut down; automatic recovery in progress
2024-02-27 10:22:39.553 UTC [104] LOG: redo starts at 0/40E00168
2024-02-27 10:22:39.553 UTC [104] LOG: invalid record length at 0/40E001A0: wanted 24, got 0
2024-02-27 10:22:39.553 UTC [104] LOG: redo done at 0/40E00168
2024-02-27 10:22:39.561 UTC [45] LOG: database system is ready to accept connections
Thanks for your report. @JackTan25
Yes. As described in the README, SPANN will be integrated soon. It depends on SPFresh to offer index insert and update, mainly because the current SPANN code makes it challenging to insert data. The work is in this PR: https://github.com/microsoft/SPTAG/pull/406.
It can already be done today, but our evaluation process involves more intricate steps.
[MetaData]
MetaDataFilePath=
MetaDataIndexPath=
[Base]
ValueType=Float
DistCalcMethod=L2
IndexAlgoType=BKT
Dim=128
VectorPath=/tmp/sift/sift_base.fvecs
VectorType=XVEC
VectorSize=1000000
VectorDelimiter=
QueryPath=/tmp/sift/sift_query.fvecs
QueryType=XVEC
QuerySize=100
QueryDelimiter=
WarmupPath=
WarmupType=DEFAULT
WarmupSize=10000
WarmupDelimiter=
TruthPath=/groundtruth
TruthType=DEFAULT
GenerateTruth=false
HeadVectorIDs=head_vectors_ID_Int8_L2_base_DEFUALT.bin
HeadVectors=head_vectors_Int8_L2_base_DEFUALT.bin
IndexDirectory=/tmp/spann_index
HeadIndexFolder=head_index
[SelectHead]
isExecute=false
TreeNumber=1
BKTKmeansK=32
BKTLeafSize=8
SamplesNumber=10000
SaveBKT=false
SelectThreshold=10
SplitFactor=6
SplitThreshold=25
Ratio=0.18
NumberOfThreads=160
BKTLambdaFactor=1.0
[BuildHead]
isExecute=false
NeighborhoodSize=32
TPTNumber=64
TPTLeafSize=2000
MaxCheck=16324
MaxCheckForRefineGraph=16324
RefineIterations=3
NumberOfThreads=160
BKTLambdaFactor=-1.0
[BuildSSDIndex]
isExecute=false
BuildSsdIndex=false
NumberOfThreads=160
InternalResultNum=256
ReplicaCount=8
PostingPageLimit=120
OutputEmptyReplicaID=1
[SearchSSDIndex]
isExecute=true
BuildSsdIndex=false
InternalResultNum=256
SearchInternalResultNum=256
NumberOfThreads=16
SearchResult=/data/result.bin
QpsLimit=0
ResultNum=50
TruthResultNum=50
MaxCheck=8192
SearchPostingPageLimit=120
MaxDistRatio=10000
Rerank=100
EnableADC=false
RecallAnalysis=true
DebugBuildInternalResultNum=256
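The config above points VectorPath at a .fvecs file with VectorType=XVEC and Dim=128. Before running a long build, it can help to confirm that the file really has the expected dimension. The following is a minimal sketch, assuming the common .fvecs layout (each vector stored as a little-endian int32 dimension followed by that many float32 values). The helper names read_fvecs and write_fvecs are hypothetical and are not part of SPTAG.

```python
import struct


def read_fvecs(path, limit=None):
    """Read vectors from an .fvecs file.

    Assumed layout per vector: int32 dimension, then `dimension` float32
    values, all little-endian.
    """
    vectors = []
    with open(path, "rb") as f:
        while limit is None or len(vectors) < limit:
            header = f.read(4)
            if not header:
                break  # clean end of file
            if len(header) < 4:
                raise ValueError("truncated dimension header")
            (dim,) = struct.unpack("<i", header)
            payload = f.read(4 * dim)
            if len(payload) != 4 * dim:
                raise ValueError("truncated vector payload")
            vectors.append(struct.unpack(f"<{dim}f", payload))
    return vectors


def write_fvecs(path, vectors):
    """Write vectors in the same assumed layout (handy for small test files)."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack("<i", len(vec)))
            f.write(struct.pack(f"<{len(vec)}f", *vec))
```

For example, `len(read_fvecs("/tmp/sift/sift_base.fvecs", limit=1)[0])` should equal the Dim value in the [Base] section if the file matches this layout.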
Thanks for your reply, but I'm still confused about step 4. I don't know where the configuration file is. Do you mean I should just copy the content you gave above directly into the configuration file? Is that how the VBase paper tested SPANN? @zqxjjj
Well, I followed this document: https://github.com/microsoft/SPTAG/blob/main/docs/GettingStart.md
1. I can't find the [MetaData] MetaDataFilePath= MetaDataIndexPath= [Base] ValueType=Float DistCalcMethod=L2 ..... content you gave above, but I can find the configuration file here:
2. What's the difference between meta.bin and metaindex.bin? I can see that metaindex.bin holds the offsets, but what are vector 1 meta and vector 2 meta? I can't find an explanation in the ReadMe.md.
3. Same here: what are the semantics of the metadata:
4. Can I replace all of these bin files with my own txt-format files?
5. By the way, is there a WeChat user group or another way to communicate? @zqxjjj
Thanks for your feedback. @JackTan25
I am not an expert on SPANN, but I can share what I know.
1. Some items in the config file differ between build and search. They depend on the dataset and are not related to VBase.
2 & 3. Each item in the index has an address pointing to the corresponding row in the table. That is how metadata is used in VBase. Of course, it can be used for other purposes. Each vector has one metadata item in SPANN.
4. It depends on the format of the txt file. SPANN supports several data formats: https://github.com/microsoft/SPTAG/tree/main/AnnService/src/Helper/VectorSetReaders
5. Which way do you think would offer more efficient communication? I am very open to exploring better ways to communicate. GitHub already provides an excellent platform for it.
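To make the relationship between the two metadata files concrete, here is a minimal in-memory sketch. It assumes that the index file stores an int32 item count followed by count + 1 uint64 byte offsets, and that the data file is the concatenation of all metadata items. This layout is an assumption for illustration and should be checked against the SPTAG source before relying on it. The function names are hypothetical.

```python
import struct


def build_metadata(items):
    """Pack a list of byte strings into (meta_blob, index_blob).

    Assumed layout (not confirmed by this thread):
      meta_blob  = all items concatenated
      index_blob = int32 count, then count + 1 uint64 offsets into meta_blob
    """
    offsets = [0]
    for item in items:
        offsets.append(offsets[-1] + len(item))
    meta_blob = b"".join(items)
    index_blob = struct.pack("<i", len(items))
    index_blob += struct.pack(f"<{len(offsets)}Q", *offsets)
    return meta_blob, index_blob


def get_metadata(meta_blob, index_blob, vector_id):
    """Return the metadata bytes stored for one vector id."""
    (count,) = struct.unpack_from("<i", index_blob, 0)
    if not 0 <= vector_id < count:
        raise IndexError(vector_id)
    # Item i spans [offsets[i], offsets[i + 1]) in the data blob.
    start, end = struct.unpack_from("<2Q", index_blob, 4 + 8 * vector_id)
    return meta_blob[start:end]
```

Under this reading, "vector 1 meta" and "vector 2 meta" are simply the byte ranges for vectors 1 and 2. In VBase each such item would hold the address of the table row for that vector, as described above.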
So the metadata is generated by SPANN and is not related to VBase, and I don't need to build it manually, right? @zqxjjj
Could you give me a way to reproduce the results in the VBase paper, ideally as detailed steps one by one? In the SPTAG repo the README is too complex and there are too many parameters. As you said above, the parameters in the configuration file are different from VBase's, and I have run into trouble reproducing the results. @zqxjjj
Yes. Creating an index in SPANN is a little complex. Let me figure out how to offer some tools to make it automated.
-> SPTAG/Release/ssdserving buildIndex.ini
Example content in buildIndex.ini
[Base]
ValueType=Float
DistCalcMethod=L2
IndexAlgoType=BKT
Dim=1025
VectorPath=/raw_data/collections/rec_embeds_collection_spann.bin
VectorType=DEFAULT
VectorSize=330922
VectorDelimiter=
QueryPath=/artifacts/scripts/data_prepare/new_image_embedding_query.bin
QueryType=DEFAULT
QuerySize=100
QueryDelimiter=
WarmupPath=
WarmupType=DEFAULT
WarmupSize=10000
WarmupDelimiter=
TruthPath=/groundtruth
TruthType=DEFAULT
GenerateTruth=false
HeadVectorIDs=head_vectors_ID_UInt8_L2_base_DEFUALT.bin
HeadVectors=head_vectors_UInt8_L2_base_DEFUALT.bin
IndexDirectory=/raw_data/data
HeadIndexFolder=head_index
[SelectHead]
isExecute=true
TreeNumber=1
BKTKmeansK=32
BKTLeafSize=8
SamplesNumber=10000
SaveBKT=false
SelectThreshold=10
SplitFactor=6
SplitThreshold=25
Ratio=0.18
NumberOfThreads=160
BKTLambdaFactor=1.0
[BuildHead]
isExecute=true
NeighborhoodSize=32
TPTNumber=64
TPTLeafSize=2000
MaxCheck=16324
MaxCheckForRefineGraph=16324
RefineIterations=3
NumberOfThreads=160
BKTLambdaFactor=-1.0
[BuildSSDIndex]
isExecute=true
BuildSsdIndex=true
NumberOfThreads=160
InternalResultNum=256
ReplicaCount=8
PostingPageLimit=120
OutputEmptyReplicaID=1
[SearchSSDIndex]
isExecute=false
BuildSsdIndex=true
InternalResultNum=256
SearchInternalResultNum=256
NumberOfThreads=16
SearchResult=/data/result.bin
QpsLimit=0
ResultNum=50
TruthResultNum=50
MaxCheck=8192
SearchPostingPageLimit=120
MaxDistRatio=10000
Rerank=100
EnableADC=false
RecallAnalysis=true
DebugBuildInternalResultNum=256
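The build and search configs in this thread differ mainly in which sections have isExecute=true. A quick way to see which stages a given ini file would run is to parse it and list those sections. This is a minimal sketch using Python's standard configparser. The function name enabled_stages is hypothetical, and the sketch only reads the isExecute keys shown in this thread; it says nothing about how ssdserving itself interprets the file.

```python
import configparser


def enabled_stages(ini_text):
    """Return the section names whose isExecute option is true."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case (isExecute, not isexecute)
    parser.read_string(ini_text)
    return [
        section
        for section in parser.sections()
        # Sections without isExecute (for example [Base]) count as disabled.
        if parser.getboolean(section, "isExecute", fallback=False)
    ]
```

With the buildIndex.ini content above, this would list SelectHead, BuildHead and BuildSSDIndex, while a search-only config would list just SearchSSDIndex.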
-> create index image_spann_index on recipe_table using spann(image_embedding spann_vector_l2_ops);
The metadata will be in the index directory.
-> cp -r /raw_data/data/ /indexdata/image_spann_index/
-> cp /u02/pgdata/13/base/16386/meta /indexdata/image_spann_index/
[MetaData]
MetaDataFilePath=meta.bin
MetaDataIndexPath=metaIndex.bin
MetaDataToVectorIndex=false
[Index]
IndexAlgoType=SPANN
ValueType=Float
[Base]
ValueType=Float
DistCalcMethod=L2
IndexAlgoType=BKT
Dim=1025
VectorPath=
VectorType=DEFAULT
VectorSize=330922
VectorDelimiter=
QueryPath=/data/img_embeds_query.bin
QueryType=DEFAULT
QuerySize=10000
QueryDelimiter=
WarmupPath=
WarmupType=
WarmupSize=-1
WarmupDelimiter=
TruthPath=
TruthType=
GenerateTruth=false
HeadVectorIDs=head_vectors_ID_UInt8_L2_base_DEFUALT.bin
HeadVectors=head_vectors_UInt8_L2_base_DEFUALT.bin
IndexDirectory=/indexdata/image_spann_index
HeadIndexFolder=head_index
[SelectHead]
isExecute=false
TreeNumber=1
BKTKmeansK=32
BKTLeafSize=8
SamplesNumber=10000
SaveBKT=false
SelectThreshold=10
SplitFactor=6
SplitThreshold=25
Ratio=0.18
NumberOfThreads=160
BKTLambdaFactor=1.0
[BuildHead]
isExecute=false
TreeFilePath=tree.bin
GraphFilePath=graph.bin
VectorFilePath=vectors.bin
DeleteVectorFilePath=deletes.bin
EnableBfs=0
BKTNumber=1
BKTKmeansK=32
BKTLeafSize=8
Samples=1000
BKTLambdaFactor=-1.0
TPTNumber=64
TPTLeafSize=2000
NumTopDimensionTpTreeSplit=5
NeighborhoodSize=32
GraphNeighborhoodScale=2.000000
GraphCEFScale=2.000000
RefineIterations=3
EnableRebuild=0
CEF=1000
AddCEF=500
MaxCheckForRefineGraph=16324
RNGFactor=1.000000
GPUGraphType=2
GPURefineSteps=0
GPURefineDepth=30
GPULeafSize=500
HeadNumGPUs=1
TPTBalanceFactor=2
NumberOfThreads=160
DistCalcMethod=InnerProduct
DeletePercentageForRefine=0.400000
AddCountForRebuild=1000
MaxCheck=16324
ThresholdOfNumberOfContinuousNoBetterPropagation=3
NumberOfInitialDynamicPivots=50
NumberOfOtherDynamicPivots=4
HashTableExponent=2
DataBlockSize=1048576
DataCapacity=2147483647
MetaRecordSize=10
[BuildSSDIndex]
isExecute=false
BuildSsdIndex=true
NumberOfThreads=160
InternalResultNum=32
ReplicaCount=8
PostingPageLimit=150
OutputEmptyReplicaID=1
TmpDir=.
SearchInternalResultNum=32
SearchResult=/data/result.bin
QpsLimit=0
ResultNum=50
TruthResultNum=100
MaxCheck=8192
SearchPostingPageLimit=150
MaxDistRatio=10000
Rerank=100
EnableADC=false
RecallAnalysis=true
DebugBuildInternalResultNum=32
Following the steps above, it works. I made one mistake, though: I forgot to run chmod on /indexdata/xxxx/meta*.bin, which is needed because I start the server as the postgres user. Without it, we get Failed to create file handle:/indexdata/image_spann_index/meta.bin at AsyncFileReader.h.
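The permission problem above can be caught before starting a query. Here is a minimal sketch that lists the files matching a pattern which the current process cannot read. It only reflects the permissions of whoever runs it, so it would need to be run as the same user that starts the server (postgres in this case). The function name unreadable_files is hypothetical.

```python
import glob
import os


def unreadable_files(pattern):
    """Return the paths matching `pattern` that the current user cannot read."""
    return [
        path
        for path in sorted(glob.glob(pattern))
        if not os.access(path, os.R_OK)
    ]
```

For example, `unreadable_files("/indexdata/image_spann_index/meta*.bin")` returning a non-empty list would point at the files that still need chmod.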
postgres=# select * from t3 order by a <-> '{0.3,0.4,0.5}' limit 1;
INFO: try begin scan,path: /image_spann_index/
INFO: try begin scan successfully.
INFO: finished spann search
a
------------------------------------
{0.95990366,0.95319396,0.99043304}
(1 row)
Time: 22.892 ms
So if you start the server as another user and create the database folder yourself, watch out for the error I hit. The SPANN index now works, so let's close this issue.
select * from t1 order by a <-> ARRAY[0,0,0,1,8,7,3,2,5,0,0,3,5,7,11,31,13,0,0,0,0,29,106,107,13,0,0,0,1,61,70,42,0,0,0,0,1,23,28,16,63,4,0,0,0,6,83,81,117,86,25,15,17,50,84,117,31,23,18,35,97,117,49,24,68,27,0,0,0,4,29,71,81,47,13,10,32,87,117,117,45,76,40,22,60,70,41,9,7,21,29,39,53,21,4,1,55,72,3,0,0,0,0,9,65,117,73,37,28,23,17,34,11,11,27,61,64,25,4,0,42,13,1,1,1,14,10,6] limit 5;
server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
What’s wrong with that? @zqxjjj