linkedin / detext

DeText: A Deep Neural Text Understanding Framework for Ranking and Classification Tasks
BSD 2-Clause "Simplified" License

Can it rank the documents semantically without pretraining #36

Closed deepankar27 closed 3 years ago

deepankar27 commented 4 years ago

Hello Team,

Say I don't have any query-to-document mapping and have only documents. Will it rank the documents semantically based on ad hoc questions? I have gone through your documentation but didn't find such a feature listed.

Thanks in advance.

StarWang commented 4 years ago

Hi @deepankar27, given that your use case is "rank the documents given ad hoc questions", you will still need a dataset that has a query-to-document mapping. The key is that your training and inference scenarios need to be consistent.
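To illustrate what a query-to-document mapping could look like, here is a minimal sketch. The `RankingExample` class, its field names, the `to_pairs` helper, and the sample data are all hypothetical and are not DeText's actual input format; check the repository documentation for the format the framework expects.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RankingExample:
    """Hypothetical training record: one query, its candidate documents, and labels."""
    query: str
    documents: List[str]
    labels: List[int]  # labels[i] is the relevance of documents[i]; higher is more relevant

    def __post_init__(self) -> None:
        # Every candidate document must carry exactly one relevance label.
        if len(self.documents) != len(self.labels):
            raise ValueError("each document needs exactly one relevance label")


def to_pairs(examples: List[RankingExample]) -> List[Tuple[str, str, int]]:
    """Flatten records into (query, document, label) triples."""
    return [
        (ex.query, doc, label)
        for ex in examples
        for doc, label in zip(ex.documents, ex.labels)
    ]


# Made-up sample data, for illustration only.
examples = [
    RankingExample(
        query="how to reset a password",
        documents=["Steps to reset your account password", "Quarterly sales report"],
        labels=[1, 0],
    ),
]
pairs = to_pairs(examples)
```

The point of the sketch is that a collection of documents alone carries no relevance signal. Each query needs labeled candidates, and the queries seen at training time should resemble the ad hoc questions expected at inference time.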

luke4u commented 4 years ago

@StarWang, thanks for the explanation. Just to confirm my understanding: we still need to retrain the model with a dataset containing a query-to-document mapping before inference?

StarWang commented 4 years ago

@luke4u Yes. It's actually training rather than retraining, because we provide a training framework rather than a model.

aradhanacha commented 3 years ago

@StarWang Do you have any sample code for ranking the data? The documentation doesn't have much explanation.

StarWang commented 3 years ago

Hi @aradhanacha, you can find the scoring code for a trained model in a hands-on tutorial here

aradhanacha commented 3 years ago

Thanks @StarWang, I will try this code. Can you also suggest how many examples should suffice for a query-document ranking task? In real-world implementations it's difficult to get the actual ranking of documents for a query. Do you have any sample training dataset for this task?

StarWang commented 3 years ago

@aradhanacha It really depends on the complexity of your task, the cleanliness of your data, etc. The only recommendation is that you have to try.

For datasets, you can try locating available datasets from ranking-related papers. Robust04 and ClueWeb09-B are classic datasets for document retrieval, which you can use for ranking as well.

aradhanacha commented 3 years ago

Thanks @StarWang