Couple of questions about concepts.

teanalab / MRF-L

TREC'15 CDS Competition: WSU-IR at TREC 2015 Clinical Decision Support Track: Joint Weighting of Explicit and Latent Medical Query Concepts from Diverse Sources [code]


Couple of questions about concepts. #3

Open wujy2015 opened 6 years ago

wujy2015 commented 6 years ago

Hi,

Does the PRF use abstracts or full papers? How many documents do you use for this? Which ranking functions do you use to retrieve the PRF documents? Did you include other concepts in the first retrieval?

In my experiments, I can extract more concepts than I see in your run files. Do you select only the most frequent concepts?

balaneshin commented 6 years ago

Hi @wujy2015, we used the full papers. From the 28 top-ranked documents, we chose the 30 top-ranked concepts. For document retrieval, we used the previously extracted concepts and ran the query using the two-stage method proposed here and described here.

As I mentioned earlier, we chose only the top-ranked concepts using the PRF method.

wujy2015 commented 6 years ago

Does that mean you use #rm from Indri to get the top 28 documents, then use UMLS to get the 30 top-ranked concepts, and then merge them with the previous concepts? How do you rank the concepts?

balaneshin commented 6 years ago

For PRF concepts, please see this link for how the concept scores are computed and how the top 30 are extracted. For this concept type, we did not use UMLS to rank them.

wujy2015 commented 6 years ago

So you use RM to find the 30 most frequent words in the top 28 ranked documents? And do you take those as concepts, or input those words into UMLS?

balaneshin commented 6 years ago

@wujy2015 As can be seen from Table 2 of the paper, we have multiple concept types. In this work, concepts from top-ranked documents (described in line 7 of Table 2) are extracted independently of UMLS concepts (described in lines 2, 3, 5, and 6 of Table 2). These UMLS concepts are extracted only from the queries (topic summary and topic description). You can see the topic summaries and topic descriptions here. Therefore, we did not feed the concepts from top-ranked documents into UMLS.
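
To make the PRF step discussed in this thread more concrete, here is a minimal sketch of picking the top 30 terms from the top 28 documents. It assumes an RM1-style relevance model, which may differ from the scoring described in the linked page. The function name `top_prf_concepts`, the input format, and the use of single tokens as "concepts" are all hypothetical choices made for illustration; this is not the repository's code.

```python
import math
from collections import Counter, defaultdict


def top_prf_concepts(ranked_docs, num_docs=28, num_concepts=30):
    """Score terms from the top-ranked documents, RM1 style (an assumption).

    ranked_docs: list of (query_log_likelihood, tokens) pairs, best first.
    Returns up to num_concepts (term, score) pairs, highest score first.
    """
    feedback = ranked_docs[:num_docs]
    if not feedback:
        return []

    # Turn query log-likelihoods into normalized document weights P(D|Q).
    # Subtracting the maximum keeps exp() numerically stable.
    max_ll = max(ll for ll, _ in feedback)
    weights = [math.exp(ll - max_ll) for ll, _ in feedback]
    total = sum(weights)

    # score(term) = sum over feedback docs of P(D|Q) * P(term|D),
    # with P(term|D) estimated by maximum likelihood (no smoothing).
    scores = defaultdict(float)
    for weight, (_, tokens) in zip(weights, feedback):
        if not tokens:
            continue
        counts = Counter(tokens)
        for term, count in counts.items():
            scores[term] += (weight / total) * (count / len(tokens))

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:num_concepts]


if __name__ == "__main__":
    # Toy ranking with made-up scores and tokens, for illustration only.
    docs = [
        (-1.0, ["sepsis", "fever", "sepsis"]),
        (-2.0, ["fever", "cough"]),
        (-3.0, ["unrelated"]),
    ]
    for term, score in top_prf_concepts(docs, num_docs=2, num_concepts=2):
        print(term, round(score, 4))
```

Consistent with the last reply, nothing in this sketch passes through UMLS: the terms come straight from the feedback documents. Stopword filtering and multi-word concepts are left out to keep the example short.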