joaopalotti / trectools

A simple toolkit to process TREC files in Python.
https://pypi.python.org/pypi/trectools
BSD 3-Clause "New" or "Revised" License

Extend TrecTopics - query inside topic tag #30

Closed: Tekaichi closed 2 years ago

Tekaichi commented 2 years ago

In this year's edition of [TREC-CT](http://www.trec-cds.org/2021.html), the topics are formatted as follows:

<topics task="2021 TREC Clinical Trials">
  <topic number="-1">
    A 2-year-old boy is brought to the emergency department by his parents for 5 days of high fever
    and irritability. The physical exam reveals conjunctivitis, strawberry tongue, inflammation of
    the hands and feet, desquamation of the skin of the fingers and toes, and cervical
    lymphadenopathy with the smallest node at 1.5 cm. The abdominal exam demonstrates tenderness
    and enlarged liver. Laboratory tests report elevated alanine aminotransferase, white blood cell
    count of 17,580/mm, albumin 2.1 g/dL, C-reactive protein 4.5 mg, erythrocyte sedimentation rate
    60 mm/h, mild normochromic, normocytic anemia, and leukocytes in urine of 20/mL with no bacteria
    identified. The echocardiogram shows moderate dilation of the coronary arteries with possible
    coronary artery aneurysm.
  </topic>
</topics>

This format is not compatible with the current implementation of trec_topics.py, which looks for a query tag (`querytext_tag`) inside the topic tag; in this case no such tag exists. I suggest that `querytext_tag="query"` be initialized as `None` instead and that, when it is `None`, the line `query = topic.findNext(querytext_tag).getText()` turns into `query = topic.getText()`.
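To make the idea concrete, here is a minimal, standalone sketch of the fallback. It is not the trectools code: `read_topics` and its parameters are hypothetical names chosen for illustration, it uses the standard library's `xml.etree.ElementTree` rather than the parser trec_topics.py uses, and the sample topic text is shortened.

```python
import xml.etree.ElementTree as ET


def read_topics(xml_text, topic_tag="topic", number_attr="number", querytext_tag=None):
    """Hypothetical helper (not the trectools API): map topic number -> query text."""
    root = ET.fromstring(xml_text)
    topics = {}
    for topic in root.iter(topic_tag):
        number = topic.get(number_attr)
        if querytext_tag is None:
            # No nested query tag: take all the text inside the topic element.
            raw = "".join(topic.itertext())
        else:
            # Nested format: take the text of the first matching descendant tag.
            node = topic.find(".//" + querytext_tag)
            raw = "".join(node.itertext()) if node is not None else ""
        # Collapse the indentation and line breaks of the XML into single spaces.
        topics[number] = " ".join(raw.split())
    return topics


# Shortened version of the TREC-CT 2021 example topic.
sample = """<topics task="2021 TREC Clinical Trials">
  <topic number="-1">
    A 2-year-old boy is brought to the emergency department.
  </topic>
</topics>"""

topics = read_topics(sample)
nested = read_topics(
    '<topics><topic number="1"><query>flu</query></topic></topics>',
    querytext_tag="query",
)
```

With `querytext_tag=None` the whole topic body becomes the query, while passing `querytext_tag="query"` still handles topics that wrap the query in its own tag.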

I can handle this and send a PR if my suggestion seems like a good solution.

joaopalotti commented 2 years ago

Hi @Tekaichi, that looks good! Thanks for your contribution!