jeffhj / domain-relevance

The implementation for "Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach" (ACL '21)
Apache License 2.0

section 4.1 what do you mean "Each broad domain and its sub-domains share seed terms" #2

Open lihuiliullh opened 2 years ago

lihuiliullh commented 2 years ago

Suppose there are two seed terms, "A" and "B", and the hierarchy is CS -> AI -> ML. Do you mean that "A" and "B" connect to "CS", "AI", and "ML" at the same time?

jeffhj commented 2 years ago

It is not that "A" and "B" connect to "CS", "AI", and "ML" at the same time. Rather, in our setting, CS/AI/ML share the same set of seed terms (because we use the same domain corpus to extract the terms that initialize the graph). You may also refer to the data statistics in Table 1, where #terms is the number of "seed terms".

In this paper, we use "seed terms" to refer to a large set of terms extracted from the corpus. This may differ from the concept in graph mining, where seeds are, e.g., a small number of nodes used to initialize the algorithm.
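To make the "shared seed terms" point concrete, here is a minimal sketch, not the repository's code. It assumes the term list is extracted once from the corpus and then reused as the node set for every domain in the hierarchy. The function name `init_graph_nodes` and the toy terms "A" and "B" are hypothetical.

```python
from typing import Dict, List


def init_graph_nodes(corpus_terms: List[str], hierarchy: List[str]) -> Dict[str, List[str]]:
    """Give every domain in the hierarchy the same node list (hypothetical helper).

    Only the domain-specific labels would differ between domains;
    the terms themselves come from one shared corpus.
    """
    return {domain: list(corpus_terms) for domain in hierarchy}


# Toy data taken from the question above: two terms, one hierarchy.
corpus_terms = ["A", "B"]
nodes = init_graph_nodes(corpus_terms, ["CS", "AI", "ML"])

# The number of terms is identical for each domain.
term_counts = {domain: len(terms) for domain, terms in nodes.items()}
```

Under this assumption, `term_counts` holds the same value for CS, AI, and ML, which matches the reading that #terms in Table 1 counts the shared seed terms.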

lihuiliullh commented 2 years ago

Thanks. I have another question. In section 3.3, each core term has one or several categories, and all these categories can form a category tree. Does this mean all the core terms come from the same large area, like "computer science" or "physics"? In the picture below, does "domain" in "for a given domain" mean a small domain within the large area?

For example, in the "computer science" dataset, the root of the tree is "subfields of computer science". If I want to find information about "deep learning", i.e., the given domain is "deep learning", then in the sentence "For a given domain, we can first traverse from a root category and collect some gold subcategories.", does the root category mean "subfields of computer science" or "deep learning"?

image

jeffhj commented 2 years ago

Yes. Your understanding is correct. The root category here means "Category:Subfields of computer science" (https://en.wikipedia.org/wiki/Category:Subfields_of_computer_science) or "Category:Machine learning" (https://en.wikipedia.org/wiki/Category:Machine_learning)
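The traversal described in the quoted sentence can be sketched as a breadth-first walk over a category tree. This is only an illustration under stated assumptions: the root is chosen per domain, the root itself counts as gold, and the walk stops at a depth limit. The function `collect_gold_subcategories`, the `max_depth` parameter, and the edges in `toy_subcats` are hypothetical and are not a snapshot of Wikipedia or of the repository's code.

```python
from collections import deque
from typing import Dict, List, Set

# Toy category tree (made-up edges, for illustration only).
toy_subcats: Dict[str, List[str]] = {
    "Category:Subfields of computer science": [
        "Category:Artificial intelligence",
        "Category:Computer architecture",
    ],
    "Category:Artificial intelligence": ["Category:Machine learning"],
    "Category:Machine learning": ["Category:Deep learning"],
}


def collect_gold_subcategories(root: str, subcats: Dict[str, List[str]], max_depth: int) -> Set[str]:
    """Collect the root and every subcategory within max_depth hops of it."""
    gold = {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for child in subcats.get(category, []):
            if child not in gold:  # guard against revisiting a category
                gold.add(child)
                queue.append((child, depth + 1))
    return gold


# A broad domain and a narrower one each start from their own root.
cs_gold = collect_gold_subcategories("Category:Subfields of computer science", toy_subcats, max_depth=1)
ml_gold = collect_gold_subcategories("Category:Machine learning", toy_subcats, max_depth=1)
```

With a different root, the same routine yields a different gold set, which is one way to read the two root categories named in the reply.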

lihuiliullh commented 2 years ago

May I know whether the label here is a boolean value (e.g., 1 means related and 0 means unrelated) or a scalar (e.g., 0.8, 0.7)? If the domain changes, do all the terms need to be relabeled?

Also, in the paragraph above Equation (7), it says "all the core terms are labeled at each level of the hierarchy". Can a term belong to several different hierarchies at the same time?

image

jeffhj commented 2 years ago

Yes, the label here is a boolean value. Yes, all the terms need to be relabeled for a new domain (in our paper, we introduce an automatic approach to do this). And yes, a term can belong to several different hierarchies at the same time.
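A rough sketch of what per-domain boolean labeling could look like follows. It assumes a core term is marked relevant to a domain when at least one of its categories falls in that domain's gold subcategory set. That rule, the function `label_core_terms`, and all toy data are assumptions for illustration, not the paper's exact procedure.

```python
from typing import Dict, Set


def label_core_terms(
    term_categories: Dict[str, Set[str]],
    gold_by_domain: Dict[str, Set[str]],
) -> Dict[str, Dict[str, bool]]:
    """Return a boolean label for every core term, computed separately per domain."""
    return {
        domain: {term: bool(cats & gold) for term, cats in term_categories.items()}
        for domain, gold in gold_by_domain.items()
    }


# Toy core terms and their categories (hypothetical).
term_categories = {
    "backpropagation": {"Category:Machine learning"},
    "compiler": {"Category:Compiler construction"},
}

# Toy gold subcategory sets for two levels of one hierarchy (hypothetical).
gold_by_domain = {
    "CS": {"Category:Machine learning", "Category:Compiler construction"},
    "ML": {"Category:Machine learning"},
}

labels = label_core_terms(term_categories, gold_by_domain)
```

In this toy setup, "backpropagation" is positive for both CS and ML, while "compiler" is positive for CS only. Adding a new domain means supplying a new gold set and recomputing the labels.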