xpmethod / opensyllabus

Extract or Identify the Department of Course Syllabus #29

Open grahamsack opened 10 years ago

grahamsack commented 10 years ago

Note: This is one of several issues related to basic information retrieval from the syllabi. We are assuming in all cases that the extraction is from a .txt document.

Task: Given a syllabus in .txt format, identify and extract the department of the course. This involves two possible approaches:

1) If the department is stated in the syllabus, extract it.
2) If the department is not stated in the syllabus, use machine learning approaches (such as Naive Bayes, TF-IDF, or topic modeling) to infer the likely department from the text of the syllabus (e.g., the course description, the books mentioned, etc.).
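A rough sketch of how the two approaches could fit together, in case it helps the discussion. Everything here is an assumption rather than existing project code: the regular expression, the function names (`extract_department`, `identify_department`), and the tiny hand-rolled Naive Bayes classifier are all hypothetical, and real syllabi will need far more robust patterns and training data.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical pattern: matches a line such as "Department of History"
# or "Dept: Computer Science". Real syllabi vary far more than this.
DEPT_PATTERN = re.compile(
    r"^\s*(?:Department|Dept\.?)\s*(?:of\b|:)\s*([A-Za-z&,\- ]+?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_department(text):
    """Approach 1: return the stated department, or None if no line matches."""
    match = DEPT_PATTERN.search(text)
    return match.group(1).strip() if match else None


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Approach 2: multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def identify_department(text, classifier):
    """Prefer a stated department; otherwise fall back to the classifier."""
    stated = extract_department(text)
    if stated is not None:
        return stated, "stated"
    return classifier.predict(text), "predicted"
```

Returning the source of the answer ("stated" or "predicted") alongside the department keeps the two approaches distinguishable downstream, so their results can be checked against each other on syllabi where the department is stated.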

jon-freed commented 10 years ago

Why not both approaches? And then the two can be compared.