Deliverable 02

Conceptual Database Design

Original Dataset

The original structure of the `dblp.xml` file

Our Entity-Relationship model is the blueprint for our single-page web application, which is built on data extracted from DBLP, the computer science bibliography database that indexes computer science academic papers. The original dataset can be downloaded from DBLP as an XML file, together with a DTD file that describes the elements and attributes of the XML nodes. Since XML is a tree-structured markup language and every node is a subtype of the DBLP element, we have to reorganize the data to fit a relational database and meet our application requirements. For example, author is originally an attribute of every XML node and may have multiple values. After the modification, author becomes an entity of its own, derived from the person entity. We also add co-author and write relationships to support further queries. Another important relationship is the collection relationship between papers and the collections that contain them, such as articles and journals. The original XML has no journal entity; journal is only a derived attribute of article, and we can only locate an article within a journal by its volume and pages. Once we create the journal entity, we can directly find all articles collected in the same journal.
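As a rough illustration of this reorganization, the TypeScript sketch below turns flat article records into separate person and journal tables plus write and collect relationships. The `RawArticle` shape and every function and field name are assumptions made for this example; they are not taken from the DTD or from our actual import code.

```typescript
// Assumed shape of one parsed article node (hypothetical field names).
interface RawArticle {
  key: string;
  title: string;
  authors: string[]; // multi-valued in the original XML
  journal?: string;  // only a plain value in the original XML
  volume?: string;
  pages?: string;
}

interface Tables {
  persons: Map<string, number>;  // author name -> person ID
  journals: Map<string, number>; // journal name -> journal ID
  writes: { personId: number; articleKey: string }[];
  collects: { journalId: number; articleKey: string }[];
}

// Return the ID stored for `name`, assigning the next free ID on first sight.
function idFor(table: Map<string, number>, name: string): number {
  let id = table.get(name);
  if (id === undefined) {
    id = table.size + 1;
    table.set(name, id);
  }
  return id;
}

// Split flat records into entity tables and relationship rows.
function reorganize(records: RawArticle[]): Tables {
  const tables: Tables = {
    persons: new Map<string, number>(),
    journals: new Map<string, number>(),
    writes: [],
    collects: [],
  };
  for (const record of records) {
    for (const name of record.authors) {
      tables.writes.push({ personId: idFor(tables.persons, name), articleKey: record.key });
    }
    if (record.journal !== undefined) {
      tables.collects.push({
        journalId: idFor(tables.journals, record.journal),
        articleKey: record.key,
      });
    }
  }
  return tables;
}

// All articles collected by one journal, found without inspecting volume or pages.
function articlesInJournal(tables: Tables, journal: string): string[] {
  const journalId = tables.journals.get(journal);
  if (journalId === undefined) return [];
  return tables.collects
    .filter((row) => row.journalId === journalId)
    .map((row) => row.articleKey);
}
```

With the journal stored as its own table, `articlesInJournal` only needs the collect rows, which is the convenience the paragraph above describes.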

Data Reorganization

Following the same method presented above, the conceptual design of our database has eight native entities and seven relationships among them. Below we describe how we generalize and abstract these entities of interest. (Note that some attributes are omitted in the adapted version because of their low priority. We may keep them for data integrity or discard them to save storage.)

The general logic is that our database stores collections that are published by one or more individuals (persons).
Each collection has a volume, a book title and a publisher.
Journal, proceedings and book fall under the collection hierarchy; each is a sub-entity of collection.
Each of the three sub-entities consists of academic papers. In detail, each journal collects one or more articles, each proceedings collects one or more inproceedings, and each book collects one or more incollections.
Journals, proceedings, books, articles, inproceedings and incollections all fall under the generalization called Publication. For example, a single article may be published by author A and then collected by a journal that is published at a later time.
Each publication has a URL, a page range, a title, a publication year, an electronic edition and a unique ID.
Publications can be cited by other publications.
A person can be an author or an editor.
A person has an affiliation, a homepage, one or more names (including abbreviations), and a unique person ID.
Authors write publications and can be co-authors with other authors.
Editors edit various publications.
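The entities and relationships listed above could be written down as TypeScript types roughly as follows. This is only a sketch: all type, field and function names are made up for illustration, several attributes are marked optional on the assumption that not every record carries them, and relationships are stored as plain ID lists rather than as separate tables.

```typescript
// All identifiers are illustrative; only the attributes come from the list above.
type CollectionKind = "journal" | "proceedings" | "book";
type PaperKind = "article" | "inproceedings" | "incollection";

interface Person {
  personId: string;
  names: string[]; // one or more names, abbreviations included
  affiliation?: string;
  homepage?: string;
}

interface Publication {
  id: string;
  kind: CollectionKind | PaperKind;
  title: string;
  year: number;
  url?: string;
  pages?: string;
  electronicEdition?: string;
  authorIds: string[]; // "write" relationship
  editorIds: string[]; // "edit" relationship
  citedIds: string[];  // "cite" relationship
}

interface Collection extends Publication {
  kind: CollectionKind;
  volume?: string;
  bookTitle?: string;
  publisher?: string;
}

interface Paper extends Publication {
  kind: PaperKind;
  collectionId?: string; // "collect" relationship
}

// Which kind of paper each kind of collection collects.
const COLLECTS: Record<CollectionKind, PaperKind> = {
  journal: "article",
  proceedings: "inproceedings",
  book: "incollection",
};

function canCollect(collection: Collection, paper: Paper): boolean {
  return COLLECTS[collection.kind] === paper.kind;
}

// Derive the co-author relationship from the write relationship.
function coAuthorIds(personId: string, publications: Publication[]): string[] {
  const found: string[] = [];
  for (const publication of publications) {
    if (publication.authorIds.indexOf(personId) === -1) continue;
    for (const otherId of publication.authorIds) {
      if (otherId !== personId && found.indexOf(otherId) === -1) found.push(otherId);
    }
  }
  return found;
}
```

Deriving co-authors from the write relationship, as `coAuthorIds` does, is one possible design; storing co-author as its own relationship trades extra storage for faster lookups.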

Entity-relationship diagram after the adaptation

User Interface Design

Page Connection

UI prototype diagram of our front-end

Features & Usage:

Enter a keyword in the search bar to find all authors and papers related to the keyword.
Enter a start year and an end year to limit the period of the search results. (Refer to deliverable 1, example query 1)
Find all articles that cite the current paper in the "info table". (Refer to deliverable 1, example query 2)
Find all co-authors who have collaborated with the selected author on the author page, which is reached by clicking the author card. (Refer to deliverable 1, example query 3)
The checkboxes on the right side are displayed in sorted order. For example, in the conference section, the topmost option is the conference with the largest number of papers whose titles contain the searched keyword. (Refer to deliverable 1, example query 5)
The user can choose the order in which papers are listed in the advanced search. For example, if the user chooses the "most cited" sort order, the list shows the most cited papers related to the current search keyword first. (Refer to deliverable 1, example query 4)
The chart on a paper panel shows the trend in the paper's popularity. (Refer to deliverable 1, example query 6)
The search bar may be left empty. For example, the user can find the most active authors by leaving the search bar empty and choosing the "amount of publication (descending)" listing order. (Refer to deliverable 1, example query 7)
Depending on the available data, the panel may change its shape. For example, if the element found is a book, the ISBN may be included, but the chart may be excluded since citation data is not available for books.
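To make the search behaviour above more concrete, here is a minimal in-memory sketch of keyword matching, the year range, the listing order and the sorted checkbox options. The names `searchPapers`, `venueOptions`, `PaperSummary` and `SearchOptions` are hypothetical, and in the real application this filtering would more likely happen in database queries than in the front-end.

```typescript
// Hypothetical names throughout; the real query manager may look different.
type ListingOrder = "mostCited" | "newestFirst";

interface PaperSummary {
  title: string;
  year: number;
  venue: string;
  citedBy: number;
}

interface SearchOptions {
  keyword: string; // may be empty, which matches every paper
  startYear?: number;
  endYear?: number;
  order: ListingOrder;
}

function searchPapers(papers: PaperSummary[], options: SearchOptions): PaperSummary[] {
  const needle = options.keyword.trim().toLowerCase();
  const matches = papers.filter(
    (paper) =>
      (needle === "" || paper.title.toLowerCase().indexOf(needle) !== -1) &&
      (options.startYear === undefined || paper.year >= options.startYear) &&
      (options.endYear === undefined || paper.year <= options.endYear)
  );
  const compare: (a: PaperSummary, b: PaperSummary) => number =
    options.order === "mostCited"
      ? (a, b) => b.citedBy - a.citedBy
      : (a, b) => b.year - a.year;
  // `filter` returned a fresh array, so sorting in place leaves the input untouched.
  return matches.sort(compare);
}

// Count matching papers per venue, largest count first, for the checkbox list.
function venueOptions(matches: PaperSummary[]): { venue: string; count: number }[] {
  const counts = new Map<string, number>();
  for (const paper of matches) {
    counts.set(paper.venue, (counts.get(paper.venue) ?? 0) + 1);
  }
  const options: { venue: string; count: number }[] = [];
  counts.forEach((count, venue) => options.push({ venue, count }));
  return options.sort((a, b) => b.count - a.count);
}
```

Calling `searchPapers` with an empty keyword returns every paper in the chosen order, which corresponds to the empty search bar case described above.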

Potential Challenge

The number of results could be huge, so it is necessary to retrieve partial results.
The number of authors could be huge, so displaying a full list of authors may not be a good idea. A possible alternative is to let users type the names of the authors they wish to include or exclude.
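Both challenges could be approached along the lines of the sketch below: results are handed out one page at a time, and author names are suggested from what the user has typed instead of being listed in full. The function names, the `ResultPage` shape and the offset/limit scheme are assumptions for illustration; a real implementation would probably apply the limit in the database query rather than on an in-memory array.

```typescript
interface ResultPage<T> {
  items: T[];
  total: number;
  nextOffset: number | null; // null when there is nothing left to fetch
}

// Return one slice of the results plus the offset at which the next slice starts.
function pageOf<T>(results: T[], offset: number, limit: number): ResultPage<T> {
  const start = Math.max(0, offset);
  const size = Math.max(1, limit); // always make progress, even for a bad limit
  const items = results.slice(start, start + size);
  const end = start + items.length;
  return {
    items,
    total: results.length,
    nextOffset: end < results.length ? end : null,
  };
}

// Suggest at most `limit` author names that start with what the user has typed so far.
function suggestAuthors(names: string[], typed: string, limit: number): string[] {
  const prefix = typed.trim().toLowerCase();
  if (prefix === "") return [];
  return names
    .filter((name) => name.toLowerCase().indexOf(prefix) === 0)
    .slice(0, limit);
}
```

The front-end can keep requesting pages until `nextOffset` is `null`, for example when the user scrolls to the end of the result list.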

Component dependency:

Header
- Search-bar
- Query manager
Landing Page
Lookup Page
- Results
- Result
  - Inproceeding
  - Citation Chart
  - Website
  - Proceeding
  - Incollection
  - Book
  - Article
- Controls
- AuthorList
- ResultTypeList
- Query manager
Author Page
- Results
- Result
  - Inproceeding
  - Citation Chart
  - Website
  - Proceeding
  - Incollection
  - Book
  - Article
Advanced Search page
- Restriction
- Query manager
Sign-in
Sign-up

deliverable2.pdf

lvergergsk / BibGallery-FrontEnd

Deliverable 02 #14