Conceptual Database Design
Original Dataset
Our Entity-Relationship Model is the blueprint for our intended single-page web
application. It is based on data extracted from DBLP, the computer science
bibliography web database, which stores computer-science-related academic
papers. The original dataset can be downloaded from dblp. The data are stored
in an XML file, along with a DTD file that describes the entities and
attributes of the XML nodes. Since XML is a tree-structured markup language,
and every node is a subtype of DBLP Element, we have to reorganize the data to
fit a relational database and meet our application requirements. For example,
author is originally an attribute of every XML node and may have multiple
values. After the modification, author becomes an entity in its own right,
derived from the person entity. We also add co-author and write relationships
to support later queries. Another important relationship is the collection
relationship, for example between articles and journals. In the original XML
there is no journal entity; journal is only an attribute of article, so an
article can only be located within a journal by its volume and pages. Once we
create the journal entity, we can directly find all articles collected in the
same journal.
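The reorganization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records are hypothetical and heavily simplified, the integer IDs are invented for the example, and the real DBLP file is far larger and needs its DTD to resolve character entities.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified DBLP-style records (not real data).
SAMPLE = """
<dblp>
  <article key="journals/x/A1">
    <author>Alice Ng</author>
    <author>Bob Roy</author>
    <title>Paper One</title>
    <journal>J. Examples</journal>
    <volume>3</volume>
    <year>2001</year>
  </article>
  <article key="journals/x/A2">
    <author>Alice Ng</author>
    <title>Paper Two</title>
    <journal>J. Examples</journal>
    <volume>4</volume>
    <year>2002</year>
  </article>
</dblp>
"""

def reorganize(xml_text):
    """Split nested article records into flat author, journal, article and write rows."""
    authors = {}    # author name -> author id (ids are assigned here, an assumption)
    journals = {}   # journal name -> journal id
    articles = []   # (article key, title, year, journal id)
    writes = []     # (author id, article key): the "write" relationship
    root = ET.fromstring(xml_text)
    for node in root.iter("article"):
        key = node.get("key")
        # The journal becomes its own entity instead of a field of the article.
        journal_id = journals.setdefault(node.findtext("journal"), len(journals) + 1)
        articles.append((key, node.findtext("title"), int(node.findtext("year")), journal_id))
        # A multi-valued author field becomes one "write" row per author.
        for author in node.findall("author"):
            author_id = authors.setdefault(author.text, len(authors) + 1)
            writes.append((author_id, key))
    return authors, journals, articles, writes

authors, journals, articles, writes = reorganize(SAMPLE)
```

With the journal stored as its own entity, finding all articles in one journal is a filter on the journal id rather than a comparison of volume and page fields.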
Data Reorganization
Following the same method, the conceptual design of our database has eight native entities and seven relationships among them.
Below we describe how we generalize and abstract the entities of interest. (Note that some attributes are omitted in the adapted version because of their low priority. We may keep them for data integrity or discard them to save storage.)
The general logic is that our database stores collections that are published
by one or more individuals (persons).
Each collection has a volume, book-title and publisher.
Journal, proceedings, and book sit under collection in the hierarchy. Each of them is a sub-entity of collection.
Each of the three sub-entities consists of various academic papers. In detail, each journal collects one or more articles, each proceedings collects one or more inproceedings, and each book collects one or more incollections.
Journals, proceedings, books, articles, inproceedings and incollections all fall under the generalization called Publication. For example, a single article may be published by author A and then collected by a journal that is published at a later time.
Each publication has a URL, a page range, a title, a publication year, an electronic edition and a unique ID.
Publications can be cited by other publications.
A person can be an author or an editor.
A person has an affiliation, a homepage, one or more names (abbreviation), and a unique person ID.
Authors write publications and can be co-authors with other authors.
Editors edit various publications.
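One possible relational mapping of the description above is sketched below with Python's built-in sqlite3 module. It is not the project's actual schema: the table and column names are assumptions, the publication hierarchy is flattened into a single table with a `kind` discriminator, and it does not try to reproduce the exact count of eight entities and seven relationships. The co-author relationship is left to be derived from `writes` by a self-join.

```python
import sqlite3

# Hypothetical schema; all names are illustrative assumptions.
SCHEMA = """
CREATE TABLE person (
    person_id   INTEGER PRIMARY KEY,
    affiliation TEXT,
    homepage    TEXT
);
-- A person may have one or more names.
CREATE TABLE person_name (
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    name      TEXT NOT NULL,
    PRIMARY KEY (person_id, name)
);
-- Collections (journal, proceedings, book) and papers share one table.
CREATE TABLE publication (
    pub_id        TEXT PRIMARY KEY,
    kind          TEXT NOT NULL CHECK (kind IN (
                      'journal', 'proceedings', 'book',
                      'article', 'inproceedings', 'incollection')),
    title         TEXT NOT NULL,
    year          INTEGER,
    pages         TEXT,
    url           TEXT,
    ee            TEXT,   -- electronic edition
    volume        TEXT,   -- collection-only attributes stay NULL for papers
    booktitle     TEXT,
    publisher     TEXT,
    collection_id TEXT REFERENCES publication(pub_id)  -- "collects" relationship
);
CREATE TABLE writes (
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    pub_id    TEXT NOT NULL REFERENCES publication(pub_id),
    PRIMARY KEY (person_id, pub_id)
);
CREATE TABLE edits (
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    pub_id    TEXT NOT NULL REFERENCES publication(pub_id),
    PRIMARY KEY (person_id, pub_id)
);
CREATE TABLE cites (
    citing_id TEXT NOT NULL REFERENCES publication(pub_id),
    cited_id  TEXT NOT NULL REFERENCES publication(pub_id),
    PRIMARY KEY (citing_id, cited_id)
);
"""

def open_db(path=":memory:"):
    """Open a database, enable foreign key checks, and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn

conn = open_db()
```

The single-table layout keeps the example short; separate tables per sub-entity would follow the ER hierarchy more literally at the cost of extra joins.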
User Interface Design
Page Connection
Feature & Usage:
Enter a keyword in the search bar to find all authors and papers related to the keyword.
Enter a start year and an end year to limit the period of the search results. (Refer to deliverable 1, example query 1)
Find all articles that cite the current paper in the "info table". (Refer to deliverable 1, example query 2)
Find all co-authors who have collaborated with the selected author on the author page, which is reached by clicking the author card. (Refer to deliverable 1, example query 3)
The checkboxes on the right side are displayed in sorted order. For example, in the conference section, the topmost option is the conference with the largest number of papers whose titles contain the searched keyword. (Refer to deliverable 1, example query 5)
The user can choose the order in which papers are listed in the advanced search. For example, if the user chooses the "most cited" sort order, the list shows the most cited papers related to the current search keyword first. (Refer to deliverable 1, example query 4)
The chart on a paper panel shows the trend of the paper's popularity. (Refer to deliverable 1, example query 6)
The search bar may be left empty. For example, the user can find the most active authors by leaving the search bar empty and choosing the "amount of publication (descending)" listing order. (Refer to deliverable 1, example query 7)
Depending on the available data, the panel may change its layout. For example, if the element found is a book, the ISBN may be included, but the chart may be excluded since citation data is not available for books.
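Two of the features above, keyword search with an optional year range and the co-author lookup, could be backed by queries along the following lines. This is a minimal sketch and not the example queries from deliverable 1: the two-table layout, the function names, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

def make_demo_db():
    """Build a tiny in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE publication (pub_id TEXT PRIMARY KEY, title TEXT, year INTEGER);
        CREATE TABLE writes (author TEXT, pub_id TEXT);
        INSERT INTO publication VALUES
            ('p1', 'Query Optimization Basics', 1999),
            ('p2', 'Adaptive Query Processing', 2005),
            ('p3', 'Graph Mining', 2005);
        INSERT INTO writes VALUES
            ('Alice Ng', 'p1'), ('Bob Roy', 'p1'),
            ('Alice Ng', 'p2'), ('Cara Diaz', 'p2'),
            ('Bob Roy', 'p3');
    """)
    return conn

def search_papers(conn, keyword, start_year=None, end_year=None):
    """Return papers whose title contains the keyword, optionally limited by year."""
    sql = "SELECT pub_id, title, year FROM publication WHERE title LIKE ?"
    params = [f"%{keyword}%"]  # an empty keyword matches every title
    if start_year is not None:
        sql += " AND year >= ?"
        params.append(start_year)
    if end_year is not None:
        sql += " AND year <= ?"
        params.append(end_year)
    sql += " ORDER BY year, pub_id"
    return conn.execute(sql, params).fetchall()

def coauthors(conn, author):
    """Return everyone who shares at least one publication with the author."""
    sql = """
        SELECT DISTINCT other.author
        FROM writes AS mine
        JOIN writes AS other ON other.pub_id = mine.pub_id
        WHERE mine.author = ? AND other.author <> ?
        ORDER BY other.author
    """
    return [row[0] for row in conn.execute(sql, (author, author))]

conn = make_demo_db()
recent = search_papers(conn, "query", start_year=2000)
partners = coauthors(conn, "Alice Ng")
```

Passing user input as bound parameters rather than formatting it into the SQL string keeps the search bar safe from SQL injection. SQLite's `LIKE` is case-insensitive for ASCII by default, which suits a keyword search.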
Potential Challenge
The number of results could be huge, so it is necessary to return partial
results.
The number of authors could be huge, so displaying a full list of authors may
not be a good idea. A possible alternative is to let users type the names of
the authors they wish to include or exclude.
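Both challenges can be approached with bounded queries. The sketch below is one option under stated assumptions: `fetch_page` and `suggest_authors` are hypothetical helper names, the tables are a simplified stand-in for the real schema, and LIMIT/OFFSET paging is chosen for brevity even though keyset paging scales better on very large result sets.

```python
import sqlite3

def make_demo_db(n_papers=45):
    """Build a tiny in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE publication (pub_id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE author (name TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO publication VALUES (?, ?)",
        [(i, f"Database Paper {i}") for i in range(1, n_papers + 1)],
    )
    conn.executemany(
        "INSERT INTO author VALUES (?)",
        [("Alice Ng",), ("Alan Poe",), ("Bob Roy",)],
    )
    return conn

def fetch_page(conn, keyword, page, page_size=20):
    """Return one page of matches plus a flag telling whether more pages exist."""
    rows = conn.execute(
        "SELECT pub_id, title FROM publication "
        "WHERE title LIKE ? ORDER BY pub_id LIMIT ? OFFSET ?",
        # Ask for one extra row so we can tell whether another page follows.
        (f"%{keyword}%", page_size + 1, page * page_size),
    ).fetchall()
    return rows[:page_size], len(rows) > page_size

def suggest_authors(conn, prefix, limit=10):
    """Return a short list of author names starting with the typed prefix."""
    # Note: '%' and '_' typed by the user are not escaped in this sketch.
    rows = conn.execute(
        "SELECT name FROM author WHERE name LIKE ? ORDER BY name LIMIT ?",
        (prefix + "%", limit),
    )
    return [row[0] for row in rows]

conn = make_demo_db()
first_page, has_more = fetch_page(conn, "database", page=0)
last_page, more_after_last = fetch_page(conn, "database", page=2)
suggestions = suggest_authors(conn, "al")
```

The front end would request the next page only when the user scrolls or clicks for more, and would call the suggestion helper as the user types instead of rendering every author as a checkbox.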
Component dependency:
deliverable2.pdf