pmsoltani / elsametric


Move unrelated files to another repo #27

Closed: pmsoltani closed this issue 4 years ago

pmsoltani commented 4 years ago

Introduction

Following #22, elsametric should be usable as a standalone package, and this repo should host only the files needed to develop it. To achieve this, the package's scope must be determined, which in turn requires defining its mission first.

What is elsametric?

It began as a collection of loose ideas about how to break down raw academic publication data (such as data obtained from Scopus) and store it in a database that could then be queried to extract information.
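As a rough illustration of that idea, the sketch below breaks raw publication records into normalized tables and then queries them. It uses the standard-library sqlite3 module rather than elsametric's SQLAlchemy model, and the record fields and table names (papers, authors, paper_author) are hypothetical, not elsametric's actual schema.

```python
import sqlite3

# Hypothetical raw records, loosely shaped like publication entries from a
# source such as Scopus; the field names are illustrative only.
RAW_RECORDS = [
    {"eid": "p1", "title": "Paper One", "year": 2019,
     "authors": [{"id": "a1", "name": "Alice"}, {"id": "a2", "name": "Bob"}]},
    {"eid": "p2", "title": "Paper Two", "year": 2020,
     "authors": [{"id": "a1", "name": "Alice"}]},
]


def build_db(records):
    """Break raw records into papers, authors, and a paper-author link table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE papers (eid TEXT PRIMARY KEY, title TEXT, year INTEGER);
        CREATE TABLE authors (id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE paper_author (
            paper_eid TEXT REFERENCES papers(eid),
            author_id TEXT REFERENCES authors(id),
            PRIMARY KEY (paper_eid, author_id)
        );
        """
    )
    for rec in records:
        conn.execute("INSERT INTO papers VALUES (?, ?, ?)",
                     (rec["eid"], rec["title"], rec["year"]))
        for author in rec["authors"]:
            # An author who appears on several papers is stored only once.
            conn.execute("INSERT OR IGNORE INTO authors VALUES (?, ?)",
                         (author["id"], author["name"]))
            conn.execute("INSERT INTO paper_author VALUES (?, ?)",
                         (rec["eid"], author["id"]))
    conn.commit()
    return conn


def papers_per_author(conn):
    """Example query: number of papers per author, most prolific first."""
    return conn.execute(
        """
        SELECT a.name, COUNT(*) AS n
        FROM authors a JOIN paper_author pa ON pa.author_id = a.id
        GROUP BY a.id
        ORDER BY n DESC, a.name
        """
    ).fetchall()


conn = build_db(RAW_RECORDS)
print(papers_per_author(conn))  # [('Alice', 2), ('Bob', 1)]
```

Storing each author once and linking through a separate table is what makes per-author questions a simple join instead of a scan over raw records.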

elsametric has two main parts:

There are some other parts as well:

These files should be moved to either the junkyard or the helper_scripts directories.

Additionally, the repo contains:

To create the API, two more files were added: main.py (the API routes) and api_queries.py, which contains the processing functions the API relies on.

What should elsametric do (and not do)?

elsametric is about designing and maintaining an efficient database to store academic publication data. As such, it should consist of:

  1. the elsametric folder, which includes the SQLAlchemy model and some helper functions to process data
  2. the db_design folder, which holds a graphical version of the SQLAlchemy model, created using MySQL Workbench
  3. scripts to populate the database, which currently consist only of db_populate.py
  4. scripts to gather data from the web, including:
    • scripts for getting publications data from servers such as Scopus & WOS
    • crawlers for getting the profile of the faculty members
    • crawlers for getting author metrics (such as the h-index from Google Scholar)

Of the items mentioned above, only the elsametric directory will be installed using pip. Other scripts and files reside solely in the repo. Future releases might install them along with the elsametric folder.

Other functionality for growing and maintaining the database may be added in the future. For example, the CSV-processing functions in the shcopus repo, which analyze CSV exports from Scopus, could be added to this repo as a fallback for when the Scopus API's limitations get in the way.
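A minimal sketch of that kind of CSV processing, using only the standard library. The column names ("Authors", "Title", "Year", "Cited by") are assumptions about the layout of a Scopus CSV export, and summarize_export is a hypothetical helper, not one of the functions in the shcopus repo.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a Scopus CSV export; real exports may use
# different or additional columns.
SAMPLE_EXPORT = """Authors,Title,Year,Cited by
"Smith J., Doe A.",First paper,2018,10
"Doe A.",Second paper,2019,
"Lee K.",Third paper,2019,3
"""


def summarize_export(text):
    """Count papers per year and sum citations from a CSV export."""
    reader = csv.DictReader(io.StringIO(text))
    per_year = Counter()
    total_citations = 0
    for row in reader:
        per_year[int(row["Year"])] += 1
        # Uncited papers may have an empty "Cited by" cell; treat it as 0.
        total_citations += int(row["Cited by"] or 0)
    return dict(per_year), total_citations


print(summarize_export(SAMPLE_EXPORT))  # ({2018: 1, 2019: 2}, 13)
```

Because csv.DictReader handles quoted fields, author lists containing commas are parsed correctly without any custom splitting.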

Further functionality might include database migrations, probably using Alembic, the migration tool for SQLAlchemy. Migrations would let the package avoid re-populating the entire database every time the schema changes.
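To make the benefit concrete, here is a minimal sketch of an in-place schema migration. It deliberately does not use Alembic: it hand-rolls version tracking with SQLite's user_version pragma so that it needs only the standard library. The table, the columns, and the MIGRATIONS list are hypothetical. With Alembic, the version bookkeeping and migration scripts would be managed for the SQLAlchemy model instead.

```python
import sqlite3

# Hypothetical ordered schema changes: entry i upgrades version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT)",
    "ALTER TABLE papers ADD COLUMN doi TEXT",
]


def migrate(conn, migrations):
    """Apply only the migrations newer than the database's recorded version."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for i, statement in enumerate(migrations[version:], start=version):
        conn.execute(statement)
        # PRAGMA statements cannot take bound parameters, so the integer
        # version number is formatted directly into the SQL.
        conn.execute(f"PRAGMA user_version = {i + 1}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


conn = sqlite3.connect(":memory:")
migrate(conn, MIGRATIONS[:1])  # initial schema, version 1
conn.execute("INSERT INTO papers (title) VALUES ('Existing paper')")
migrate(conn, MIGRATIONS)      # only the ADD COLUMN step runs
print(conn.execute("SELECT id, title, doi FROM papers").fetchall())
# [(1, 'Existing paper', None)]
```

The existing row survives the schema change instead of being re-populated, and calling migrate again on an up-to-date database does nothing.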

elsametric is not about creating and maintaining a web server or an API; that should be the job of another repo. Hence, the files main.py and api_queries.py are to be moved out of this repository.

Any remaining script, whether a Python file or a Jupyter Notebook, should be moved to the helper_scripts directory or, if it is no longer needed, to the junkyard directory. Eventually, the junkyard folder should be reviewed for useful files and then deleted from the repo... one should travel light!

pmsoltani commented 4 years ago

I'm reopening the issue because some scripts in the repo, such as Tehran.py and custom.ipynb, still need to be reviewed or deleted.