This first commit introduces the changes needed to remove the old logic of a global backend running the analysis (activated through PyRDF.use).
Instead, each dataframe instance can now be connected to a backend directly. To this end, a new class (PyRDF.DataFrame.SparkDataFrame) is introduced as the first in the list of factory classes that enable the new logic.
A factory class called DataFrameFactory has also been created, which dispatches dataframe creation to the correct PyRDF.backend.Dist subclass. For now it mostly serves as a helper in the logic for the distributed snapshot.
Other global variables, such as include_headers and include_shared_libs, have been removed. These sets of paths are now tied to the backend instance.
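To illustrate the direction of the change, here is a minimal sketch of the factory-dispatch pattern and the backend-instance-tied path sets described above. All class and method names below are hypothetical stand-ins for illustration only; they are not the actual PyRDF API.

```python
# Hypothetical sketch of the new design: no global backend or module-level
# path sets; each dataframe is bound to a concrete backend instance.

class DistBackend:
    """Stand-in for a PyRDF.backend.Dist subclass."""
    def __init__(self):
        # Header/library paths are attributes of the backend instance,
        # replacing the removed include_headers / include_shared_libs globals.
        self.include_headers = set()
        self.include_shared_libs = set()

    def distribute_headers(self, *paths):
        self.include_headers.update(paths)


class SparkBackend(DistBackend):
    """Stand-in for a Spark-specific Dist subclass."""


class SparkDataFrame:
    """Stand-in for PyRDF.DataFrame.SparkDataFrame: a dataframe
    connected directly to its backend instance."""
    def __init__(self, backend, *args):
        self.backend = backend
        self.args = args


class DataFrameFactory:
    """Stand-in for DataFrameFactory: dispatches dataframe creation
    to the class registered for the given backend type."""
    _registry = {SparkBackend: SparkDataFrame}

    @classmethod
    def create(cls, backend, *args):
        df_cls = cls._registry[type(backend)]
        return df_cls(backend, *args)


backend = SparkBackend()
backend.distribute_headers("myanalysis.h")
df = DataFrameFactory.create(backend, "mytree", "myfile.root")
print(type(df).__name__, sorted(backend.include_headers))
# → SparkDataFrame ['myanalysis.h']
```

The key design point is that the registry lookup replaces a global PyRDF.use switch: the backend instance carries all its own state, so two dataframes could in principle use different backends in the same process.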
Many tests have been updated to reflect these changes, and several have been removed. This PR builds on the changes introduced by #109, so that one should be merged before this one.