This first commit introduces the changes needed to remove the old logic of a global backend running the analysis (activated through PyRDF.use).
Instead, each dataframe instance can now be connected to a backend directly. To this end, a new class (PyRDF.DataFrame.SparkDataFrame) is introduced as the first in the list of factory classes that enable the new logic.
A factory class called DataFrameFactory has also been created, which dispatches dataframe creation to the correct PyRDF.backend.Dist subclass. For now it mostly serves as a helper in the logic for the distributed snapshot.
Other global variables, such as include_headers and include_shared_libs, have been removed. These sets of paths are now tied to the backend instance.
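To illustrate the direction of the change, here is a minimal sketch of the factory-dispatch pattern and the backend-instance-tied path sets described above. All class and method names below are hypothetical stand-ins for illustration only; they are not the actual PyRDF API.

```python
# Hypothetical sketch of the new design: no global backend or module-level
# path sets; each dataframe is bound to a concrete backend instance.

class DistBackend:
    """Stand-in for a PyRDF.backend.Dist subclass."""
    def __init__(self):
        # Header/library paths are attributes of the backend instance,
        # replacing the removed include_headers / include_shared_libs globals.
        self.include_headers = set()
        self.include_shared_libs = set()

    def distribute_headers(self, *paths):
        self.include_headers.update(paths)


class SparkBackend(DistBackend):
    """Stand-in for a Spark-specific Dist subclass."""


class SparkDataFrame:
    """Stand-in for PyRDF.DataFrame.SparkDataFrame: a dataframe
    connected directly to its backend instance."""
    def __init__(self, backend, *args):
        self.backend = backend
        self.args = args


class DataFrameFactory:
    """Stand-in for DataFrameFactory: dispatches dataframe creation
    to the class registered for the given backend type."""
    _registry = {SparkBackend: SparkDataFrame}

    @classmethod
    def create(cls, backend, *args):
        df_cls = cls._registry[type(backend)]
        return df_cls(backend, *args)


backend = SparkBackend()
backend.distribute_headers("myanalysis.h")
df = DataFrameFactory.create(backend, "mytree", "myfile.root")
print(type(df).__name__, sorted(backend.include_headers))
# → SparkDataFrame ['myanalysis.h']
```

The key design point is that the registry lookup replaces a global PyRDF.use switch: the backend instance carries all its own state, so two dataframes could in principle use different backends in the same process.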
Many tests have been updated to reflect these changes, and several have been removed. This PR builds on the changes introduced by #109, so that one should be merged before this one.