This merge request adds a pipeline feature to Nexus so that multiple IDataSources can be chained. This solves two problems:
Catalogs and resources can now be altered, e.g. with the data source Nexus.Sources.Transform, which was developed in parallel, users of Nexus can rename resources, derive units from channel names, assign default groups, etc.
Catalogs can be augmented with additional resources, i.e. two or more data sources can now be responsible for a single catalog. This is useful for plugins that derive data from raw data and where the derived data (i.e. new resources) should be located in the same catalog. It is also useful when data is located in the same folder structure but in different file formats. Normally, data from all file types in the same folder belongs together but is handled by different plugins / data sources.
To distinguish which data source should handle which data request, every resource gets an integer property assigned under the path nexus/pipline-position.
This position is set by Nexus when the individual data sources return their resource catalogs. It is later used to distribute ReadRequests to the corresponding data sources.
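The tagging and dispatching described above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the Nexus implementation (which is written in C#). The dictionary layout of resources and requests and the helper names `tag_resources` and `group_requests_by_position` are assumptions made for this example; only the property path is taken from the description.

```python
from collections import defaultdict

# Property path as written in this description. How the path maps onto
# nested properties is an assumption of this sketch.
POSITION_PATH = ("nexus", "pipline-position")


def tag_resources(resources, position):
    """Store the pipeline position on every resource a data source returned."""
    for resource in resources:
        props = resource.setdefault("properties", {})
        props.setdefault(POSITION_PATH[0], {})[POSITION_PATH[1]] = position
    return resources


def group_requests_by_position(requests):
    """Group read requests by the pipeline position of their resource."""
    groups = defaultdict(list)
    for request in requests:
        nexus_props = request["resource"]["properties"][POSITION_PATH[0]]
        groups[nexus_props[POSITION_PATH[1]]].append(request)
    return dict(groups)


# Two hypothetical data sources: position 0 provides raw data,
# position 1 adds a derived resource to the same catalog.
raw = tag_resources([{"id": "T1"}, {"id": "V1"}], position=0)
derived = tag_resources([{"id": "T1_mean"}], position=1)

requests = [{"resource": resource} for resource in raw + derived]
groups = group_requests_by_position(requests)
```

Each group can then be handed to the data source at the matching index of the pipeline.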
At the same time, this additional piece of metadata helps to make the data processing pipeline more traceable, so that users can always find out which software version and which configuration led to a specific set of data. In the future we should add Git support and create a commit every time the configuration changes. The current commit ID will then become part of the catalog metadata (#119).
A frequent change to the Nexus source code is the renaming of DataSourceRegistration to Pipeline. This was necessary because a set of catalogs is no longer provided by a single DataSourceRegistration but by multiple DataSourceRegistrations which together compose a pipeline.
There are also many changes caused by the "format on save" feature, i.e. superfluous whitespace has often been removed, or I have reformatted individual lines of code without changing their meaning.
Here are some comments on the individual changed files:
.github/workflows
Use a specific pyright version because the new one causes type checking errors (however, this means we need to solve the Python type errors in the near future, #124)
.vscode/settings.json
Exclude .razor files from editor.formatOnSave because this produces incorrect files
notes/plugin-pipeline.excalidraw
A drawing which shows the pipeline feature; it can be ignored
src/Nexus.UI/Components/CatalogAboutView.razor
Since we now have a list of data sources (the pipeline), the data source info page has been adapted to display data per data source
src/Nexus.UI/Core/AppState.cs
The CatalogInfo type (contains display info for the UI) had to be adapted so that info about all pipeline members (data sources) can be provided to the UI
src/Nexus/API/CatalogsController.cs
Mainly renaming from DataSourceRegistration to Pipeline
Line 293 (old) / 301 (new): I made the extension method JsonElement.GetStringValue a bit more efficient by reducing the number of string.Split operations, which means that the first parameter is now an array instead of a path-like string. This change occurs in other files as well
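The idea behind passing an array instead of a path-like string can be sketched as follows. This is a Python illustration of the general technique, not the C# extension method itself. The function `get_string_value` and the example property layout are assumptions made for this sketch.

```python
import json


def get_string_value(properties, path_segments):
    """Walk nested JSON data along pre-split segments; return a string or None."""
    current = properties
    for segment in path_segments:
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current if isinstance(current, str) else None


# Hypothetical properties object; the nesting is an assumption.
properties = json.loads('{"nexus": {"original-name": "T1"}}')

# The path is split once (here: written as a list) and reused for every
# lookup, instead of splitting a string like "nexus/original-name" per call.
ORIGINAL_NAME_PATH = ["nexus", "original-name"]

value = get_string_value(properties, ORIGINAL_NAME_PATH)
missing = get_string_value(properties, ["nexus", "unit"])
```

The caller pays for the split at most once, which matters when the same property is read for many resources.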
src/Nexus/API/SourcesController.cs
Previously the user-specific DataSourceRegistration configuration was part of the project.json file in the Nexus configuration folder. This has been factored out and is now part of the user-specific folders (also in the Nexus configuration folder).
The file pipelines.json contains all user-configured pipelines. The pipelines themselves are managed by the newly created service PipelineService, and the file system interaction is handled by the already existing DatabaseService. Both services are injected into this file (src/Nexus/API/SourcesController.cs).
The REST API code in this file has been adapted to let users interact with pipelines instead of data source registrations.
src/Nexus/API/UsersController.cs
The type InternalDataSourceRegistration became superfluous and has been removed. Now DataSourceRegistration is used everywhere instead
src/Nexus/Core/CatalogContainer.cs
This file mainly follows the name changes and the fact that we now have to handle arrays of DataSourceRegistrations instead of single objects
src/Nexus/Core/Models_NonPublic.cs
As described above, the DataSourceRegistrations were previously part of project.json, which the type UserConfiguration belonged to. Now that DataSourceRegistrations live in their own pipelines.json files, the type UserConfiguration is no longer required
src/Nexus/Core/Models_Public.cs
See the comments above; the same applies here
There is now a DataSourcePipeline type which is similar to the old DataSourceRegistration type, except that it holds a list of DataSourceRegistrations
src/Nexus/Extensibility/DataSource/DataSourceController.cs
This is the core of the changes: here Nexus has to handle the new pipeline approach, i.e. it has to answer questions like "What to do with multiple GetTimeRange() return values?" (because we now have multiple data sources), and more.
The solution for multiple GetAvailability() responses is to calculate the average
The old extension method GetCatalogAsync has been renamed to EnrichCatalogAsync because every data source now gets the catalog returned by the data source which is located earlier in the pipeline. The first data source gets an empty catalog.
Data sources only get read requests passed for resources which belong to their own pipeline position
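The chaining and the averaged availability described above can be sketched as follows. This is a simplified Python illustration, not the actual DataSourceController code. The classes `RawSource` and `DerivedSource`, the method names, the catalog layout and the catalog ID are assumptions made for this example.

```python
import asyncio


class RawSource:
    """Hypothetical first pipeline member: provides a raw resource."""

    async def enrich_catalog(self, catalog):
        return {**catalog, "resources": catalog["resources"] + ["T1"]}

    async def get_availability(self):
        return 1.0


class DerivedSource:
    """Hypothetical second pipeline member: adds derived resources."""

    async def enrich_catalog(self, catalog):
        derived = [f"{name}_mean" for name in catalog["resources"]]
        return {**catalog, "resources": catalog["resources"] + derived}

    async def get_availability(self):
        return 0.5


async def build_catalog(pipeline, catalog_id):
    # The first data source receives an empty catalog; every later one
    # receives the catalog returned by its predecessor.
    catalog = {"id": catalog_id, "resources": []}
    for source in pipeline:
        catalog = await source.enrich_catalog(catalog)
    return catalog


async def get_availability(pipeline):
    # Multiple availability responses are combined by averaging.
    values = [await source.get_availability() for source in pipeline]
    return sum(values) / len(values)


pipeline = [RawSource(), DerivedSource()]
catalog = asyncio.run(build_catalog(pipeline, "/SAMPLE"))
availability = asyncio.run(get_availability(pipeline))
```

Because each member only sees the output of its predecessor, the order of the data sources in the pipeline determines which resources a later member can modify or build upon.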
src/Nexus/Extensions/Sources/Sample.cs
mainly just adapt to other code changes
Line 150 (old) / 149 (new): there is now a new tuple parameter called originalResourceName in the method ReadAsync. It became necessary because, with the pipeline approach, resource IDs can be modified by data sources which come later in the pipeline. The data source which originally provided a resource with a specific ID (= name) can therefore no longer rely on the resource ID in the ReadAsync method. For this reason Nexus ensures that every resource has an original-name property.
This property can be set deliberately by a data source. If the data source does not do this, Nexus sets it so that the value is never null. The originalResourceName is therefore always part of a ReadRequest.
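The role of the original-name property can be sketched as follows. This is a Python illustration under assumptions, not the Nexus code: the dictionary layout, the flat placement of the property and the helper names `ensure_original_name`, `rename` and `to_read_request` are hypothetical.

```python
def ensure_original_name(resource):
    """Set original-name to the resource ID unless the data source set it."""
    props = resource.setdefault("properties", {})
    props.setdefault("original-name", resource["id"])
    return resource


def rename(resource, new_id):
    """What a later pipeline member (e.g. a transforming source) might do."""
    return {**resource, "id": new_id}


def to_read_request(resource):
    """Build a request that carries both the current ID and the original name."""
    return {
        "resource_id": resource["id"],
        "original_resource_name": resource["properties"]["original-name"],
    }


resource = ensure_original_name({"id": "T1"})
renamed = rename(resource, "temperature")
request = to_read_request(renamed)

# A name set deliberately by the data source is left untouched.
custom = ensure_original_name({"id": "V1", "properties": {"original-name": "ch_07"}})
```

The originating data source can look up its data by `original_resource_name` even though users now see the resource under its new ID.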
src/Nexus/Extensions/Writers/Csv.cs
follow previous code changes
src/Nexus/Program.cs
register the PipelineService for DI
src/Nexus/Services/AppStateManager.cs
Data source registrations are now managed by PipelineService, so remove the unnecessary code from here
src/Nexus/Services/CatalogManager.cs
follow previous code changes
src/Nexus/Services/DataControllerService.cs
follow previous code changes
src/Nexus/Services/DataService.cs
follow previous code changes
src/Nexus/Services/DatabaseService.cs
Extend this service with functionality to handle pipeline data
src/Nexus/Services/PipelineService.cs
The pipeline service (handles creation, deletion and retrieval of pipelines per user)
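The responsibilities listed above (creation, deletion and retrieval of pipelines per user) can be sketched as follows. This is an in-memory Python illustration only; the class `InMemoryPipelineService`, its method names and the pipeline layout are assumptions, and the persistence to pipelines.json via DatabaseService is deliberately left out.

```python
import uuid


class InMemoryPipelineService:
    """Keeps pipelines per user in memory (no file system interaction)."""

    def __init__(self):
        self._pipelines_by_user = {}

    def create(self, user_id, pipeline):
        """Store a pipeline for the user and return its generated ID."""
        pipeline_id = str(uuid.uuid4())
        self._pipelines_by_user.setdefault(user_id, {})[pipeline_id] = pipeline
        return pipeline_id

    def get_all(self, user_id):
        """Return a copy of all pipelines of the user, keyed by ID."""
        return dict(self._pipelines_by_user.get(user_id, {}))

    def delete(self, user_id, pipeline_id):
        """Remove a pipeline; return True if it existed."""
        pipelines = self._pipelines_by_user.get(user_id, {})
        if pipeline_id in pipelines:
            del pipelines[pipeline_id]
            return True
        return False


service = InMemoryPipelineService()
# A pipeline is modelled here as an ordered list of data source registrations.
pipeline_id = service.create("alice", [{"type": "Sample"}, {"type": "Transform"}])
alice_pipelines = service.get_all("alice")
bob_pipelines = service.get_all("bob")
deleted = service.delete("alice", pipeline_id)
```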
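src/extensibility/dotnet-extensibility/DataModel/DataModelExtensions.cs
Since Nexus now actively relies on the presence of the original-name resource property, there is a helper method to create it. This helper already existed in the project Nexus.Sources.StructuredFile but has been moved over into this project
All code which ensures the presence of mandatory catalog and resource properties has been moved here to a central place
The catalog properties now look a bit different. This is to make the .json object a bit more compact
new: ![grafik](https://github.com/user-attachments/assets/dda85709-c806-4d16-85ad-9b31fad9755a)
old: ![grafik](https://github.com/user-attachments/assets/e87383a0-8ed9-4589-8beb-a4fc188a72f7)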
src/extensibility/dotnet-extensibility/DataModel/PropertiesExtensions.cs
As mentioned before, the number of string.Split() operations has been reduced to make property access more efficient. Internally, catalog and resource properties are represented by a JsonElement, and unfortunately it takes a bit of work to access nested JSON data. That is the reason why this class exists.
tests/Nexus.Tests/DataSource/DataSourceControllerFixture.cs
This unit test fixture prepares test data, i.e. it prepares data source registrations (now two instead of one because we want to test the new pipeline behavior)
tests/Nexus.Tests/DataSource/TestSource.cs
A data source used in the tests which modifies existing resources and adds a new resource to the catalog. This data source is placed in pipeline position 1, i.e. after the actual data source
src/extensibility/dotnet-extensibility/Extensibility/DataSource/IDataSource.cs
GetCatalogAsync has been renamed to EnrichCatalogAsync and the parameters changed
The remaining changed files are listed without further comments:
openapi.json
src/Nexus.UI/Core/NexusDemoClient.cs
src/Nexus.UI/ViewModels/FakeResourceCatalogViewModel.cs
src/Nexus/wwwroot/css/app.css
src/clients/dotnet-client/NexusClient.g.cs
src/clients/python-client/nexus_api/_nexus_api.py
src/extensibility/dotnet-extensibility/DataModel/ResourceCatalog.cs
src/extensibility/dotnet-extensibility/Extensibility/DataSource/DataSourceTypes.cs
src/extensibility/python-extensibility/nexus_extensibility/_extensibility_data_source.py
tests/Nexus.Tests/DataSource/DataSourceControllerTests.cs
tests/Nexus.Tests/DataSource/SampleDataSourceTests.cs
tests/Nexus.Tests/Other/CatalogContainersExtensionsTests.cs
tests/Nexus.Tests/Other/PackageControllerTests.cs
tests/Nexus.Tests/Services/CatalogManagerTests.cs
tests/Nexus.Tests/Services/DataControllerServiceTests.cs
tests/Nexus.Tests/Services/DataServiceTests.cs
tests/Nexus.Tests/Services/PipelineServiceTests.cs
tests/Nexus.Tests/Services/TokenServiceTests.cs
tests/TestExtensionProject/TestDataSource.cs