Author - Patrick Parkinson patrick.parkinson@manchester.ac.uk
License - This software is licensed under the MIT-license
Datasets associated with nano-electronics and nano-opto-electronics can be highly heterogeneous and large scale. This package provides tools to work with a specific back-end and manage point, spectral and imaging data.
BDNE is code to connect to and explore the big-data for nano-electronics dataset. It allows retrieval of measurements (results of experiments) for objects which are associated with unique entities for different samples. Each term is linked to a database table.
The database is designed to hold and store information for functional nanomaterials, and to allow analysis of this data. There is no specific schema for storage of data, which includes point measurement (such as length or width), one-dimensional measurements (such as spectra) two-dimensional measurements (including images) and structured results (for functional data).
The code is designed to run on Anaconda, but has been demonstrated to work on Google Colab with minimal effort.
To connect to the public BigQuery datasource, a credentials file is required. Contact Patrick Parkinson patrick.parkinson@manchester.ac.uk for details.
Four classes are implemented for higher-level functionality, within the data_structures.py
file.
Entity
- A container class for a single entity, and all associated measurements.EntityCollection
- A container class for a collection of entities.MeasurementCollection
- A container class for a collection of measurements.PostProcess
- A class to wrap MeasurementCollection or PostProcess collections with a processing function. These should be used for handling data and processing, and connect into the db_orm
backend.
Basic functionality can be produced using the example code provided in BDNE_example_code.py
.
A typical workflow might be:
from BDNE import connect_mysql
from BDNE import data_structures
# Connect to the database back-end
connect_mysql()
# Create an empty container
wc = data_structures.EntityCollection()
# Load data related to sample 25
wc.load_sample(25)
# It is possible to iterate over each wire,
# and retrieve a single measurement using the iterator approach
for wire in wc:
# Process this wire, i.e. spectra
s = wire.get("spectra")
# However, it is more efficient to collect all measurements into
# a Measurement Collection.
data = wc.get_measurement("l")
# You can iterate over the data, but this tends to be slow for large datasets
# as it requires many database calls
def process(da):
# Dummy processing function
return da
for d in data:
# Iterate over every l value
process(d)
# It is preferred to collect all of the data from the server, and then to process.
# This can lead to potentially large datasets of GB for imaging (single SQL call, get all data at once)
d = data.collect()
# The preferred method is to wrap the data with a PostProcess
p = data_structures.PostProcess(data)
# Set the processing function
p.set_function(process)
# Collect or iterate:
processed_data = p.collect()
This code is the product of input from researchers and students at the University of Manchester, with specific input from:
The underlying dataset contains materials created via contributions from researchers around the world .