A Linguine configuration file is a YAML file used to define the structure and behavior of data-oriented interfaces. These files specify the inputs, outputs, and various operations to be performed on the data. The configuration is hierarchical, allowing one interface to inherit from another.
A typical Linguine configuration file contains the following entries:
input
Specifies the input data for the interface. This can include mappings and columns.
Type: dict
- type: The type of input (e.g., hstack for horizontal stacking).
- index: The index of the input data.
- mapping: A dictionary mapping input fields to their corresponding values.
- specifications: A list of specifications for the input data.
output
Specifies the output data for the interface. This can include mappings and columns.
Type: dict
- type: The type of output (e.g., hstack for horizontal stacking).
- index: The index of the output data.
- mapping: A dictionary mapping output fields to their corresponding values.
- specifications: A list of specifications for the output data.
compute
Defines the computations to be performed on the data.
Type: list
- field: The field to be computed.
- operation: The operation to be performed.
review
Specifies the fields to be reviewed.
Type: list
- field: The field to be reviewed.
- operation: The review operation.
cache
Specifies the cache columns.
Type: dict
- columns: A list of columns to be cached.
constants
Defines constant values to be used in the interface.
Type: dict
- type: The type of constants (e.g., hstack for horizontal stacking).
- specifications: A list of specifications for the constants.
parameters
Specifies parameters for the interface.
Type: dict
- type: The type of parameters (e.g., hstack for horizontal stacking).
- specifications: A list of specifications for the parameters.
inherit
Specifies inheritance from another interface.
Type: dict
- directory: The directory of the parent interface.
- filename: The filename of the parent interface.
- ignore: A list of keys to ignore from the parent interface.
- append: A list of keys
The "access" phase in the Linguine framework is about obtaining an electronic version of the data. The io.py file provides the capability to access data in various formats. These formats can be specified in the Linguine interface file. Below are the details on how to specify different data formats in the interface file.
Linguine supports various data formats, each requiring specific information to be accessed. The following sections describe the required entries for each supported data format.
To access data from a CSV file, specify the following details:
type: csv
filename: path/to/file.csv
header: 0 # Optional, default is 0
delimiter: "," # Optional, default is ","
quotechar: '"' # Optional, default is '"'
dtypes: # Optional, specify data types for columns
  - field: column_name
    type: str_type
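To make the role of the optional CSV entries concrete, here is a minimal sketch of how header, delimiter and quotechar could be applied when reading a file. The helper read_csv_spec is hypothetical and uses only Python's standard csv module; it is not the code in io.py, and it ignores dtypes.

```python
import csv
import io


def read_csv_spec(spec, stream):
    """Read records from `stream` using the entries of a csv specification.

    Hypothetical helper for illustration only; not part of Linguine.
    """
    reader = csv.reader(
        stream,
        delimiter=spec.get("delimiter", ","),   # default taken from the text above
        quotechar=spec.get("quotechar", '"'),   # default taken from the text above
    )
    rows = list(reader)
    header_row = spec.get("header", 0)          # row index holding the column names
    columns = rows[header_row]
    # Rows above the header are discarded; rows below it become records.
    return [dict(zip(columns, row)) for row in rows[header_row + 1:]]


spec = {"type": "csv", "filename": "path/to/file.csv", "delimiter": ";"}
text = 'id;name\n1;"Smith; Jane"\n2;Lee\n'
records = read_csv_spec(spec, io.StringIO(text))
print(records)
```

The quoted field keeps its embedded delimiter, which is the reason quotechar matters once a non-default delimiter is in use.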
To access data from an Excel spreadsheet, specify the following details:
type: excel
filename: path/to/file.xlsx
sheet: Sheet1 # Optional, default is "Sheet1"
header: 0 # Optional, default is 0
dtypes: # Optional, specify data types for columns
  - field: column_name
    type: str_type
To access data from a JSON file, specify the following details:
type: json
filename: path/to/file.json
To access data from a YAML file, specify the following details:
type: yaml
filename: path/to/file.yaml
To access data from a Markdown file, specify the following details:
type: markdown
filename: path/to/file.md
To access data from a BibTeX file, specify the following details:
type: bibtex
filename: path/to/file.bib
To access data from a Google Sheet, specify the following details:
type: gsheet
filename: Google_Sheet_Name
sheet: Sheet1 # Optional, default is "Sheet1"
header: 0 # Optional, default is 0
dtypes: # Optional, specify data types for columns
  - field: column_name
    type: str_type
To access data from a directory of files, specify the following details:
type: directory
source:
  - directory: path/to/directory
    glob: "*.csv" # Optional, default is "*"
    regexp: ".*" # Optional, regular expression to match filenames
store_fields: # Optional, specify fields to store in the data
  directory: sourceDirectory
  filename: sourceFile
  root: sourceRoot
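The directory entries above can be read as a file filter plus a naming scheme for the stored fields. The sketch below is one possible reading, not the io.py implementation: list_directory_source is a hypothetical helper, and treating root as the filename without its extension is an assumption.

```python
import fnmatch
import os
import re
import tempfile


def list_directory_source(source, store_fields):
    """Return one record per matching file, keyed by the store_fields names.

    Hypothetical helper for illustration only; not part of Linguine.
    """
    pattern = source.get("glob", "*")
    regexp = re.compile(source.get("regexp", ".*"))
    records = []
    for name in sorted(os.listdir(source["directory"])):
        # A file must pass both the glob and the regular expression.
        if not fnmatch.fnmatch(name, pattern) or not regexp.match(name):
            continue
        root, _ext = os.path.splitext(name)
        records.append({
            store_fields["directory"]: source["directory"],
            store_fields["filename"]: name,
            store_fields["root"]: root,  # assumption: root = name without extension
        })
    return records


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.csv", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    source = {"directory": tmp, "glob": "*.csv"}
    fields = {"directory": "sourceDirectory", "filename": "sourceFile", "root": "sourceRoot"}
    found = list_directory_source(source, fields)
    names = [record["sourceFile"] for record in found]
    roots = [record["sourceRoot"] for record in found]
    print(names, roots)
```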
To access data directly from the details file, specify the following details:
type: local
data:
  - column1: value1
    column2: value2
  - column1: value3
    column2: value4
To access artificially generated data, specify the following details:
type: fake
nrows: 100 # Number of rows to generate
cols: # Specify columns and their types
  column1: random_string
  column2: random_integer
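As an illustration of what a fake specification asks for, the following sketch generates nrows records with one generator per column. The generator table and make_fake_rows are assumptions made for this example; the generators Linguine actually uses for random_string and random_integer may differ.

```python
import random
import string

# Hypothetical generators keyed by the type names used in the specification above.
GENERATORS = {
    "random_string": lambda rng: "".join(rng.choice(string.ascii_lowercase) for _ in range(8)),
    "random_integer": lambda rng: rng.randint(0, 99),
}


def make_fake_rows(spec, seed=0):
    """Build spec['nrows'] records, one value per column in spec['cols']."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    return [
        {column: GENERATORS[kind](rng) for column, kind in spec["cols"].items()}
        for _ in range(spec["nrows"])
    ]


spec = {
    "type": "fake",
    "nrows": 100,
    "cols": {"column1": "random_string", "column2": "random_integer"},
}
rows = make_fake_rows(spec)
print(len(rows), rows[0])
```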
Here is an example configuration for accessing multiple data formats:
input:
  type: hstack
  specifications:
    - type: csv
      filename: data/file1.csv
      header: 0
      delimiter: ","
    - type: excel
      filename: data/file2.xlsx
      sheet: Sheet1
    - type: json
      filename: data/file3.json
    - type: yaml
      filename: data/file4.yaml
    - type: markdown
      filename: data/file5.md
    - type: bibtex
      filename: data/file6.bib
    - type: gsheet
      filename: Google_Sheet_Name
      sheet: Sheet1
    - type: directory
      source:
        - directory: data/directory
          glob: "*.csv"
      store_fields:
        directory: sourceDirectory
        filename: sourceFile
        root: sourceRoot
    - type: local
      data:
        - column1: value1
          column2: value2
        - column1: value3
          column2: value4
    - type: fake
      nrows: 100
      cols:
        column1: random_string
        column2: random_integer
This configuration specifies how to access data from various sources, including CSV, Excel, JSON, YAML, Markdown, BibTeX, Google Sheets, directories, local data, and fake data. Each data format has its own set of required and optional entries to ensure the data is accessed correctly.
Linguine provides mechanisms to stack input data either horizontally (hstack) or vertically (vstack). These operations allow you to combine multiple data sources into a single entity.
Horizontal stacking (hstack)
Horizontal stacking (hstack) combines data sources by aligning them side-by-side based on a common key or index. This is useful when you have different sets of columns for the same set of rows.
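- type: hstack.
- on: The key to join on (index, or a column such as id).
- how: The type of join (left, right, outer, inner). Default is left.
- lsuffix, rsuffix: Suffixes for overlapping column names, e.g. _right.
type: hstack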
specifications:
  - type: csv
    filename: data/file1.csv
    on: id
    how: left
  - type: excel
    filename: data/file2.xlsx
    sheet: Sheet1
    on: id
    how: left
    lsuffix: _left
    rsuffix: _right
In this example, data from file1.csv and file2.xlsx are horizontally stacked based on the id column.
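The effect of on, how and the two suffixes can be sketched on plain lists of records. This is an illustrative stand-in, not Linguine's implementation: the hstack function below is hypothetical, handles only left and inner joins, and assumes keys are unique in the right-hand source.

```python
def hstack(left_rows, right_rows, on, how="left", lsuffix="_left", rsuffix="_right"):
    """Join two lists of records side by side on the `on` column.

    Hypothetical sketch; suffix values are taken from the example above.
    """
    if how not in ("left", "inner"):
        raise ValueError(f"this sketch only handles left and inner joins, not {how!r}")
    left_cols = {col for row in left_rows for col in row if col != on}
    right_cols = {col for row in right_rows for col in row if col != on}
    overlap = left_cols & right_cols             # columns present on both sides
    right_by_key = {row[on]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_by_key.get(row[on])
        if match is None and how == "inner":
            continue                             # inner join drops unmatched rows
        out = {on: row[on]}
        for col, value in row.items():
            if col != on:
                out[col + lsuffix if col in overlap else col] = value
        for col in sorted(right_cols):
            name = col + rsuffix if col in overlap else col
            out[name] = match.get(col) if match is not None else None
        joined.append(out)
    return joined


left = [{"id": 1, "name": "Ada", "score": 3}, {"id": 2, "name": "Bob", "score": 5}]
right = [{"id": 1, "score": 9, "city": "Leeds"}]
result = hstack(left, right, on="id", how="left")
inner = hstack(left, right, on="id", how="inner")
print(result)
```

With how set to left, the row with id 2 is kept and its right-hand columns are empty; with inner it is dropped.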
Vertical stacking (vstack)
Vertical stacking (vstack) combines data sources by aligning them one below the other. This is useful when you have the same set of columns for different sets of rows.
- type: vstack.
- reset_index: Whether to reset the index after stacking. Default is False.
type: vstack
specifications:
  - type: csv
    filename: data/file1.csv
  - type: excel
    filename: data/file2.xlsx
    sheet: Sheet1
reset_index: True
In this example, data from file1.csv and file2.xlsx are vertically stacked, and the index is reset after stacking.
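What resetting the index means can be seen in a small sketch in which each source is a list of (index, record) pairs. The vstack function here is a hypothetical illustration, not Linguine's code.

```python
def vstack(sources, reset_index=False):
    """Place several lists of (index, record) pairs one below the other."""
    stacked = [pair for source in sources for pair in source]
    if reset_index:
        # Replace the original index values with a fresh 0..n-1 sequence.
        stacked = [(i, record) for i, (_, record) in enumerate(stacked)]
    return stacked


file1 = [(0, {"id": 1, "name": "Ada"}), (1, {"id": 2, "name": "Bob"})]
file2 = [(0, {"id": 3, "name": "Cy"})]

kept = vstack([file1, file2])
renumbered = vstack([file1, file2], reset_index=True)
print([index for index, _ in kept])
print([index for index, _ in renumbered])
```

Without the reset, the index value 0 appears twice in the stacked result, which is the situation reset_index is meant to avoid.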
You can combine hstack and vstack operations to create complex data structures. For example, you can first horizontally stack multiple data sources and then vertically stack the result with another data source.
type: vstack
specifications:
  - type: hstack
    specifications:
      - type: csv
        filename: data/file1.csv
        on: id
        how: left
      - type: excel
        filename: data/file2.xlsx
        sheet: Sheet1
        on: id
        how: left
  - type: csv
    filename: data/file3.csv
reset_index: True
In this example, data from file1.csv and file2.xlsx are first horizontally stacked based on the id column. The result is then vertically stacked with data from file3.csv, and the index is reset after stacking.
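Nested specifications like this one lend themselves to a recursive walk, with each inner stack handled before the outer one continues. The sketch below only prints the structure; describe is a hypothetical helper, and the join and index options are omitted for brevity.

```python
def describe(spec, depth=0):
    """Return indented lines showing how a nested specification is traversed."""
    pad = "  " * depth
    if spec["type"] in ("hstack", "vstack"):
        lines = [f"{pad}{spec['type']}"]
        for child in spec["specifications"]:
            # Recurse so that stacks may contain other stacks to any depth.
            lines.extend(describe(child, depth + 1))
        return lines
    return [f"{pad}{spec['type']}: {spec['filename']}"]


spec = {
    "type": "vstack",
    "specifications": [
        {
            "type": "hstack",
            "specifications": [
                {"type": "csv", "filename": "data/file1.csv"},
                {"type": "excel", "filename": "data/file2.xlsx"},
            ],
        },
        {"type": "csv", "filename": "data/file3.csv"},
    ],
}
outline = describe(spec)
print("\n".join(outline))
```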
The compute field in a Linguine configuration file is used to define computations to be performed on the data. These computations can transform existing data, create new fields, or perform complex operations across multiple fields.
The compute field is typically a list of computation specifications. Each computation is defined by a dictionary with the following key components:
compute:
  - function: function_name
    field: output_field_name
    args:
      arg1: value1
      arg2: value2
    refresh: boolean
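- function: The name of the function to apply.
- field: The field in which the result is stored.
- args: The arguments passed to the function.
- refresh: A boolean. Default is False.
The args field can contain different types of arguments. The examples that follow reference data through column, row and subseries entries, and pass another function by name.
Here are some examples of compute specifications: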
compute:
  - function: add
    field: sum
    args:
      a:
        column: column1
      b:
        column: column2
  - function: format_name
    field: full_name
    args:
      first_name:
        row: first_name
      last_name:
        row: last_name
  - function: calculate_average
    field: average
    args:
      values:
        subseries: data_column
  - function: complex_calculation
    field: result
    args:
      input1:
        column: input_column1
      input2:
        column: input_column2
      operation:
        function: another_function
    refresh: true
Compute operations are executed in the order they are defined in the configuration file. This allows for dependencies between computations, where one computation may rely on the results of a previous one.
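The ordering rule is easiest to see when a second computation reads the field written by the first. In this sketch run_computes and the scale function are hypothetical, and only the column argument type is resolved; it is meant to show the dependency on order, not to reproduce the Compute class.

```python
def resolve(value, row):
    """Look up {'column': name} in the row; pass anything else through unchanged."""
    if isinstance(value, dict) and "column" in value:
        return row[value["column"]]
    return value


def run_computes(row, computes, functions):
    """Apply each compute entry in list order, storing results under `field`."""
    for entry in computes:
        func = functions[entry["function"]]
        kwargs = {name: resolve(value, row) for name, value in entry.get("args", {}).items()}
        row[entry["field"]] = func(**kwargs)
    return row


functions = {
    "add": lambda a, b: a + b,
    "scale": lambda x, factor: x * factor,  # hypothetical function for this example
}
computes = [
    {"function": "add", "field": "sum",
     "args": {"a": {"column": "column1"}, "b": {"column": "column2"}}},
    # This entry depends on "sum", so it must come after the entry above.
    {"function": "scale", "field": "scaled",
     "args": {"x": {"column": "sum"}, "factor": 10}},
]
row = run_computes({"column1": 2, "column2": 3}, computes, functions)
print(row)
```

Reversing the two entries would fail with a KeyError in this sketch, because sum would not exist yet when scale runs.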
The available functions for compute operations are defined in the _compute_functions_list method of the Compute class. This list includes the function name, the actual function object, default arguments, and an optional docstring.
If a specified function is not found in the function list, a ValueError will be raised with an appropriate error message.
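A lookup of this kind might look like the sketch below. The dictionary keys (name, function, default_args, docstring), the helper names and the error message wording are all assumptions chosen to mirror the description above; the real _compute_functions_list may be shaped differently.

```python
def compute_functions_list():
    """Hypothetical stand-in for the function list described above."""
    return [
        {
            "name": "add",
            "function": lambda a, b=0: a + b,
            "default_args": {"b": 0},
            "docstring": "Add two values.",
        },
    ]


def find_function(name):
    """Return the list entry for `name`, or raise ValueError if it is missing."""
    for entry in compute_functions_list():
        if entry["name"] == name:
            return entry
    raise ValueError(f"Function '{name}' not found in the list of compute functions.")


entry = find_function("add")
# Arguments from the configuration override the defaults stored with the function.
total_default = entry["function"](**{**entry["default_args"], "a": 2})
total_override = entry["function"](**{**entry["default_args"], "a": 2, "b": 5})

try:
    find_function("missing")
    message = None
except ValueError as err:
    message = str(err)
print(total_default, total_override, message)
```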
The review section in a Linguine configuration file is used to specify how data should be displayed and interacted with for review purposes. This section is closely tied to the functionality provided by the DisplaySystem class in display.py.
The review section typically contains a list of review specifications. Each specification is a dictionary that defines how a particular field or set of fields should be presented for review. Here's the general structure:
review:
  - field: field_name
    type: review_type
    options:
      option1: value1
      option2: value2
Text Review
- field: description
  type: text
  options:
    multiline: true
    max_length: 500
Dropdown Review
- field: category
  type: dropdown
  options:
    choices: ["A", "B", "C"]
    allow_multiple: false
Checkbox Review
- field: is_active
  type: checkbox
Date Review
- field: event_date
  type: date
  options:
    format: "%Y-%m-%d"
Numeric Review
- field: score
  type: numeric
  options:
    min: 0
    max: 100
    step: 0.1
Multiple Fields
- field: [first_name, last_name]
  type: text
  options:
    label: "Full Name"
Conditional Display
- field: additional_info
  type: text
  options:
    display_if:
      field: has_additional_info
      value: true
Custom Validation
- field: email
  type: text
  options:
    validation: email_format
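Two of the options above, display_if and the numeric min and max, describe simple checks against the current row. The following sketch shows one plausible reading of them; should_display and in_numeric_range are hypothetical helpers and do not correspond to methods of DisplaySystem.

```python
def should_display(spec, row):
    """Return True unless the spec has a display_if condition that the row fails."""
    condition = spec.get("options", {}).get("display_if")
    if condition is None:
        return True
    return row.get(condition["field"]) == condition["value"]


def in_numeric_range(spec, value):
    """Check a value against the optional min and max of a numeric review entry."""
    options = spec.get("options", {})
    low = options.get("min", float("-inf"))
    high = options.get("max", float("inf"))
    return low <= value <= high


info_spec = {
    "field": "additional_info",
    "type": "text",
    "options": {"display_if": {"field": "has_additional_info", "value": True}},
}
score_spec = {"field": "score", "type": "numeric", "options": {"min": 0, "max": 100, "step": 0.1}}

shown = should_display(info_spec, {"has_additional_info": True})
hidden = should_display(info_spec, {"has_additional_info": False})
valid = in_numeric_range(score_spec, 42.5)
invalid = in_numeric_range(score_spec, 101)
print(shown, hidden, valid, invalid)
```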
The review entries are processed by the DisplaySystem class in display.py. This class creates appropriate widgets based on the review specifications and manages the interaction between the user interface and the underlying data.
Key methods in DisplaySystem that handle review entries include:
- populate_display(): Creates and updates widgets based on review specifications.
- value_updated(): Handles updates when a reviewed value changes.
- set_value() and set_value_by_element(): Update specific values in the data.
Here's an example of a complete review section in a Linguine configuration file:
review:
  - field: title
    type: text
    options:
      max_length: 100
      label: "Article Title"
  - field: content
    type: text
    options:
      multiline: true
      max_length: 1000
      label: "Article Content"
  - field: category
    type: dropdown
    options:
      choices: ["News", "Opinion", "Feature"]
      allow_multiple: false
      label: "Article Category"
  - field: tags
    type: dropdown
    options:
      choices: ["Politics", "Technology", "Science", "Culture"]
      allow_multiple: true
      label: "Article Tags"
  - field: publish_date
    type: date
    options:
      format: "%Y-%m-%d"
      label: "Publication Date"
  - field: is_featured
    type: checkbox
    options:
      label: "Feature this article?"
  - field: [author_first_name, author_last_name]
    type: text
    options:
      label: "Author Name"
This example demonstrates various types of review entries that can be used to create a comprehensive review interface for an article management system.