This pull request adapts the dev.py sub-module to ensure full compatibility with the 3W Dataset 2.0. The main changes include updating the EventFolds class to correctly handle the new data loading process and removing the redundant extrai_arrays() function.
Changes made:
Removed extrai_arrays() function: This function was previously used to extract data from individual CSV files. With the new load_3w_dataset() function in base.py, which loads the entire dataset into a Pandas DataFrame, the extrai_arrays() function became redundant and was removed.
Updated EventFolds class:
The __init__() method was modified to receive the complete DataFrame as a parameter instead of individual instance names. This change streamlines the data loading process and improves efficiency.
The carregue_instancia() method was updated to use the load_3w_dataset() function for loading data, ensuring consistency and compatibility with the new data structure.
The logic for extracting training and test samples was adjusted to work with the DataFrame structure.
Updated Experiment class: The folds() method was adjusted to pass the DataFrame to the EventFolds class, ensuring the correct data flow.
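The data flow described in these changes can be sketched roughly as follows. This is a minimal, self-contained illustration and not the toolkit's actual code: `make_toy_dataset()`, `ToyEventFolds`, `ToyExperiment`, the column names (`instance`, `event`, `sensor`, `label`) and the leave-one-instance-out split are all assumptions made for the example. The toy DataFrame stands in for whatever `load_3w_dataset()` returns.

```python
import pandas as pd


def make_toy_dataset() -> pd.DataFrame:
    """Hypothetical stand-in for the DataFrame returned by load_3w_dataset().

    The column names are assumptions, not the dataset's real schema.
    """
    return pd.DataFrame(
        {
            "instance": ["WELL-A", "WELL-A", "WELL-B", "WELL-B", "WELL-C", "WELL-C"],
            "event": ["SPURIOUS_CLOSURE_OF_DHSV"] * 4 + ["NORMAL"] * 2,
            "sensor": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5],
            "label": [0, 1, 0, 1, 0, 0],
        }
    )


class ToyEventFolds:
    """Simplified stand-in for EventFolds: it receives the full DataFrame."""

    def __init__(self, df: pd.DataFrame, event_name: str):
        # Filter once in memory instead of opening one CSV file per instance.
        self.df = df[df["event"] == event_name]
        self.instances = sorted(self.df["instance"].unique())

    def __iter__(self):
        # Assumed splitting rule: each instance is held out as test data once.
        for held_out in self.instances:
            is_test = self.df["instance"] == held_out
            yield self.df[~is_test], self.df[is_test]


class ToyExperiment:
    """Simplified stand-in for Experiment: folds() forwards the DataFrame."""

    def __init__(self, event_name: str, df: pd.DataFrame):
        self.event_name = event_name
        self.df = df

    def folds(self) -> ToyEventFolds:
        return ToyEventFolds(self.df, self.event_name)


if __name__ == "__main__":
    experiment = ToyExperiment("SPURIOUS_CLOSURE_OF_DHSV", make_toy_dataset())
    for train_df, test_df in experiment.folds():
        print(len(train_df), "training rows /", len(test_df), "test rows")
```

The point of the sketch is only the shape of the change: the dataset is loaded once, and every fold is a filtered view of the same DataFrame, so no per-instance file access is needed.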
Example usage:
The following code snippet demonstrates how to use the updated Experiment class with the 3W Dataset 2.0:
import toolkit as tk
# Create an experiment for the "SPURIOUS_CLOSURE_OF_DHSV" event
experiment = tk.Experiment(event_name="SPURIOUS_CLOSURE_OF_DHSV")
# Generate the folds for the experiment
folds = experiment.folds()
# Access the training and test samples for each fold
for fold in folds:
    X_train, y_train = fold.extract_training_samples()
    X_test = fold.extract_test_samples()
    # ... your machine learning model training and evaluation code here ...
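To try out the loop body without the dataset, one option is a small stub that exposes the same two methods used in the example. The sketch below is an assumption-laden illustration: `StubFold` and `summarize_folds()` are hypothetical names, and the return shapes (a pair `(X_train, y_train)` and a single `X_test`) are inferred from the example above rather than taken from the toolkit.

```python
from typing import Iterable, List, Tuple


class StubFold:
    """Hypothetical stand-in exposing the two methods used in the example."""

    def __init__(
        self,
        train: List[Tuple[List[float], int]],
        test: List[List[float]],
    ):
        self._train = train
        self._test = test

    def extract_training_samples(self) -> Tuple[List[List[float]], List[int]]:
        # Split (features, label) pairs into parallel lists.
        X = [features for features, _ in self._train]
        y = [label for _, label in self._train]
        return X, y

    def extract_test_samples(self) -> List[List[float]]:
        return list(self._test)


def summarize_folds(folds: Iterable[StubFold]) -> List[Tuple[int, int]]:
    """Run the same loop as the example and record sample counts per fold."""
    summary = []
    for fold in folds:
        X_train, y_train = fold.extract_training_samples()
        X_test = fold.extract_test_samples()
        # Every training sample is expected to have exactly one label.
        assert len(X_train) == len(y_train)
        summary.append((len(X_train), len(X_test)))
    return summary


if __name__ == "__main__":
    stub_folds = [
        StubFold([([0.1, 0.2], 0), ([0.3, 0.4], 1)], [[0.5, 0.6]]),
        StubFold([([0.7, 0.8], 1)], [[0.9, 1.0], [1.1, 1.2]]),
    ]
    print(summarize_folds(stub_folds))
```

Because `summarize_folds()` relies only on the two method names, the same function should accept the real folds from `experiment.folds()`, provided their samples support `len()`.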
Benefits:
Compatibility with 3W Dataset 2.0: Ensures seamless integration with the latest version of the dataset.
Improved efficiency: Loads the dataset once into a DataFrame instead of extracting data from individual CSV files.
Simplified workflow: Streamlines the process of accessing and preparing data for machine learning experiments.
Enhanced maintainability: Removing the redundant extrai_arrays() function leaves less code to read and maintain.
This contribution improves the usability and efficiency of the 3W Toolkit when working with the 3W Dataset 2.0, which facilitates research and development of machine learning models for anomaly detection in oil wells.
By creating this pull request, I confirm that I have read and fully accept and agree with one of the Petrobras' Contributor License Agreements (CLAs):
Our CLAs are based on the Apache Software Foundation's CLAs: