This repository provides detailed reports on Semifield image data, including data contents, species distribution, temporal and spatial distribution, missing data analysis, and status of unprocessed or backlog data.
To manage the project's dependencies, we use Conda, a package and environment manager. If you haven't installed Conda already, install it first, then run:
conda list
to confirm that Conda was installed correctly. You should see a list of installed packages.
After installing Conda, you can set up an environment for this project from an environment file, which specifies all necessary dependencies. Here's how:
Clone this repository to your local machine.
Navigate to the repository directory in your terminal.
Locate the environment.yaml file in the repository. This file lists the packages needed for the project.
Create a new Conda environment by running the following command:
conda env create -f environment.yaml
This command reads the environment.yaml file and creates an environment with the name and dependencies specified in it.
Once the environment is created, activate it with:
conda activate <env_name>
Replace <env_name> with the name of the environment specified in the environment.yaml file.
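If you are unsure which name to pass to conda activate, the following minimal sketch reads it from the environment file. It assumes the file has a top-level name: key, as Conda environment files normally do. The helper read_env_name is hypothetical and is not part of this repository, and the line scan is deliberately simple rather than a full YAML parser.

```python
from pathlib import Path


def read_env_name(env_file="environment.yaml"):
    """Return the value of the top-level `name:` key, or None if absent."""
    for line in Path(env_file).read_text().splitlines():
        # Only a non-indented `name:` line holds the environment name.
        if line.startswith("name:"):
            value = line.split(":", 1)[1]
            # Drop a trailing comment and surrounding quotes, if any.
            value = value.split("#", 1)[0].strip().strip("'\"")
            return value or None
    return None
```

Calling read_env_name() from the repository root returns the string to use in place of <env_name>.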
With the environment set up and activated, you can run the scripts provided in the repository to begin data exploration and analysis:
conda activate semifield-reports
python main.py task=<task_name>
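The task=<task_name> argument selects which script main.py runs. How main.py does this is not shown here, so the following is only a sketch of one way such key=value dispatch could work, written with the standard library. The functions parse_overrides and run, the TASKS registry, and the placeholder task bodies are all assumptions for illustration.

```python
def export_blob_metrics():
    # Placeholder body; the real task lives in export_blob_metrics.py.
    return "exported"


def report_blob_metrics():
    # Placeholder body; the real task lives in report_blob_metrics.py.
    return "reported"


# Hypothetical registry mapping task names to callables.
TASKS = {
    "export_blob_metrics": export_blob_metrics,
    "report_blob_metrics": report_blob_metrics,
}


def parse_overrides(argv):
    """Turn ['task=foo', 'x=1'] into {'task': 'foo', 'x': '1'}."""
    overrides = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        overrides[key] = value
    return overrides


def run(argv):
    """Look up the requested task and call it."""
    task_name = parse_overrides(argv).get("task")
    if task_name not in TASKS:
        raise ValueError(f"unknown task {task_name!r}; choose from {sorted(TASKS)}")
    return TASKS[task_name]()
```

With this sketch, run(["task=export_blob_metrics"]) calls the export task, mirroring the command line above.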
export_blob_metrics.py
ExporterBlobMetrics: Exports blob metrics by running AzCopy commands and saving the output to text files.
CalculatorBlobMetrics: Calculates and analyzes blob metrics from the exported text files, including extracting batch details, filtering data, and computing image counts.
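To illustrate the calculation step, here is a rough sketch of counting images per batch from an exported blob list. It assumes each line looks like "INFO: <batch>/<...>/<file>;  Content Length: <size>" and that the first path component is the batch name. Neither assumption is confirmed by this repository. The names count_images_per_batch, parse_blob_path, and IMAGE_EXTENSIONS are hypothetical and do not reflect the actual CalculatorBlobMetrics implementation.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed set of image extensions; adjust to the formats actually stored.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def parse_blob_path(line):
    """Extract the blob path from one listing line, or None if empty."""
    line = line.strip()
    if line.startswith("INFO: "):
        line = line[len("INFO: "):]
    # Everything before the first ';' is treated as the blob path.
    path = line.split(";", 1)[0].strip()
    return path or None


def count_images_per_batch(lines):
    """Count image blobs grouped by their first path component."""
    counts = Counter()
    for line in lines:
        path = parse_blob_path(line)
        if path is None:
            continue
        parts = PurePosixPath(path).parts
        suffix = PurePosixPath(path).suffix.lower()
        # Skip non-images and blobs that sit outside any batch folder.
        if suffix not in IMAGE_EXTENSIONS or len(parts) < 2:
            continue
        counts[parts[0]] += 1
    return counts
```

Passing an open text file to count_images_per_batch yields a Counter keyed by batch name, which could then be filtered or compared against expected counts.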
Run the script by passing its task configuration name as an argument:
python main.py task=export_blob_metrics
Text Files: The ExporterBlobMetrics class saves blob lists as text files in the directory specified by cfg.paths.data_dir in the configuration file, using the naming format <blob_container_name>.txt.
CSV Report: The CalculatorBlobMetrics class generates a CSV file of mismatch statistics, detailing any discrepancies found during analysis. The file is saved in the directory specified by cfg.paths.data_dir in the configuration file, under the name mismatch_statistics_record.csv.
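The export step and its output naming can be sketched as follows. This is not the ExporterBlobMetrics code. It assumes the AzCopy command in question is azcopy list, whereas the text above says only "AzCopy commands". The function name export_blob_list and the injectable run parameter are hypothetical, and data_dir stands in for cfg.paths.data_dir.

```python
import subprocess
from pathlib import Path


def export_blob_list(container_name, container_url, data_dir, run=subprocess.run):
    """Save an AzCopy listing of one container to <data_dir>/<container_name>.txt."""
    out_path = Path(data_dir) / f"{container_name}.txt"
    out_path.parent.mkdir(parents=True, exist_ok=True)

    # Assumed command: `azcopy list <url>` prints one line per blob.
    cmd = ["azcopy", "list", container_url]
    result = run(cmd, capture_output=True, text=True, check=True)

    out_path.write_text(result.stdout)
    return out_path
```

The run parameter defaults to subprocess.run but can be swapped for a stub, which makes the naming logic testable on a machine without AzCopy or Azure credentials.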
report_blob_metrics.py
Run the script by passing its task configuration name as an argument:
python main.py task=report_blob_metrics
The PDF report is saved in the directory specified by cfg.paths.report in the configuration file, with the naming format semifield-developed-images_image_counts_and_averages_report.pdf.