Closed by256 closed 4 months ago
Describe the error
Some of the CSV files in tables/ref_isolate_pairs have a typo in one of the column names.
Instead of control_iso_name, they have control_iso_nam.
To locate the error
The error is present in the following files:
You can reproduce this with the following script:
from pathlib import Path

import pandas as pd

isolate_pairs_path = Path("covid-drdb-payload/tables/ref_isolate_pairs")

for path in isolate_pairs_path.iterdir():
    pairs_df = pd.read_csv(path)
    # Report any file whose header differs from the expected column set
    if set(pairs_df.columns) != {"ref_name", "control_iso_name", "iso_name"}:
        print(f"{path.stem}: {list(pairs_df.columns)}")
which should output:
uraki22-pair: ['ref_name', 'control_iso_nam', 'iso_name']
uriu22-pair: ['ref_name', 'control_iso_nam', 'iso_name']
uriu23-pair: ['ref_name', 'control_iso_nam', 'iso_name']
ueno22-pair: ['ref_name', 'control_iso_nam', 'iso_name']
uriu23b-pair: ['ref_name', 'control_iso_nam', 'iso_name']
uriu21-pair: ['ref_name', 'control_iso_nam', 'iso_name']
turner21-pair: ['ref_name', 'control_iso_nam', 'iso_name']
Expected behavior
The columns should be named control_iso_name instead of control_iso_nam.
@by256 I've used
for file in *.csv; do
    head -n 1 "$file"
done | sort | uniq > unique_patterns.txt
to find and fix the files.
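For anyone carrying a local copy with the same typo, a header rename could be scripted along the following lines. This is only a sketch, not the fix that was applied to the repository: the names `fix_header` and `fix_directory` are hypothetical, and it assumes a plain, unquoted, comma-separated header on the first line of each file. It rewrites only the header line and leaves the rest of each file byte-for-byte unchanged.

```python
from pathlib import Path

# Misspelled and corrected column names, as reported in this issue.
TYPO = b"control_iso_nam"
FIXED = b"control_iso_name"


def fix_header(path: Path) -> bool:
    """Rename the misspelled column in one CSV header.

    Returns True if the file was rewritten, False if it was left alone.
    Assumes an unquoted, comma-separated header (an assumption of this sketch).
    """
    data = path.read_bytes()
    header, newline, rest = data.partition(b"\n")
    # Preserve a Windows-style line ending if the file uses one.
    line_end = b"\r" if header.endswith(b"\r") else b""
    columns = header.rstrip(b"\r").split(b",")
    if TYPO not in columns:
        return False
    fixed = [FIXED if col == TYPO else col for col in columns]
    path.write_bytes(b",".join(fixed) + line_end + newline + rest)
    return True


def fix_directory(directory: Path) -> list[str]:
    """Fix every *.csv file in a directory; return the names of changed files."""
    return sorted(p.name for p in directory.glob("*.csv") if fix_header(p))
```

Calling `fix_directory(Path("covid-drdb-payload/tables/ref_isolate_pairs"))` would return the names of the files it changed. Because column names are compared exactly, files that already use `control_iso_name` are not touched.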