Closed sbesson closed 2 years ago
Conflicting PR. Removed from build OMERO-plugins-push#1169. See the console output for more details. Possible conflicts:
--conflicts
Conflicting PR. Removed from build OMERO-plugins-push#1176. See the console output for more details. Possible conflicts:
--conflicts
Conflict resolved in build OMERO-plugins-push#1179. See the console output for more details.
Thanks @muhanadz for the review. 27050f2 should amend the README where most of the information about the library usage is captured at the moment.
Thanks @muhanadz. The new column detection behavior is now released as omero-metadata 0.11.0.
🎉
Fixes #76
Reproducible scenario
First create a minimal dataset/image hierarchy e.g. as follows:
CSV files with sparse string data such as the one below are correctly handled by the current HEAD of `omero-metadata`.
The columns with missing values are mapped as `s`/`StringColumn` and the missing values are turned into empty strings when running `omero metadata populate --file sparse_string_column.csv $dataset`, e.g.:
CSV files with sparse numerical columns such as the one below currently fail during the population command:
Here, the `meas1` column is currently mapped to a `d` header type/`DoubleColumn` by the `pandas` detection logic. With the default `omero metadata populate --file` command, the table population fails with `ValueError: Empty Double or Long value. Use --allow_nan to convert to NaN`.
Proposed changes
Since the library already includes some logic allowing the user to control whether NaN values are allowed in the OMERO.table (introduced in #60), this PR proposes the following changes:
- The `MetadataControl.detect_headers` API supports an extra `keep_default_na` argument (`True` by default). Its value is forwarded to `pandas.read_csv` and controls how pandas should handle NA (missing) values and whether the column is detected as `d` vs `s` (see https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html).
- The `MetadataControl.detect_headers` API is exercised with different combinations of tables/arguments.
- `args.allow_nan` (`False` by default) is passed as `keep_default_na` to `MetadataControl.detect_headers`.
fe73a17d2a71fd7d220c48891a2364110b59f4f1 adds a cosmetic change defining GNU-style aliases of the command-line arguments (`--manual-header`, `--allow-nan`) using a hyphen as separator. The existing underscore-separated flags are preserved.
Testing
With these changes, annotating sparse CSV tables using the default header detection should be functional in all cases.
If the tabular data is dense or contains sparse numerical columns, the behavior of the command will depend on the `--allow-nan` flag:
- without `--allow-nan`, the command will detect the sparse numeric column as a `StringColumn` and store the missing values as empty strings
- with `--allow-nan`, the command will detect the sparse numeric column as a `DoubleColumn` and store missing values as `nan`
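
The two outcomes above follow from how `pandas.read_csv` treats empty fields depending on `keep_default_na`. The snippet below is a minimal sketch of that effect only, not the `MetadataControl.detect_headers` implementation: the CSV content and the `column_kinds` helper are hypothetical, and the `d`/`s` mapping is simplified to "float dtype or anything else".

```python
import io

import pandas as pd

# Hypothetical CSV: "meas1" is a numeric column with one missing value.
CSV = "image,meas1\nimg-01,1.5\nimg-02,\nimg-03,3.0\n"


def column_kinds(csv_text, keep_default_na):
    """Read the CSV and label each column 'd' (float) or 's' (anything else).

    Illustrative helper only; the real header detection in omero-metadata
    handles more column types than this sketch.
    """
    df = pd.read_csv(io.StringIO(csv_text), keep_default_na=keep_default_na)
    kinds = {
        name: "d" if pd.api.types.is_float_dtype(df[name]) else "s"
        for name in df.columns
    }
    return df, kinds


# keep_default_na=False: the empty field stays an empty string, so the
# column cannot be parsed as numeric and is read as strings.
df_s, kinds_s = column_kinds(CSV, keep_default_na=False)

# keep_default_na=True: the empty field becomes NaN and the column is float.
df_d, kinds_d = column_kinds(CSV, keep_default_na=True)

print(kinds_s["meas1"], df_s["meas1"].tolist())
print(kinds_d["meas1"], df_d["meas1"].tolist())
```

Since the PR passes `args.allow_nan` as `keep_default_na`, the first call corresponds to running the command without `--allow-nan` and the second to running it with the flag.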