iobis / pyobis

OBIS Python client
https://iobis.github.io/pyobis
MIT License

curated column subsets? #52

Open 7yl4r opened 2 years ago

7yl4r commented 2 years ago

When working with occurrence data, I feel overwhelmed by the number of columns.

Would it be a good idea to allow for easy subsetting of columns?

Here is some code in which I have done that:

# === select a subset of columns for improved table legibility :
shortened_df = df[[
    # time-related columns:
    "date_year", "endDayOfYear", "verbatimEventDate", "startDayOfYear",
    "dateIdentified", "eventTime", "date_mid",
    "eventDate", "month", "date_start", "date_end", "day", "year",
    # row identifier columns:
    "recordNumber", "ownerInstitutionCode", "parentEventID", "identifiedBy",
    "eventID", "collectionID", "organismID", "recordedBy", "datasetID",
    "category", "datasetName",
    "institutionCode", "occurrenceID", "collectionCode", "dataset_id", "id",
    "modified", "catalogNumber", "fieldNumber",
    "institutionID",
    # additional remarks:
    "occurrenceRemarks", "taxonRemarks", "eventRemarks", "samplingProtocol",
    "typeStatus", "preparations", "establishmentMeans", "dynamicProperties", "type",
    # occurrence specifics:
    "individualCount", "occurrenceStatus", "originalScientificName", "absence",
    "terrestrial", "basisOfRecord", "dropped",
]]

We could include curated lists of column names so that a user could do something like:

df = df[pyobis.column_subset.taxonomic + pyobis.column_subset.temporal]

This would drop everything except the curated "taxonomic" and "temporal" columns. Thoughts?
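For discussion, here is a rough sketch of what such curated lists might look like. Nothing here exists in pyobis today: the names `temporal`, `taxonomic`, and `select_existing` are hypothetical. The `temporal` list is copied from the time-related columns above, while the `taxonomic` list is only an illustrative guess at a few Darwin Core terms, not a vetted selection. The helper keeps only the names that are actually present, since not every response contains every column.

```python
# Hypothetical sketch of a `column_subset` module; not part of pyobis today.

# Time-related columns, copied from the list earlier in this issue.
temporal = [
    "date_year", "endDayOfYear", "verbatimEventDate", "startDayOfYear",
    "dateIdentified", "eventTime", "date_mid", "eventDate", "month",
    "date_start", "date_end", "day", "year",
]

# Illustrative only: a few Darwin Core taxonomy terms, not a curated list.
taxonomic = [
    "scientificName", "kingdom", "phylum", "class", "order", "family", "genus",
]


def select_existing(available, *subsets):
    """Return curated names found in `available`, in subset order, without duplicates."""
    available = set(available)
    selected = []
    for subset in subsets:
        for name in subset:
            if name in available and name not in selected:
                selected.append(name)
    return selected


if __name__ == "__main__":
    # Stand-in for df.columns; a real occurrence table has many more columns.
    columns = ["id", "scientificName", "eventDate", "decimalLatitude", "family"]
    print(select_existing(columns, taxonomic, temporal))
```

With a DataFrame this would be used as `df = df[select_existing(df.columns, taxonomic, temporal)]`. Filtering first avoids the KeyError that pandas raises when a list of labels names a column that is missing from the frame.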

ayushanand18 commented 2 years ago

I have some doubts about this: