improve/explain column order

sarahcd commented 5 months ago

There are a ton of columns in the data.csv and the column order appears to be mostly random. It would be helpful to organize this somewhat by reordering and/or explaining the order. They seem to be grouped into attributes from the following sources:

event data
attributes derived from previous moveApps
deployment reference data
animal reference data
tag reference data
study metadata

Here are some suggestions:

1) Move the key attributes together on the left side. To me this is the most important change, since it reduces the chance that key information goes unnoticed or is lost.

individual_name_deployment_id (or trackID?)
timestamp
coords_x
coords_y
individual_local_identifier
deployment_local_identifier
tag_local_identifier
event_id
sensor_type (better to include the human-readable one, I think still to be added in Movebank Location)
taxon_canonical_name

2) Several columns are internal Movebank database IDs that have a human-readable version. Consider leaving out (by default?) or moving to the right side of the table.

sensor_type_id
deployment_id
tag_id
individual_id

3) Except for the specifics in 1 and 2, add in order: attributes derived from previous moveApps, then alphabetically the animal, deployment, tag, study metadata.
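Suggestions 1 to 3 could be sketched as a single ordering function. This is only an illustration in Python that works on a list of column names. The function name `reorder_columns` and the `derived` argument are hypothetical, and it assumes the app can tell which columns were added by previous MoveApps. It also sorts all remaining reference columns in one alphabetical run, because grouping them by animal, deployment, tag and study would need a lookup table that is not shown here.

```python
# Hypothetical sketch: the function name and arguments are assumptions,
# not part of the existing app. Column names are taken from the lists above.
KEY_COLUMNS = [
    "individual_name_deployment_id", "timestamp", "coords_x", "coords_y",
    "individual_local_identifier", "deployment_local_identifier",
    "tag_local_identifier", "event_id", "sensor_type", "taxon_canonical_name",
]
INTERNAL_IDS = ["sensor_type_id", "deployment_id", "tag_id", "individual_id"]


def reorder_columns(columns, derived=()):
    """Order columns as: key attributes, attributes derived from previous
    MoveApps, all remaining attributes alphabetically, internal IDs last.

    `columns` is assumed to contain unique names. Names that are not
    present in `columns` are silently skipped.
    """
    present = set(columns)
    key = [c for c in KEY_COLUMNS if c in present]
    ids = [c for c in INTERNAL_IDS if c in present]
    placed = set(key) | set(ids)
    derived_cols = [c for c in derived if c in present and c not in placed]
    placed |= set(derived_cols)
    rest = sorted(c for c in columns if c not in placed)
    return key + derived_cols + rest + ids
```

Because missing names are skipped, the same function would leave data from other sources untouched apart from the alphabetical sort.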

If any of this should be sent as requests for Movebank Location or the Movebank REST API let me know.

sarahcd commented 5 months ago

Likewise for trackInfo.csv. We should move the local identifiers to the left and remove the internal IDs or put them on the right.

annescharf commented 5 months ago

Given that we cannot assume that all data comes from Movebank, and even if it does, each study has different attributes, ordering these will always be a bit of a headache. Currently the order is: certain columns come first (see below, extracted from the documentation), then the columns associated with the event as they come from Movebank, then those associated with the track info. In both cases, those that come from Movebank come first, then those added in MoveApps.

data.csv: the complete data set as a csv table (excluding columns that only contain NAs). The first 4 columns contain the information of the track IDs, timestamps, and coordinates X and Y used for any previous analysis.

trackInfo.csv: the information associated to the tracks. This table contains one entry per track. The first column contains the information of the track ID used for any previous analysis.

sarahcd commented 5 months ago

Could 1 and 2 be implemented for those data that are from Movebank? For any other data I assume the attributes would not be present or be all NA so it wouldn't be an issue.
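One way to limit suggestion 2 to Movebank data is to treat an internal ID as safe to leave out or move right only when its human-readable counterpart actually holds values. The sketch below is a rough Python illustration of that check. The function name `droppable_internal_ids`, the row-as-dict representation, the set of values treated as missing, and the pairing of each ID with a readable column are all assumptions. In particular, a human-readable `sensor_type` may not be available yet.

```python
# Hypothetical pairing of internal Movebank IDs with human-readable columns.
# The pairing itself is an assumption made for this sketch.
READABLE_FOR_ID = {
    "sensor_type_id": "sensor_type",
    "deployment_id": "deployment_local_identifier",
    "tag_id": "tag_local_identifier",
    "individual_id": "individual_local_identifier",
}

MISSING = (None, "", "NA")  # values treated as missing in this sketch


def droppable_internal_ids(rows):
    """Return the internal ID columns that are present in `rows` (a list of
    dicts, one per record) and whose human-readable counterpart has at
    least one non-missing value."""
    def has_values(col):
        return any(row.get(col) not in MISSING for row in rows)

    def is_present(col):
        return any(col in row for row in rows)

    return [id_col for id_col, readable in READABLE_FOR_ID.items()
            if is_present(id_col) and has_values(readable)]
```

For data without these attributes, or where the readable columns are all NA, the function returns an empty list and nothing would be moved or removed.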

movestore / rds2csv

improve/explain column order #12