movestore / rds2csv

Transforms the rds output (from any App) into a csv file (keeping only columns that are not entirely NA) that is downloadable as an artefact. The App returns the input rds data.

improve/explain column order #12

Open sarahcd opened 5 months ago

sarahcd commented 5 months ago

There are a ton of columns in the data.csv and the column order appears to be mostly random. It would be helpful to organize this somewhat by reordering and/or explaining the order. They seem to be grouped by attributes from the

Here are some suggestions:

1) Move the key attributes together on the left side. To me this is the most important change, because it reduces the chance that key information goes unnoticed or is lost.

2) Several columns are internal Movebank database IDs that have a human-readable version. Consider leaving out (by default?) or moving to the right side of the table.

3) Except for the specifics in 1 and 2, add in order: attributes derived from previous moveApps, then alphabetically the animal, deployment, tag, study metadata.
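The reordering suggested in 1 and 2 could be sketched roughly as follows. This is a minimal illustration in Python, not the App's actual code: the function name `reorder_columns` and the column names in `KEY` and `INTERNAL` are hypothetical and chosen only for the example. Names that are not present in the table are simply skipped, so data that did not come from Movebank would pass through in its original order.

```python
def reorder_columns(columns, key_columns, internal_id_columns):
    """Return the column names with key columns first and internal IDs last.

    Columns that are in neither list keep their original relative order.
    Names listed in key_columns or internal_id_columns but absent from
    `columns` are ignored.
    """
    present = set(columns)
    internal = set(internal_id_columns)
    # Key columns go first, in the order given by key_columns.
    first = [c for c in key_columns if c in present]
    # Internal IDs go last, in the order they appear in the table.
    last = [c for c in columns if c in internal and c not in first]
    moved = set(first) | set(last)
    # Everything else stays in the middle, original order preserved.
    middle = [c for c in columns if c not in moved]
    return first + middle + last


# Hypothetical column names, for illustration only.
KEY = ["individual_local_identifier", "timestamp", "location_long", "location_lat"]
INTERNAL = ["individual_id", "deployment_id", "tag_id"]

example = ["tag_id", "speed", "timestamp", "individual_id",
           "individual_local_identifier", "location_lat", "location_long"]
print(reorder_columns(example, KEY, INTERNAL))
```

Suggestion 3 (alphabetical ordering of the remaining metadata) is left out of the sketch because it depends on knowing which attributes belong to which group.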

If any of this should be sent as requests for Movebank Location or the Movebank REST API let me know.

sarahcd commented 5 months ago

Likewise for trackInfo.csv. We should move the local identifiers to the left and remove the internal IDs or put them on the right.

annescharf commented 5 months ago

Given that we cannot assume that all data comes from Movebank, and that even when it does, each study has different attributes, ordering these will always be a bit of a headache. Currently the order is: certain columns come first (see below, extracted from the documentation), then the columns associated with the event as they come from Movebank, then those associated with the track info. In both cases, the columns that come from Movebank come first, then those added in MoveApps.

data.csv: the complete data set as a csv table (excluding columns that only contain NAs). The first 4 columns contain the information of the track IDs, timestamps, and coordinates X and Y used for any previous analysis.

trackInfo.csv: the information associated to the tracks. This table contains one entry per track. The first column contains the information of the track ID used for any previous analysis.
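As a rough illustration of the behaviour described in the documentation excerpt above (dropping columns that contain only NAs and placing a few fixed columns first), here is a small Python sketch. It is an assumption-laden example rather than the App's implementation: the helper names, the set of values treated as NA, and the sample column names are all hypothetical.

```python
def drop_all_na_columns(rows, columns, na_values=(None, "", "NA")):
    """Keep only the columns that have at least one non-NA value.

    `rows` is a list of dicts (one per record). Which values count as NA
    is an assumption made for this sketch.
    """
    return [c for c in columns
            if any(row.get(c) not in na_values for row in rows)]


def leading_first(columns, leading):
    """Move the `leading` columns (when present) to the front."""
    lead = [c for c in leading if c in columns]
    return lead + [c for c in columns if c not in lead]


# Hypothetical sample data, for illustration only.
rows = [
    {"track": "a", "comment": None, "x": 1.0},
    {"track": "b", "comment": "NA", "x": None},
]
columns = ["comment", "x", "track"]

kept = drop_all_na_columns(rows, columns)
print(leading_first(kept, ["track", "timestamp", "x"]))
```

Here `comment` is dropped because every value is NA, while `x` is kept because one row has a value.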

sarahcd commented 5 months ago

Could 1 and 2 be implemented for those data that are from Movebank? For any other data I assume the attributes would not be present or be all NA so it wouldn't be an issue.