Test to write: confirm observation_id's refer to unique observations

ehwenk commented 1 year ago

In austraits$traits there should be a single row of data for each unique combination of:

dataset_id, trait_name, observation_id, source_id, taxon_name, population_id, individual_id, temporal_id, method_id, entity_context_id, value_type, original_name

If this is not true, austraits$traits can't be pivoted wider cleanly, because several values would compete for a single cell.

The following code can be modified to run as a test for each dataset.

library(dplyr)
library(tidyr)

# Traits recorded for the current dataset
traits <- austraits$traits %>%
  filter(dataset_id == current_study) %>%
  distinct(trait_name)

# Count the values per identifier combination and trait; a count above 1 marks a duplicate
austraits$traits %>%
  filter(dataset_id == current_study) %>%
  select(dataset_id, trait_name, value, observation_id, source_id, taxon_name, population_id, individual_id, temporal_id, method_id, entity_context_id, value_type, original_name) %>%
  pivot_wider(names_from = trait_name, values_from = value, values_fn = length) %>%
  pivot_longer(cols = all_of(traits$trait_name)) %>%
  filter(value > 1)

The final line of code should yield 0 rows. If the test fails, it would be best for it to output the non-unique rows, so that it is easy to work out what the problem is.
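One way to get at the non-unique rows directly is to group by the identifier columns and keep groups with more than one row. The following is only a minimal sketch: the helper name find_duplicate_observations, the toy data frame, and the shortened list of identifier columns are made up for illustration and are not part of austraits.build.

```r
library(dplyr)

# Hypothetical helper (not part of austraits.build): returns every row of
# `traits` whose combination of `id_cols` occurs more than once.
find_duplicate_observations <- function(traits, id_cols) {
  traits %>%
    group_by(across(all_of(id_cols))) %>%
    filter(n() > 1) %>%
    ungroup()
}

# Toy data invented for illustration; the first two rows share all identifiers
toy_traits <- data.frame(
  dataset_id = "study_1",
  trait_name = c("leaf_area", "leaf_area", "plant_height"),
  observation_id = c("001", "001", "001"),
  value = c("10", "12", "1.5")
)

# Shortened identifier list; a real check would use the full set of columns
id_cols <- c("dataset_id", "trait_name", "observation_id")

duplicates <- find_duplicate_observations(toy_traits, id_cols)

# Print the offending rows so the problem is easy to locate
if (nrow(duplicates) > 0) {
  message("Found ", nrow(duplicates), " non-unique rows:")
  print(duplicates)
}
```

In a testthat-based test, an expectation such as expect_equal(nrow(duplicates), 0) could follow the print step, so the rows are shown before the test fails.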

ehwenk commented 1 year ago

Also need to include sample_age_class in the list of variables used for pivot_wider.

yangsophieee commented 1 year ago

Moved this issue to traits.build.

traitecoevo / austraits.build

Test to write: confirm observation_id's refer to unique observations #746