nc-minibbs / mbbs

A repository for the Mini-Bird Breeding Survey data
https://minibbs.us

Clarify data pipeline #60

Open bsaul opened 3 months ago

bsaul commented 3 months ago
> @IJBG - a thought I had while tinkering ... we're calling a lot of datasets `mbbs` and saying things like `Any mbbs dataset, either the whole survey area or one county`. But what do we mean by `mbbs dataset`? What is its shape (e.g., required columns)? It might be worth defining the various stages of processing and the shape of the data going into and out of each stage.

I think defining that would be good. Right now, 'mbbs dataset' refers to a post-processing dataset that has gone through `inst/import_data.R`. The key columns are `mbbs_county`, `route_num`, `route_ID`, `common_name`, and `count`. But plenty of functions require other columns as well (e.g., `process_species_comments` needs the `species_comments` column).
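One way to pin down that shape would be a small column check run at the start of each function. The sketch below is only an illustration: `check_mbbs_columns` and `mbbs_key_columns` are hypothetical names that do not exist in the package, and the example row is invented toy data. It assumes the five key columns listed above are the minimum for anything called an 'mbbs dataset'.

```r
# Hypothetical helper, not part of the mbbs package.
# Key columns as listed in the comment above.
mbbs_key_columns <- c(
  "mbbs_county", "route_num", "route_ID", "common_name", "count"
)

# Stop with an informative error if `df` lacks a required column;
# otherwise return `df` invisibly so the check can sit inside a pipeline.
# `extra` lets a function declare additional columns it depends on.
check_mbbs_columns <- function(df, extra = character()) {
  required <- c(mbbs_key_columns, extra)
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop(
      "Not a valid mbbs dataset; missing columns: ",
      paste(missing_cols, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(df)
}

# Toy data for illustration only (invented values).
example <- data.frame(
  mbbs_county = "orange",
  route_num   = 1L,
  route_ID    = 101L,
  common_name = "Carolina Wren",
  count       = 3L
)

check_mbbs_columns(example) # passes silently

# A function such as process_species_comments could declare its extra
# requirement; this call would raise an error for the toy data:
# check_mbbs_columns(example, extra = "species_comments")
```

Keeping the extra requirements as an argument means each function documents its own column needs next to the shared baseline, which is one possible way to make the "shape" of each processing stage explicit.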

With the goal of having two clear end-user datasets, one at the route level and one at the stop level, there's now also a second level of processing that happens after inst/import_data.

So we've got:

Originally posted by @IJBG in https://github.com/nc-minibbs/mbbs/issues/54#issuecomment-2120823709