Questions about contributing to the ILAMB-Data collection

Great to hear! See this tutorial for some guidance: format data tutorial. Please also reference recent scripts to see how we pre-process, such as these gridded datasets: convert GFW, convert FLUXCOM or this point dataset: convert Ameriflux

We spatially resample gridded data to 0.5 degree resolution (EPSG:4326); for a dataset with global bounds, that yields a NetCDF file with 720 x 360 cells. There are no time constraints, but the applicable time bounds (e.g., 1950 to 2020) must be specified in the NetCDF file you create.
ILAMB can handle point and gridded datasets. ILAMB will sample the model grid at each point for comparison. See the Ameriflux example I linked above.
There is no spatial extent requirement for the benchmark dataset. For example, here, you can see that the bias and bias scores are spatially constrained to the extent of the benchmark data: https://www.ilamb.org/CMIP5v6/historical/EcosystemandCarbonCycle/Biomass/Tropical/Tropical.html
Yes, if there is an urban-specific variable from the benchmark dataset that corresponds to an urban-specific variable in the land model you're comparing to, then you can do so.
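
To make the grid and point answers above concrete, here is a minimal pure-Python sketch of the 0.5 degree global grid and of looking up the cell that contains a site. The function names, the -180 to 180 longitude convention, and the "containing cell" lookup are my assumptions for illustration; this is not how ILAMB itself does the sampling internally, and a real convert script would normally use a NetCDF library.

```python
def half_degree_grid(res=0.5):
    """Return (lats, lons) cell centers for a global grid at `res` degrees.

    Assumes latitudes run -90..90 and longitudes -180..180 (an assumption,
    some datasets use 0..360 instead).
    """
    nlat = round(180 / res)
    nlon = round(360 / res)
    lats = [-90 + res * (i + 0.5) for i in range(nlat)]
    lons = [-180 + res * (j + 0.5) for j in range(nlon)]
    return lats, lons


def containing_cell(lat, lon, res=0.5):
    """Return (i, j) indices of the grid cell that contains a point.

    Hypothetical helper: it only illustrates pairing a site with a grid cell.
    Points exactly on the upper edge (lat=90 or lon=180) go to the last cell.
    """
    nlat = round(180 / res)
    nlon = round(360 / res)
    i = min(int((lat + 90) / res), nlat - 1)
    j = min(int((lon + 180) / res), nlon - 1)
    return i, j


def yearly_time_bounds(start_year, end_year):
    """One (start, end) date pair per year, a stand-in for a time bounds variable."""
    return [(f"{y}-01-01", f"{y + 1}-01-01") for y in range(start_year, end_year + 1)]


lats, lons = half_degree_grid()
print(len(lons), len(lats))  # 720 360

# A made-up site location, used only to show the lookup.
i, j = containing_cell(35.0, -100.0)
print(i, j, lats[i], lons[j])  # 250 160 35.25 -99.75

bounds = yearly_time_bounds(1950, 2020)
print(len(bounds), bounds[0], bounds[-1])
```

The same lookup works for a regional benchmark: only the cells that actually hold data would take part in the comparison.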

In general, be sure to format your NetCDF file according to the CF Conventions and, if possible, name variables according to the CMIP6 variable lists: CMIP6 variables grouped by MIP table or all accepted CMIP6 variables.
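
If it helps, here is a small sketch of the kind of metadata check I have in mind before opening a pull request. The attribute list is a deliberately minimal assumption on my part, not the full CF specification, and `missing_metadata` is a hypothetical helper rather than part of ILAMB. The example values for `gpp` are illustrative.

```python
# Attributes that most CF-style variables are expected to carry
# (a minimal list chosen for this sketch).
RECOMMENDED_VAR_ATTRS = ("units", "standard_name", "long_name")


def missing_metadata(global_attrs, var_attrs):
    """Return a list of human-readable problems found in the metadata dicts."""
    problems = []
    # CF files declare the convention version in a global 'Conventions' attribute.
    if not str(global_attrs.get("Conventions", "")).startswith("CF-"):
        problems.append("global attribute 'Conventions' should name a CF version")
    for name in RECOMMENDED_VAR_ATTRS:
        if not var_attrs.get(name):
            problems.append(f"variable attribute '{name}' is missing")
    return problems


# Illustrative metadata for a variable named after the CMIP6 short name 'gpp'.
global_attrs = {"Conventions": "CF-1.8", "title": "Example benchmark dataset"}
gpp_attrs = {
    "units": "kg m-2 s-1",
    "standard_name": "gross_primary_productivity_of_biomass_expressed_as_carbon",
    "long_name": "Gross Primary Productivity",
}

print(missing_metadata(global_attrs, gpp_attrs))  # []
print(missing_metadata({}, {"units": "1"}))
```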

To contribute to this repo:

1. Create a new issue describing the benchmark dataset you'd be adding, with any links or useful information in the body and some explanation of why it's a useful benchmarking dataset.
2. Fork ILAMB-Data.
3. Create a new folder to work in (we generally name it after the folks/project who made the dataset, but name it whatever you like).
4. Write your convert file inside the folder you created.
5. When finished, submit a pull request for us to review.

rubisco-sfa / ILAMB-Data