kartoza / django-bims

https://testing.healthyrivers.kartoza.com
GNU Affero General Public License v3.0

Issues with Checklist #4198

Closed helendallas closed 2 months ago

helendallas commented 2 months ago

There are still issues with the checklist.

Please see docs from Nazley here

BIMS_Issues_Naz_August 2024.docx

and associated files

Camdeboo Mammals Checklist csv_28Aug.csv
Camdeboo Mammals Checklist PDF_28Aug.pdf
Camdeboo Plant Checklist.csv
Camdeboo Plant Checklist.pdf

dimasciput commented 2 months ago

@helendallas some of my comments

image

The field is different; the filter belongs to the location context, and the park name is attached to the location site. We could enforce checking using the filter, but there are many implications we would need to address in the future. For instance, what if the layer name used to get the park name changes, or what if the park name attached to the location site and the filter are slightly different? I suggest using only the park name from the location site, as it is easier to maintain for now and avoids many potential headaches in the future.

image

Yes, it's because the new sites are still processing to get the spatial filter, and there are a lot of sites on sanparks, so it might take a while to update.

image

It would be great if you could provide an example of the source data, so I know what data I should display in this column

helendallas commented 2 months ago

@dimasciput This was sent by Dian. I haven't had a chance to read it yet as I have another deadline today.

Some of the issues and responses below make me think we may need some more fields added in places.

Park/MPA name and Excel checklist download

For the Excel checklist download, it would be useful to include a field with the park/MPA name. Whether the record is based on GBIF data (points within a park/MPA boundary polygon) or on uploaded occurrence data (points within the park polygon, or a park name now allocated to a centroid or uploaded as a centroid), we need a field that holds the park/MPA name. The field should indicate the name of the park the data is relevant to (the polygon the coordinates fall within). I am not sure whether this makes sense.

Checklist sources

Then for the creation of the PDF checklist: the source field in the checklist is the list of references, i.e. a record of all the data sources that have reported that species. For the GBIF data, the dataset name is used. For the uploaded occurrence data, a citation needs to be obtained from the occurrence data upload file.

I think two things are needed here.

1. We need a table that indicates the abbreviated versions of dataset names to be used, e.g. instead of the system displaying “iNaturalist research-grade observations” it should display “iNat”. Additionally, there are multiple GBIF datasets. It isn’t just iNaturalist. There should already be more dataset names showing than just iNaturalist. The table could include the dataset key, dataset name and abbreviated name. Something like the GBIF dataset spreadsheet but with a field for abbreviated names included.

2. We need a field in the Occurrence upload template that specifies the citation to be used for the datasource/reference. E.g. if I was making a checklist and one of the datasources was a publication, then I would put the citation for the publication as the source of the name, e.g. (Name Year). However, again, this might be a bit long and we might want to use an abbreviated version, e.g. instead of “Adamson & Salter 1950”, we might want to use “A&S 1950”. In this case, the citation is author and year. In other cases, the citation might be the title of the database or unpublished data. Again, the title might be too long, and we might want to use an abbreviated version, e.g. “Census” instead of the long version of the title, or “SS” instead of Snapshot Safari.

If you look in the uploaded occurrence data, there isn’t one field that currently gives you the contents of the source field of the checklist. Attached is an example for the Camdeboo mammal data. If you compare the source field in the attached pdf with the info in the Occurrence data upload, there is an issue that there isn’t one field to pull the information from.

One problem is that we don’t yet know all the dataset names that need to be included.
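To make the abbreviation table request concrete, here is a minimal sketch in Python of how such a lookup might behave. Everything in it is an assumption for illustration: `SOURCE_ABBREVIATIONS` and `display_source` are hypothetical names, not part of django-bims, and the only entries are the examples mentioned in this comment. The dataset key column is left out because no real keys are given here.

```python
# Hypothetical lookup table: full source name -> abbreviated name.
# The entries are only the examples quoted in this thread; a real table
# would also carry the GBIF dataset key, which is omitted here.
SOURCE_ABBREVIATIONS = {
    "iNaturalist research-grade observations": "iNat",
    "Snapshot Safari": "SS",
    "Adamson & Salter 1950": "A&S 1950",
}


def display_source(name: str, abbreviate: bool = True) -> str:
    """Return the abbreviated source name, or the full name if none is defined."""
    cleaned = name.strip()
    if not abbreviate:
        return cleaned
    # Unknown sources fall back to their full name, so nothing is lost
    # while the list of dataset names is still incomplete.
    return SOURCE_ABBREVIATIONS.get(cleaned, cleaned)
```

Falling back to the full name means the table can be filled in gradually, which matters because not all dataset names are known yet. The `abbreviate` flag is one possible way to keep full citations available where traceability is preferred.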

dimasciput commented 2 months ago

Hi @helendallas , do we really need abbreviations for the sources? This was not in the original requirement. We can add them, but it would lead to scope creep. I don't think there was an issue with showing the full names for now, right?

dimasciput commented 2 months ago

Then for the creation of the pdf checklist. The source field in the checklist is the list of references/ record of all the data sources that have reported that species. For the GBIF data, the dataset name is used. For the uploaded occurrence data a citation needs to be obtained from the occurrence data upload file.

This is fixed on staging. It will show you the dataset name from GBIF and also the source reference name.

helendallas commented 2 months ago

Hi @helendallas , do we really need abbreviations for the sources? This was not in the original requirement. We can add them, but it would lead to scope creep. I don't think there was an issue with showing the full names for now, right?

I agree @dimasciput. Let me email them. I also want to check the hours left. Can you estimate how long it would take to include abbreviations?

helendallas commented 2 months ago

@dimasciput This was sent by Dian. I haven't had a chance to read it yet as I have another deadline today.

Some of the issues and responses below make me think we may need some more fields added in places.

Park/MPA name and Excel checklist download

For the Excel checklist download, it would be useful to include a field with the park/MPA name. Whether the record is based on GBIF data (points within a park/MPA boundary polygon) or on uploaded occurrence data (points within the park polygon, or a park name now allocated to a centroid or uploaded as a centroid), we need a field that holds the park/MPA name. The field should indicate the name of the park the data is relevant to (the polygon the coordinates fall within). I am not sure whether this makes sense.

So @dimasciput there are GBIF sites that have no Park Name. Is it because geocontext data are still being pulled through or because the spatial layers are problematic?

Screenshot 2024-09-03 at 11 55 45

dimasciput commented 2 months ago

@helendallas

So @dimasciput there are GBIF sites that have no Park Name. Is it because geocontext data are still being pulled through or because the spatial layers are problematic?

The park names you see come from the location site data users upload, not from geocontext, which is why some park names are missing. They come from different sources. I could add a feature to display park names in the csv from geocontext, but that would be a new request. We also have issues with sites not getting park names from geocontext, especially since there are 100k sites in sanparks alone, and getting all the names would take days.

We planned to fix this in the Kafue project to speed up data processing. Should we wait until Kafue is done before showing park names from geocontext? Otherwise, many will still be missing.

helendallas commented 2 months ago

Ok thanks @dimasciput. Yes, let's wait and try to speed this up with Kafue funds. Thanks

helendallas commented 2 months ago

As per my comments by email, attached here as well.

Dimas requested an example to show the issues with the checklist - I decided to extract fish data for the Cape Fold Ecoregion in FBIS. Please see attached xls and pdf files. See comments below on xls for now. PDF to follow.

Xls. - The source column is very long, but I actually enjoy seeing the full citation. If you start abbreviating, it will become impossible to trace back to the original source reference. I am not sure if there is a clearer way to concatenate so many citations, Dimas? Alternatively, only include Author(s) and Date for peer-reviewed scientific articles and books, reports and theses.

For Global Biodiversity Information Facility (GBIF) - ideally the dataset (as per column T) needs to be listed here, and where this is blank, it can be Global Biodiversity Information Facility (GBIF). See the example of Pseudobarbus asper (Boulenger, 1911), where the dataset includes Ichthyology, iNaturalist research-grade observations, NMNH Extant Biology.

Also From Dimas: Sometimes gbif occurrence doesn't return a dataset name.

For example this data https://sanparks.do.kartoza.com/admin/bims/biologicalcollectionrecord/1448414/change/

It doesn't have a dataset name, it only has a dataset key. But when I check the key from GBIF it returns nothing. Also, it will take a lot of time to check every occurrence record that doesn't have a dataset name from GBIF, because sanparks already has 900k+ records.
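The fallback rule suggested above (list the dataset name, and use "Global Biodiversity Information Facility (GBIF)" where it is blank) could be sketched roughly as follows. This is only an illustration under assumptions: `gbif_source_label` and `checklist_sources` are hypothetical helpers, not existing django-bims functions, and the input is assumed to be the dataset names already stored on each record, so no extra GBIF lookups are needed.

```python
from typing import Iterable, Optional

# Label used when a GBIF record has no dataset name (as suggested in this comment).
GBIF_FALLBACK = "Global Biodiversity Information Facility (GBIF)"


def gbif_source_label(dataset_name: Optional[str]) -> str:
    """Return the dataset name, or the generic GBIF label if it is blank or missing."""
    name = (dataset_name or "").strip()
    return name if name else GBIF_FALLBACK


def checklist_sources(dataset_names: Iterable[Optional[str]]) -> str:
    """Build the source cell for one species from its records' dataset names."""
    # A set removes duplicates; sorting case-insensitively keeps the output stable.
    labels = sorted({gbif_source_label(n) for n in dataset_names}, key=str.casefold)
    return ", ".join(labels)
```

For the Pseudobarbus asper example, passing the three dataset names (with repeats) would give "Ichthyology, iNaturalist research-grade observations, NMNH Extant Biology", and records with no dataset name would all collapse into a single GBIF entry.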

FBIS Fish Checklist CFE.xlsx

I am still waiting for the pdf to be created for the same dataset. I will comment and share when it arrives. @dimasciput Not sure it is being processed?

helendallas commented 2 months ago

Let's hope that #4229 will resolve this issue. I have asked Di to check

helendallas commented 2 months ago

Waiting for feedback from SANParks. Closing ticket. If issues arise I will make a new ticket.