inbo / reporting-rshiny-grofwildjacht

Rshiny app for grofwildjacht
https://grofwildjacht.inbo.be/
MIT License

Clean files on S3 buckets #388

Closed mvarewyck closed 1 year ago

mvarewyck commented 1 year ago

I checked the files on the S3 bucket inbo-wbe-uat-data on February 20th, 2023.

(1) Some files were copied from the inst/extdata folder but are currently not used by the app or have become redundant. To keep the buckets clean, I suggest we remove them (temporarily).

(2) The file rshiny_reporting_data_ecology.csv takes about 15 seconds to load locally. This is due to its size, but also because we do some data processing in loadRawData(). Ideally, we create a processed file (like we do for waarnemingen_2022.csv) to speed this up. @SanderDevisscher Should we implement a cleaning function on our side, and can you incorporate it in the script on your side?
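
A minimal sketch of the idea (hypothetical: data.table and the _processed filename are assumptions, and the cleaning step stands in for whatever loadRawData() currently does):

library(data.table)

# One-off preprocessing: pay the cost once, at upload time
eco <- fread("rshiny_reporting_data_ecology.csv")   # slow: large raw file
# ... apply the cleaning currently done in loadRawData() here ...
fwrite(eco, "rshiny_reporting_data_ecology_processed.csv")

# The app then only reads the processed file and skips the cleaning
ecoProcessed <- fread("rshiny_reporting_data_ecology_processed.csv")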

SanderDevisscher commented 1 year ago

> I checked the files on the S3 bucket inbo-wbe-uat-data on February 20th, 2023.
>
> (1) Some files were copied from the inst/extdata folder but are currently not used by the app or have become redundant. To keep the buckets clean, I suggest we remove them (temporarily).

  • "FaunabeheerDeelzones_0000_2018_habitats.csv" -> currently not used
  • "FaunabeheerDeelzones_2019_9999_habitats.csv" -> currently not used
  • "Toekenningen_ree.csv" -> not needed(?) we use 'Verwezenlijkt_categorie_per_afschotplan.csv'
  • "fbz_gemeentes_habitats.csv" -> currently not used
  • "waarnemingen_2022.csv" -> no longer needed. we use 'waarnemingen_wild_zwijn_processed.csv'

With the exception of "fbz_gemeentes_habitats.csv", I can't imagine needing these files in the future.

> (2) The file rshiny_reporting_data_ecology.csv takes about 15 seconds to load locally. This is due to its size, but also because we do some data processing in loadRawData(). Ideally, we create a processed file (like we do for waarnemingen_2022.csv) to speed this up. @SanderDevisscher Should we implement a cleaning function on our side, and can you incorporate it in the script on your side?

I think it is a good idea to move most, if not all, preprocessing of the data to the backoffice. Following this logic, it would be nice to have a function that processes the ecology (and geography?) data before putting it on the bucket. If it's provided, I'll incorporate it into our upload data script.
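
A hedged sketch of how that could look in the upload script (processEcoData() is a hypothetical placeholder for the function to be provided; put_object() is from the aws.s3 package already used for this bucket):

library(aws.s3)

# Hypothetical cleaning step, to be provided by the app side
ecoProcessed <- processEcoData("rshiny_reporting_data_ecology.csv")

# Write the cleaned table to a temporary file and upload it to the UAT bucket
tmpFile <- tempfile(fileext = ".csv")
write.csv(ecoProcessed, tmpFile, row.names = FALSE)
put_object(file = tmpFile, object = "rshiny_reporting_data_ecology_processed.csv",
  bucket = "inbo-wbe-uat-data")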

mvarewyck commented 1 year ago

I would prefer the following approach for migrating from the unprocessed to the processed files on S3.

SanderDevisscher commented 1 year ago

@mvarewyck I accidentally uploaded "FaunabeheerDeelzones_0000_2018_habitats.csv", "FaunabeheerDeelzones_2019_9999_habitats.csv" & "Toekenningen_ree.csv" to the UAT bucket. Can you remove them?

mvarewyck commented 1 year ago

> @mvarewyck I accidentally uploaded "FaunabeheerDeelzones_0000_2018_habitats.csv", "FaunabeheerDeelzones_2019_9999_habitats.csv" & "Toekenningen_ree.csv" to the UAT bucket. Can you remove them?

All done. E.g.:

aws.s3::delete_object(object = "Toekenningen_ree.csv", bucket = "inbo-wbe-uat-data")
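
The same works for all three files in one loop (a small sketch; the filenames and the aws.s3 call are taken from this thread):

library(aws.s3)

# Remove all three accidentally uploaded files from the UAT bucket
accidental <- c("FaunabeheerDeelzones_0000_2018_habitats.csv",
  "FaunabeheerDeelzones_2019_9999_habitats.csv",
  "Toekenningen_ree.csv")
for (f in accidental) {
  delete_object(object = f, bucket = "inbo-wbe-uat-data")
}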

mvarewyck commented 1 year ago

Creating the preprocessed data can be done by calling this function iteratively.

@SanderDevisscher To be included in the INBO script:

for (iType in c("eco", "geo", "wildschade", "kbo_wbe", "waarnemingen")) {
  createRawData(dataDir = "~/git/reporting-rshiny-grofwildjacht/dataS3",
    bucket = "inbo-wbe-uat-data", type = iType)
}

Notes:

mvarewyck commented 1 year ago

When processing the files, I normally append _processed to the filename. However, for waarnemingen this is confusing, as the input file is already called waarnemingen_wild_zwijn_processed.csv, so the output file gets the same name.

@SanderDevisscher (1) Can the input file for waarnemingen be named without the _processed suffix? Or (2) should I change the suffix for all cleaned files to something like _clean?

SanderDevisscher commented 1 year ago

For the sake of consistency, I would say we go with option 1 and drop the _processed suffix.

mvarewyck commented 1 year ago

I accidentally created an incorrect file on S3, but I cannot delete it. This used to work.

@SanderDevisscher Can you delete this file? If it works for you, I will ask Bert to adjust my permissions.

aws.s3::delete_object(object = "waarnemingen_wild_zwijn_processedcsv", bucket = "inbo-wbe-uat-data")
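
To verify the result afterwards, something like this should work (object_exists() is another helper from the same aws.s3 package; this check is a suggestion, not part of the original exchange):

library(aws.s3)

# Returns FALSE once the stray file has been deleted
object_exists(object = "waarnemingen_wild_zwijn_processedcsv", bucket = "inbo-wbe-uat-data")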

SanderDevisscher commented 1 year ago

done

mvarewyck commented 1 year ago

  • expected filenames (as input) per type:

    eco = "rshiny_reporting_data_ecology.csv",
    geo = "rshiny_reporting_data_geography.csv",
    wildschade = "WildSchade_georef.csv",
    kbo_wbe = "Data_Partij_Cleaned.csv",
    waarnemingen = "waarnemingen_wild_zwijn.csv"
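
The same mapping as an R named vector, in case it helps to wire it into the createRawData() loop above (a sketch; only the type names and filenames come from this thread):

# Expected input filename per data type
inputFiles <- c(
  eco = "rshiny_reporting_data_ecology.csv",
  geo = "rshiny_reporting_data_geography.csv",
  wildschade = "WildSchade_georef.csv",
  kbo_wbe = "Data_Partij_Cleaned.csv",
  waarnemingen = "waarnemingen_wild_zwijn.csv"
)
inputFiles[["eco"]]   # "rshiny_reporting_data_ecology.csv"
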
SanderDevisscher commented 1 year ago

@mvarewyck I could not find the function readShapeData() after using

devtools::install_github("inbo/reporting-rshiny-grofwildjacht@318-dashboard-figuren-code",
  subdir = "reporting-grofwild", force = TRUE)

Rebase?

mvarewyck commented 1 year ago

> @mvarewyck I could not find the function readShapeData() after using

@SanderDevisscher readShapeData() is now called createShapeData(), in line with the other create* functions.
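
To confirm the rename after reinstalling (getAnywhere() and exists() are base R; this check is a suggestion, not from the original thread):

# Locate the renamed function across the installed packages
utils::getAnywhere("createShapeData")

# The old name should no longer be found
exists("readShapeData")   # FALSE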

SanderDevisscher commented 1 year ago

I've implemented the logic needed to preprocess the eco, geo, wildschade, kbo_wbe and waarnemingen files. I'm waiting to test the changes in the UAT environment, but it is currently down (504 gateway timeout).
In Docker, however, the data checks pass without a flaw.