Closed mvarewyck closed 1 year ago
I have checked the files on the S3 bucket `inbo-wbe-uat-data` on February 20th, 2023.

(1) Some files were copied from the `inst/extdata` folder, but are currently not used or have become redundant for the app. To keep the buckets clean, I suggest we remove them (temporarily).
- "FaunabeheerDeelzones_0000_2018_habitats.csv" -> currently not used
- "FaunabeheerDeelzones_2019_9999_habitats.csv" -> currently not used
- "Toekenningen_ree.csv" -> not needed(?) we use 'Verwezenlijkt_categorie_per_afschotplan.csv'
- "fbz_gemeentes_habitats.csv" -> currently not used
- "waarnemingen_2022.csv" -> no longer needed. we use 'waarnemingen_wild_zwijn_processed.csv'
With the exception of `fbz_gemeentes_habitats.csv`, I can't imagine needing these files in the future.
(2) The file `rshiny_reporting_data_ecology.csv` takes 15 seconds to load locally. This is due to its size, but also because we do some data processing in `loadRawData()`. Ideally, we create a processed file (like we do for `waarnemingen_2022.csv`) to speed this up. @SanderDevisscher Should we implement a function on our side for cleaning, and can you incorporate it in the script on your side?
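As a rough sketch of the idea: read the raw file once, apply the cleaning, and write a processed CSV that the app can load directly. Note that the helper name and the cleaning step below are illustrative assumptions, not the actual `loadRawData()` logic.

```r
# Illustrative sketch only: the real cleaning lives in loadRawData();
# the step below (dropping incomplete records) is a placeholder assumption.
createProcessedFile <- function(rawFile, processedFile) {
  rawData <- read.csv(rawFile, stringsAsFactors = FALSE)
  # Placeholder cleaning step: keep complete records only
  processedData <- rawData[stats::complete.cases(rawData), ]
  write.csv(processedData, processedFile, row.names = FALSE)
  invisible(processedData)
}
```

The app would then read the processed CSV as-is, skipping the expensive cleaning on every load.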
I think it is a good idea to move most, if not all, preprocessing of data to the backoffice. Following this logic, I would say it would be nice to have a function that processes ecology (and geography?) before putting it on the bucket. If it's provided, I'll incorporate it into our upload data script.
I would prefer the following approach for migrating from the unprocessed to the processed files on the `inbo-wbe-uat-data` bucket, using the new function `createRawData()` I'm working on in #346. I'll send more instructions later.

@mvarewyck I accidentally uploaded "FaunabeheerDeelzones_0000_2018_habitats.csv", "FaunabeheerDeelzones_2019_9999_habitats.csv" & "Toekenningen_ree.csv" to the UAT bucket. Can you remove them?
All done. E.g.

```r
aws.s3::delete_object(object = "Toekenningen_ree.csv", bucket = "inbo-wbe-uat-data")
```
Creating the preprocessed data can be done by calling this function iteratively.

@SanderDevisscher To be included in the INBO script:

```r
for (iType in c("eco", "geo", "wildschade", "kbo_wbe", "waarnemingen"))
  createRawData(dataDir = "~/git/reporting-rshiny-grofwildjacht/dataS3",
    bucket = "inbo-wbe-uat-data", type = iType)
```
Notes:

- `dataDir` points to the folder with the files that you would currently upload for the ecology, geography, etc. data. Expected filenames are still the same.
- `createWaarnemingenData()` is deprecated, as we can use the function above with type 'waarnemingen'.

Expected filenames per type:

```r
eco = "rshiny_reporting_data_ecology.csv",
geo = "rshiny_reporting_data_geography.csv",
wildschade = "WildSchade_georef.csv",
kbo_wbe = "Data_Partij_Cleaned.csv",
waarnemingen = "waarnemingen_wild_zwijn_processed.csv"
```
When processing the files, I normally append `_processed` to the name. However, for waarnemingen this is a bit confusing, as the input file is already `waarnemingen_wild_zwijn_processed.csv`, so the output file has the same name.
@SanderDevisscher

(1) Can the input file for waarnemingen be without the `_processed` suffix? or
(2) Should I change the suffix for all cleaned files to something like `_clean`?
For the sake of consistency, I would say we go with option 1 and drop the `_processed` suffix.
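With option 1, the processed filename can be derived unambiguously by appending the suffix to a suffix-free input name. A minimal sketch (the helper name is hypothetical):

```r
# Hypothetical helper: derive the processed filename from a raw input
# filename that carries no "_processed" suffix.
addProcessedSuffix <- function(fileName) {
  # Replace the trailing ".csv" with "_processed.csv"
  sub("\\.csv$", "_processed.csv", fileName)
}
```

So an input of `waarnemingen_wild_zwijn.csv` would yield `waarnemingen_wild_zwijn_processed.csv`, and input and output names can no longer collide.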
I accidentally created an incorrect file on S3, but I cannot delete it. This used to work.

@SanderDevisscher
Can you delete this file? If it works for you, I'll ask Bert to adjust my permissions.

```r
aws.s3::delete_object(object = "waarnemingen_wild_zwijn_processedcsv", bucket = "inbo-wbe-uat-data")
```
done
- expected filenames (as input) per type:

```r
eco = "rshiny_reporting_data_ecology.csv",
geo = "rshiny_reporting_data_geography.csv",
wildschade = "WildSchade_georef.csv",
kbo_wbe = "Data_Partij_Cleaned.csv",
waarnemingen = "waarnemingen_wild_zwijn.csv"
```
@mvarewyck I could not find the function `readShapeData` after using

```r
devtools::install_github("inbo/reporting-rshiny-grofwildjacht@318-dashboard-figuren-code",
  subdir = "reporting-grofwild", force = TRUE)
```

rebase?
> @mvarewyck I could not find function "readShapeData" after using `devtools::install_github(...)`
@SanderDevisscher `readShapeData()` is now called `createShapeData()`, in accordance with the other `create*` functions.
I've implemented the needed logic to preprocess the eco, geo, wildschade, kbo_wbe and waarnemingen files.
I'm waiting to test the changes in the UAT environment, but it is currently down (504 gateway timeout).
In Docker, however, the data checks pass without a flaw.