Open cmdoret opened 11 months ago
The dataset is "only" 83MB, but I cannot seem to download it, even when setting it to "large". It is still waiting after 15min.
@cmdoret For that I think we would need another option, xlarge, for ds$size. pxRRead in that case works better on downloaded files: the file needs to be downloaded first, then parsed, and then the download should be removed again.
I tried implementing an xlarge option where the file is first downloaded and then parsed:
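# Download the .px file to a temporary location, then parse it locally
tmp <- tempfile(fileext = ".px")
download.file(ds$read_path, tmp)
df <- pxRRead::scan_px_file(tmp,
  locale = ds$lang,
  encoding = ds$encoding
)
ds$data <- df$dataframe
unlink(tmp) # remove the downloaded file again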
But pxRRead::scan_px_file hangs for 10+ minutes at:
INFO [2023-08-04 11:15:54] unsupported keyword detected DATASYMBOL5[en]
INFO [2023-08-04 11:15:54] unsupported keyword detected DATASYMBOL6
INFO [2023-08-04 11:15:54] unsupported keyword detected DATASYMBOL6[fr]
INFO [2023-08-04 11:15:54] unsupported keyword detected DATASYMBOL6[en]
After investigation, it appears the culprit is pxRRead::parse_px_lines. The file contains ~1M lines and parsing them is pretty slow. I have opened an issue on pxRRead about this: https://github.com/SDSC-ORD/pxRRead/issues/20
Issue addressed upstream in https://github.com/SDSC-ORD/pxRRead/pull/21. The parser can now handle huge files. This dataset is parsed in under 1 minute instead of 3h.
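For reference, the download-then-parse flow discussed earlier in this thread could be wrapped so that the temporary file is removed even when the download or the parser fails. This is only a sketch: `read_px_via_tempfile` is a hypothetical helper name, not a function in statbotData or pxRRead, and the `parse` argument is an assumption added so that the parser can be swapped out.

```r
# Hypothetical helper (not part of statbotData or pxRRead): download a px file
# to a temporary path, parse it, and always remove the temporary file.
read_px_via_tempfile <- function(url, parse = pxRRead::scan_px_file, ...) {
  tmp <- tempfile(fileext = ".px")
  # on.exit runs when the function exits, including on error
  on.exit(unlink(tmp), add = TRUE)
  download.file(url, tmp, mode = "wb", quiet = TRUE)
  # Extra arguments (e.g. locale, encoding) are passed through to the parser
  parse(tmp, ...)
}

# Possible usage with the fields mentioned in this thread:
# df <- read_px_via_tempfile(ds$read_path, locale = ds$lang, encoding = ds$encoding)
# ds$data <- df$dataframe
```

Registering the cleanup with `on.exit` rather than calling `unlink` at the end means a failed or interrupted parse does not leave the downloaded file behind.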
@nooralahzadeh is the structure OK for you? Metadata and queries are at https://github.com/statistikZH/statbotData/tree/main/pipelines/A15
year | farmholding_system | farmholdings | employees_total | full_time_employees_75_percent_or_more | part_time_employees_50_75_percent | part_time_employees_2_less_than_50_percent | employees_men | employees_women | employees_women_manager_label | employees_swiss | employees_foreign_nationals | family_employees | beef_cattle_and_cows_farm | horse_and_other_equine_farm | sheep_farm | goat_farm | pig_farms | poultry_farm | farms_with_other_animals | utilised_agricultural_area_in_hectares | arable_land_in_hectares | grassland_in_hectares | permanent_crops_in_hectares | other_utilised_agricultural_area_in_hectares | livestock_beef_cattle_and_cows | livestock_horses_and_other_equines | livestock_sheep | livestock_goats | livestock_pigs | livestock_poultry | livestock_other_animals | spatialunit_uid |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2013 | Organic farming | 346 | 947 | 331 | 195 | 421 | 614 | 333 | 40 | 809 | 138 | 724 | 164 | 106 | 145 | 56 | 15 | 36 | 31 | 7252.4681 | 308.6694 | 6444.2811 | 486.4801 | 13.0375 | 5161 | 680 | 15332 | 1793 | 124 | 1973 | 314 | 23_A.ADM1 |
2011 | Farmholding system - total | 926 | 2369 | 1411 | 377 | 581 | 1626 | 743 | 43 | 2186 | 183 | 1900 | 712 | 255 | 86 | 73 | 49 | 81 | 49 | 32012.902 | 4207.36 | 27128.9113 | 597.5507 | 79.08 | 42151 | 1813 | 2818 | 580 | 7232 | 78785 | 963 | 24_A.ADM1 |
2021 | Not defined | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5_A.ADM1 |
2012 | Not defined | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23_A.ADM1 |
2022 | Organic farming | 193 | 530 | 190 | 145 | 195 | 328 | 202 | 15 | 530 | 0 | 490 | 175 | 21 | 21 | 32 | 4 | 47 | 10 | 2796.25 | 6.92 | 2745.36 | 1.62 | 42.35 | 6151 | 104 | 471 | 331 | 142 | 20626 | 101 | 6_A.ADM1 |
2003 | Organic farming | 360 | 1336 | 703 | 226 | 407 | 844 | 492 | 0 | 1129 | 207 | 859 | 278 | 115 | 80 | 66 | 48 | 201 | 63 | 7019.6 | 1327.49 | 5382.56 | 55.93 | 253.62 | 9625 | 627 | 3370 | 455 | 852 | 34682 | 734 | 1_A.ADM1 |
2020 | Farmholding system - total | 1324 | 3874 | 1608 | 786 | 1480 | 2438 | 1436 | 114 | 3581 | 293 | 3088 | 873 | 414 | 200 | 123 | 134 | 369 | 155 | 31463.4045 | 10314.7904 | 20784.8522 | 167.3145 | 196.4474 | 40936 | 3351 | 7081 | 1311 | 25912 | 190494 | 1811 | 11_A.ADM1 |
2006 | Organic farming | 20 | 82 | 42 | 12 | 28 | 46 | 36 | 0 | 75 | 7 | 56 | 16 | 3 | 5 | 2 | 4 | 10 | 4 | 528.32 | 193.92 | 322.72 | 8.88 | 2.8 | 685 | 5 | 71 | 6 | 201 | 5417 | 8 | 14_A.ADM1 |
2011 | Farmholding system - total | 2866 | 8936 | 4463 | 1619 | 2854 | 5591 | 3345 | 101 | 7827 | 1109 | 6544 | 1757 | 513 | 341 | 217 | 432 | 738 | 396 | 50033.6 | 16951.245 | 30388.448 | 2376.727 | 317.18 | 75127 | 3647 | 19321 | 1454 | 198572 | 1070618 | 5381 | 20_A.ADM1 |
2017 | Conventional farming | 3044 | 9130 | 4084 | 1747 | 3299 | 5803 | 3327 | 181 | 8082 | 1048 | 6818 | 1674 | 640 | 303 | 217 | 144 | 723 | 398 | 64363.65 | 26172.95 | 35064.06 | 1424.68 | 1701.96 | 85384 | 5812 | 10690 | 1638 | 34204 | 453421 | 8672 | 1_A.ADM1 |
Proposal to include dataset: Employees, farmholdings, utilized agricultural area and livestock on level 1 of classification by canton