kamu-data / kamu-cli

Next-generation decentralized data lakehouse and a multi-party stream processing network
https://kamu.dev

Spark crashes when inferring CSV schema with a lot of columns #146

Closed: sergiimk closed this issue 5 months ago

sergiimk commented 1 year ago

Observed when ingesting a dataset with 900+ columns using the following manifest:

Root dataset manifest without schema:

```yaml
kind: DatasetSnapshot
version: 1
content:
  name: data
  kind: root
  metadata:
    - kind: setPollingSource
      fetch:
        kind: url
        url: https://www.ahrq.gov/sites/default/files/wysiwyg/sdoh/SDOH_2019_COUNTY_1_0.xlsx
      prepare:
        - kind: pipe
          command:
            - sh
            - -c
            - |
              tempfile=`mktemp`
              cp /dev/stdin $tempfile
              xlsx2csv -n Data $tempfile
              rm $tempfile
      read:
        kind: csv
        header: true
      preprocess:
        kind: sql
        engine: spark
        query: |
          select
            to_date(YEAR, "yyyy") as event_time,
            *
          from input
      merge:
        kind: ledger
        primaryKey:
          - YEAR
          - COUNTYFIPS
```

Spark crashes with:

Exception in thread "main" java.lang.StackOverflowError
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)

The same dataset but with schema specified explicitly succeeds:

Root dataset manifest with schema:

```yaml
kind: DatasetSnapshot
version: 1
content:
  name: data-with-schema
  kind: root
  metadata:
    - kind: setPollingSource
      fetch:
        kind: url
        url: https://www.ahrq.gov/sites/default/files/wysiwyg/sdoh/SDOH_2019_COUNTY_1_0.xlsx
      prepare:
        - kind: pipe
          command:
            - sh
            - -c
            - |
              tempfile=`mktemp`
              cp /dev/stdin $tempfile
              xlsx2csv -n Data $tempfile
              rm $tempfile
      read:
        kind: csv
        header: true
        schema:
          - YEAR string
          - COUNTYFIPS string
          - STATEFIPS string
          - STATE string
          - COUNTY string
          - REGION string
          - TERRITORY string
          - ACS_TOT_POP_WT float
          - ACS_TOT_POP_US_ABOVE1 float
          - ACS_TOT_POP_ABOVE5 float
          - ACS_TOT_POP_ABOVE15 float
          - ACS_TOT_POP_ABOVE16 float
          - ACS_TOT_POP_16_19 float
          - ACS_TOT_POP_ABOVE25 float
          - ACS_TOT_CIVIL_POP_ABOVE18 float
          - ACS_TOT_CIVIL_VET_POP_ABOVE25 float
          - ACS_TOT_OWN_CHILD_BELOW17 float
          - ACS_TOT_WORKER_NWFH float
          - ACS_TOT_WORKER_HH float
          - ACS_TOT_CIVILIAN_LABOR float
          - ACS_TOT_CIVIL_EMPLOY_POP float
          - ACS_TOT_POP_POV float
          - ACS_TOT_CIVIL_NONINST_POP_POV float
          - ACS_TOT_CIVIL_POP_POV float
          - ACS_TOT_GRANDCHILDREN_GP float
          - ACS_TOT_HU float
          - ACS_TOT_HH float
          - ACS_AVG_HH_SIZE float
          - ACS_TOT_CIVIL_NONINST_POP float
          - ACS_TOT_CIVIL_VET_POP float
          - ACS_PCT_CHILD_DISAB float
          - ACS_PCT_DISABLE float
          - ACS_PCT_NONVET_DISABLE_18_64 float
          - ACS_PCT_VET_DISABLE_18_64 float
          - ACS_PCT_MALE float
          - ACS_PCT_FEMALE float
          - ACS_PCT_CTZ_US_BORN float
          - ACS_PCT_CTZ_NONUS_BORN float
          - ACS_PCT_FOREIGN_BORN float
          - ACS_PCT_NON_CITIZEN float
          - ACS_PCT_CTZ_NATURALIZED float
          - ACS_PCT_CTZ_ABOVE18 float
          - ACS_PCT_NONCTN_1990 float
          - ACS_PCT_NONCTN_1999 float
          - ACS_PCT_NONCTN_2000 float
          - ACS_PCT_NONCTN_2010 float
          - ACS_PCT_API_LANG float
          - ACS_PCT_ENGL_NOT_ALL float
          - ACS_PCT_ENGL_NOT_WELL float
          - ACS_PCT_ENGL_VERY_WELL float
          - ACS_PCT_ENGL_WELL float
          - ACS_PCT_ENGLISH float
          - ACS_PCT_HH_LIMIT_ENGLISH float
          - ACS_PCT_OTH_EURP float
          - ACS_PCT_OTH_LANG float
          - ACS_PCT_SPANISH float
          - ACS_PCT_VET float
          - ACS_PCT_GULFWAR_1990_2001 float
          - ACS_PCT_GULFWAR_2001 float
          - ACS_PCT_GULFWAR_VIETNAM float
          - ACS_PCT_VIETNAM float
          - ACS_MEDIAN_AGE float
          - ACS_MEDIAN_AGE_MALE float
          - ACS_MEDIAN_AGE_FEMALE float
          - ACS_PCT_AGE_0_4 float
          - ACS_PCT_AGE_5_9 float
          - ACS_PCT_AGE_10_14 float
          - ACS_PCT_AGE_15_17 float
          - ACS_PCT_AGE_0_17 float
          - ACS_PCT_AGE_18_29 float
          - ACS_PCT_AGE_18_44 float
          - ACS_PCT_AGE_30_44 float
          - ACS_PCT_AGE_45_64 float
          - ACS_PCT_AGE_50_64 float
          - ACS_PCT_AGE_ABOVE65 float
          - ACS_PCT_AGE_ABOVE80 float
          - ACS_PCT_AIAN float
          - ACS_PCT_AIAN_FEMALE float
          - ACS_PCT_AIAN_MALE float
          - ACS_PCT_AIAN_NONHISP float
          - ACS_PCT_ASIAN float
          - ACS_PCT_ASIAN_FEMALE float
          - ACS_PCT_ASIAN_MALE float
          - ACS_PCT_ASIAN_NONHISP float
          - ACS_PCT_BLACK float
          - ACS_PCT_BLACK_FEMALE float
          - ACS_PCT_BLACK_MALE float
          - ACS_PCT_BLACK_NONHISP float
          - ACS_PCT_HISP_FEMALE float
          - ACS_PCT_HISP_MALE float
          - ACS_PCT_HISPANIC float
          - ACS_PCT_MULT_RACE float
          - ACS_PCT_MULT_RACE_FEMALE float
          - ACS_PCT_MULT_RACE_MALE float
          - ACS_PCT_MULT_RACE_NONHISP float
          - ACS_PCT_NHPI float
          - ACS_PCT_NHPI_FEMALE float
          - ACS_PCT_NHPI_MALE float
          - ACS_PCT_NHPI_NONHISP float
          - ACS_PCT_OTHER_FEMALE float
          - ACS_PCT_OTHER_MALE float
          - ACS_PCT_OTHER_NONHISP float
          - ACS_PCT_OTHER_RACE float
          - ACS_PCT_WHITE float
          - ACS_PCT_WHITE_FEMALE float
          - ACS_PCT_WHITE_MALE float
          - ACS_PCT_WHITE_NONHISP float
          - ACS_PCT_HOUSEHOLDER_WHITE float
          - ACS_PCT_HOUSEHOLDER_BLACK float
          - ACS_PCT_HOUSEHOLDER_AIAN float
          - ACS_PCT_HOUSEHOLDER_ASIAN float
          - ACS_PCT_HOUSEHOLDER_NHPI float
          - ACS_PCT_HOUSEHOLDER_OTHER float
          - ACS_PCT_HOUSEHOLDER_MULT float
          - ACS_PCT_AIAN_COMB float
          - ACS_PCT_ASIAN_COMB float
          - ACS_PCT_BLACK_COMB float
          - ACS_PCT_NHPI_COMB float
          - ACS_PCT_WHITE_COMB float
          - ACS_PCT_CHILD_1FAM float
          - ACS_PCT_CHILDREN_GRANDPARENT float
          - ACS_PCT_GRANDP_RESPS_NO_P float
          - ACS_PCT_GRANDP_RESPS_P float
          - ACS_PCT_GRANDP_NO_RESPS float
          - ACS_PCT_HH_KID_1PRNT float
          - ACS_PCT_HH_NO_COMP_DEV float
          - ACS_PCT_HH_SMARTPHONE float
          - ACS_PCT_HH_SMARTPHONE_ONLY float
          - ACS_PCT_HH_TABLET float
          - ACS_PCT_HH_TABLET_ONLY float
          - ACS_PCT_HH_PC float
          - ACS_PCT_HH_PC_ONLY float
          - ACS_PCT_HH_OTHER_COMP float
          - ACS_PCT_HH_OTHER_COMP_ONLY float
          - ACS_PCT_HH_INTERNET float
          - ACS_PCT_HH_INTERNET_NO_SUBS float
          - ACS_PCT_HH_BROADBAND float
          - ACS_PCT_HH_BROADBAND_ONLY float
          - ACS_PCT_HH_BROADBAND_ANY float
          - ACS_PCT_HH_CELLULAR float
          - ACS_PCT_HH_CELLULAR_ONLY float
          - ACS_PCT_HH_NO_INTERNET float
          - ACS_PCT_HH_SAT_INTERNET float
          - ACS_PCT_HH_DIAL_INTERNET_ONLY float
          - ACS_PCT_DIVORCED_F float
          - ACS_PCT_DIVORCED_M float
          - ACS_PCT_MARRIED_SP_AB_F float
          - ACS_PCT_MARRIED_SP_AB_M float
          - ACS_PCT_MARRIED_SP_PR_F float
          - ACS_PCT_MARRIED_SP_PR_M float
          - ACS_PCT_NVR_MARRIED_F float
          - ACS_PCT_NVR_MARRIED_M float
          - ACS_PCT_WIDOWED_F float
          - ACS_PCT_WIDOWED_M float
          - ACS_PCT_POP_SAME_SEX_UNMRD_P float
          - ACS_PCT_POP_SAME_SEX_SPOUSE float
          - ACS_PCT_ADMIN float
          - ACS_PCT_ART float
          - ACS_PCT_CONSTRUCT float
          - ACS_PCT_EDUC float
          - ACS_PCT_FINANCE float
          - ACS_PCT_GOVT float
          - ACS_PCT_INFORM float
          - ACS_PCT_MANUFACT float
          - ACS_PCT_NATURE float
          - ACS_PCT_OTHER float
          - ACS_PCT_PROFESS float
          - ACS_PCT_PVT_NONPROFIT float
          - ACS_PCT_PVT_PROFIT float
          - ACS_PCT_RETAIL float
          - ACS_PCT_TRANSPORT float
          - ACS_PCT_WHOLESALE float
          - ACS_PCT_WORK_RES_F float
          - ACS_PCT_WORK_RES_M float
          - ACS_PCT_EMPLOYED float
          - ACS_PCT_UNEMPLOY float
          - ACS_PCT_NOT_LABOR float
          - ACS_PCT_VET_UNEMPL_18_64 float
          - ACS_PCT_VET_LABOR_FORCE_18_64 float
          - ACS_PCT_ARMED_FORCES float
          - ACS_GINI_INDEX float
          - ACS_MDN_GRNDPRNT_NO_PRNT_INC float
          - ACS_MDN_GRNDPRNT_INC float
          - ACS_MEDIAN_HH_INC_AIAN float
          - ACS_MEDIAN_HH_INC_ASIAN float
          - ACS_MEDIAN_HH_INC_BLACK float
          - ACS_MEDIAN_HH_INC_HISP float
          - ACS_MEDIAN_HH_INC_MULTI float
          - ACS_MEDIAN_HH_INC_NHPI float
          - ACS_MEDIAN_HH_INC_OTHER float
          - ACS_MEDIAN_HH_INC_WHITE float
          - ACS_MEDIAN_HH_INC float
          - ACS_MEDIAN_INC_F float
          - ACS_MEDIAN_INC_M float
          - ACS_MEDIAN_NONVET_INC float
          - ACS_MEDIAN_VET_INC float
          - ACS_PCT_INC50_ABOVE65 float
          - ACS_PCT_INC50_BELOW17 float
          - ACS_PCT_HEALTH_INC_BELOW137 float
          - ACS_PCT_HEALTH_INC_138_199 float
          - ACS_PCT_HEALTH_INC_200_399 float
          - ACS_PCT_HEALTH_INC_ABOVE400 float
          - ACS_PCT_HH_INC_10000 float
          - ACS_PCT_HH_INC_100000 float
          - ACS_PCT_HH_INC_14999 float
          - ACS_PCT_HH_INC_24999 float
          - ACS_PCT_HH_INC_49999 float
          - ACS_PCT_HH_INC_99999 float
          - ACS_PCT_INC50 float
          - ACS_PCT_NONVET_POV_18_64 float
          - ACS_PCT_VET_POV_18_64 float
          - ACS_PCT_PERSON_INC_100_124 float
          - ACS_PCT_PERSON_INC_125_199 float
          - ACS_PCT_PERSON_INC_ABOVE200 float
          - ACS_PCT_PERSON_INC_BELOW99 float
          - ACS_PER_CAPITA_INC float
          - ACS_PCT_POV_AIAN float
          - ACS_PCT_POV_ASIAN float
          - ACS_PCT_POV_BLACK float
          - ACS_PCT_POV_HISPANIC float
          - ACS_PCT_POV_MULTI float
          - ACS_PCT_POV_NHPI float
          - ACS_PCT_POV_OTHER float
          - ACS_PCT_POV_WHITE float
          - ACS_PCT_HH_1FAM_FOOD_STMP float
          - ACS_PCT_HH_FOOD_STMP float
          - ACS_PCT_HH_PUB_ASSIST float
          - ACS_PCT_HH_FOOD_STMP_BLW_POV float
          - ACS_PCT_HH_NO_FD_STMP_BLW_POV float
          - ACS_PCT_COLLEGE_ASSOCIATE_DGR float
          - ACS_PCT_BACHELOR_DGR float
          - ACS_PCT_NO_WORK_NO_SCHL_16_19 float
          - ACS_PCT_GRADUATE_DGR float
          - ACS_PCT_HS_GRADUATE float
          - ACS_PCT_LT_HS float
          - ACS_PCT_POSTHS_ED float
          - ACS_PCT_VET_BACHELOR float
          - ACS_PCT_VET_COLLEGE float
          - ACS_PCT_VET_HS float
          - ACS_MEDIAN_HOME_VALUE float
          - ACS_MEDIAN_RENT float
          - ACS_PCT_1UP_RENT_1ROOM float
          - ACS_PCT_1UP_OWNER_1ROOM float
          - ACS_PCT_1UP_PERS_1ROOM float
          - ACS_PCT_HH_1PERS float
          - ACS_PCT_10UNITS float
          - ACS_PCT_GRP_QRT float
          - ACS_PCT_HU_MOBILE_HOME float
          - ACS_PCT_OWNER_HU float
          - ACS_PCT_OWNER_HU_CHILD float
          - ACS_PCT_RENTER_HU float
          - ACS_PCT_RENTER_HU_ABOVE65 float
          - ACS_PCT_RENTER_HU_CHILD float
          - ACS_PCT_RENTER_HU_COST_30PCT float
          - ACS_PCT_RENTER_HU_COST_50PCT float
          - ACS_PCT_VACANT_HU float
          - ACS_PCT_HU_NO_FUEL float
          - ACS_PCT_HU_UTILITY_GAS float
          - ACS_PCT_HU_BOT_TANK_LP_GAS float
          - ACS_PCT_HU_OIL float
          - ACS_PCT_HU_WOOD float
          - ACS_PCT_HU_COAL float
          - ACS_PCT_HU_OTHER float
          - ACS_PCT_HU_ELEC float
          - ACS_PCT_HU_SOLAR float
          - ACS_MDN_OWNER_COST_MORTGAGE float
          - ACS_MDN_OWNER_COST_NO_MORTG float
          - ACS_PCT_OWNER_HU_COST_30PCT float
          - ACS_PCT_OWNER_HU_COST_50PCT float
          - ACS_MEDIAN_YEAR_BUILT float
          - ACS_PCT_HU_BUILT_1979 float
          - ACS_PCT_HU_KITCHEN float
          - ACS_PCT_HU_PLUMBING float
          - ACS_PCT_IN_STATE_MOVE float
          - ACS_PCT_IN_COUNTY_MOVE float
          - ACS_PCT_DIF_STATE float
          - ACS_PCT_HH_ABOVE65 float
          - ACS_PCT_HH_ALONE_ABOVE65 float
          - ACS_PCT_COMMT_15MIN float
          - ACS_PCT_COMMT_29MIN float
          - ACS_PCT_COMMT_59MIN float
          - ACS_PCT_COMMT_60MINUP float
          - ACS_PCT_HU_NO_VEH float
          - ACS_PCT_WORK_NO_CAR float
          - ACS_PCT_DRIVE_2WORK float
          - ACS_PCT_PUBL_TRANSIT float
          - ACS_PCT_PUB_COMMT_15MIN float
          - ACS_PCT_PUB_COMMT_29MIN float
          - ACS_PCT_PUB_COMMT_59MIN float
          - ACS_PCT_PUB_COMMT_60MINUP float
          - ACS_PCT_TAXICAB_2WORK float
          - ACS_PCT_WALK_2WORK float
          - ACS_PCT_MEDICAID_ANY float
          - ACS_PCT_MEDICAID_ANY_BELOW64 float
          - ACS_PCT_MEDICARE_ONLY float
          - ACS_PCT_OTHER_INS float
          - ACS_PCT_PVT_EMPL_DRCT float
          - ACS_PCT_PVT_EMPL_DRCT_BELOW64 float
          - ACS_PCT_PRIVATE_ANY float
          - ACS_PCT_PRIVATE_ANY_BELOW64 float
          - ACS_PCT_PRIVATE_EMPL float
          - ACS_PCT_PRIVATE_EMPL_BELOW64 float
          - ACS_PCT_PRIVATE_MDCR float
          - ACS_PCT_PRIVATE_MDCR_35_64 float
          - ACS_PCT_PRIVATE_OTHER float
          - ACS_PCT_PRIVATE_OTHER_BELOW64 float
          - ACS_PCT_PRIVATE_SELF float
          - ACS_PCT_PRIVATE_SELF_BELOW64 float
          - ACS_PCT_PUBLIC_ONLY float
          - ACS_PCT_PUBLIC_OTHER float
          - ACS_PCT_PUBLIC_OTHER_BELOW64 float
          - ACS_PCT_SELF_MDCR_ABOVE35 float
          - ACS_PCT_TRICARE_VA float
          - ACS_PCT_TRICARE_VA_BELOW64 float
          - ACS_PCT_UNINSURED float
          - ACS_PCT_UNINSURED_BELOW64 float
          - AHRF_USDA_RUCC_2013 float
          - AHRF_VET float
          - AHRF_VET_MALE float
          - AHRF_VET_FEMALE float
          - AHRF_UNEMPLOYED_RATE float
          - AHRF_DAYS_AIR_QLT float
          - AHRF_PCT_GOOD_AQ float
          - AHRF_TXC_SITE_NO_DATA float
          - AHRF_TXC_SITE_CNTRL float
          - AHRF_TXC_SITE_NO_CNTRL float
          - AHRF_HPSA_DENTIST float
          - AHRF_HPSA_MENTAL float
          - AHRF_HPSA_PRIM float
          - AHRF_HOSP_TEACHING float
          - AHRF_TOT_ARBRNE_ST_G_ISO_HOSPS float
          - AHRF_ARBRNE_ST_G_ISO_HOSPS_RATE float
          - AHRF_TOT_ARBRNE_ST_G_ISO_ROOMS float
          - AHRF_ARBRNE_ST_G_ISO_ROOMS_RATE float
          - AHRF_TOT_ARBRNE_STNGLT_ISO_ROOM float
          - AHRF_ARBRNE_STNGLT_ISO_ROOM_RATE float
          - AHRF_TOT_CARDIAC_IC_BEDS float
          - AHRF_CARDIAC_IC_BEDS_RATE float
          - AHRF_TOT_CARDIAC_IC_HOSP float
          - AHRF_CARDIAC_IC_HOSP_RATE float
          - AHRF_TOT_COM_HEALTH_GRANT float
          - AHRF_COM_HEALTH_GRANT_RATE float
          - AHRF_TOT_ER_VST_ST_G_HOSP float
          - AHRF_TOT_HOSP_ADMISSIONS float
          - AHRF_TOT_HOSP_BED float
          - AHRF_HOSP_BED_RATE float
          - AHRF_TOT_HOSP_BEDS_LT float
          - AHRF_HOSP_BEDS_LT_RATE float
          - AHRF_TOT_HOSP_MOBILE float
          - AHRF_HOSP_MOBILE_RATE float
          - AHRF_TOT_HOSP_TELE_ICU float
          - AHRF_HOSP_TELE_ICU_RATE float
          - AHRF_TOT_HOSP_TELE_STROKE float
          - AHRF_HOSP_TELE_STROKE_RATE float
          - AHRF_TOT_LT_HOSP float
          - AHRF_LT_HOSP_RATE float
          - AHRF_TOT_MEDSURGIC_BEDS float
          - AHRF_MEDSURGIC_BEDS_RATE float
          - AHRF_TOT_MEDSURGIC_HOSP float
          - AHRF_MEDSURGIC_HOSP_RATE float
          - AHRF_TOT_NEONATALIC_BEDS float
          - AHRF_NEONATALIC_BEDS_RATE float
          - AHRF_TOT_NEONATALIC_HOSP float
          - AHRF_NEONATALIC_HOSP_RATE float
          - AHRF_TOT_NH_BED_STNGH float
          - AHRF_NH_BED_STNGH_RATE float
          - AHRF_TOT_NHSC_ACTIVE float
          - AHRF_NHSC_ACTIVE_RATE float
          - AHRF_TOT_NHSC_FTE_PROV float
          - AHRF_NHSC_FTE_PROV_RATE float
          - AHRF_TOT_OPRT_ROOM float
          - AHRF_OPRT_ROOM_RATE float
          - AHRF_TOT_RURL_REFRRL_CNT float
          - AHRF_RURL_REFRRL_CNT_RATE float
          - AHRF_TOT_ST_COMM_HOSP float
          - AHRF_ST_COMM_HOSP_RATE float
          - AHRF_TOT_ST_G_HOSP float
          - AHRF_ST_G_HOSP_RATE float
          - AHRF_TOT_ST_G_HOSP_BED float
          - AHRF_ST_G_HOSP_BED_RATE float
          - AHRF_TOT_ST_N_G_HOSP float
          - AHRF_ST_N_G_HOSP_RATE float
          - AHRF_TOT_HOSPS float
          - AHRF_HOSPS_RATE float
          - AHRF_MARKET_ENROL float
          - AHRF_MARKET_ENROL_NEW float
          - AHRF_MARKET_ENROL_ACTIVE float
          - AHRF_MARKET_ENROL_AUTO float
          - AHRF_MARKET_ENROL_150 float
          - AHRF_MARKET_ENROL_200 float
          - AHRF_MARKET_ENROL_250 float
          - AHRF_MARKET_ENROL_300 float
          - AHRF_MARKET_ENROL_400 float
          - AHRF_MARKET_ENROL_OTHER float
          - AHRF_MARKET_ENROL_NO_ASST float
          - AHRF_PRESC_ENROLLMENT float
          - AHRF_PCT_PRESC_PEN float
          - AHRF_TOT_ADV_NURSES float
          - AHRF_ADV_NURSES_RATE float
          - AHRF_TOT_ALLERGY_IMM float
          - AHRF_ALLERGY_IMM_RATE float
          - AHRF_TOT_ANESTH float
          - AHRF_ANESTH_RATE float
          - AHRF_TOT_CHLD_PSYCH float
          - AHRF_CHLD_PSYCH_RATE float
          - AHRF_TOT_CLIN_NURSE_SPEC float
          - AHRF_CLIN_NURSE_SPEC_RATE float
          - AHRF_TOT_DENTISTS float
          - AHRF_DENTISTS_RATE float
          - AHRF_TOT_GEN_PREV float
          - AHRF_GEN_PREV_RATE float
          - AHRF_TOT_NURSE_ANESTH float
          - AHRF_NURSE_ANESTH_RATE float
          - AHRF_TOT_NURSE_MIDWIVES float
          - AHRF_NURSE_MIDWIVES_RATE float
          - AHRF_TOT_NURSE_PRACT float
          - AHRF_NURSE_PRACT_RATE float
          - AHRF_TOT_OB_GYN float
          - AHRF_OB_GYN_RATE float
          - AHRF_TOT_OPHTHALMOLOGY float
          - AHRF_OPHTHALMOLOGY_RATE float
          - AHRF_TOT_ORTH_SURG float
          - AHRF_ORTH_SURG_RATE float
          - AHRF_TOT_OTHER_SPEC float
          - AHRF_OTHER_SPEC_RATE float
          - AHRF_TOT_OTOLARYNGOLOGY float
          - AHRF_OTOLARYNGOLOGY_RATE float
          - AHRF_TOT_PEDIATRICS float
          - AHRF_PEDIATRICS_RATE float
          - AHRF_TOT_PHYSICIAN_ASSIST float
          - AHRF_PHYSICIAN_ASSIST_RATE float
          - AHRF_TOT_PHYS_PRIMARY float
          - AHRF_PHYS_PRIMARY_RATE float
          - AHRF_TOT_PLASTIC_SURG float
          - AHRF_PLASTIC_SURG_RATE float
          - AHRF_TOT_PULMONARY_SPEC float
          - AHRF_PULMONARY_SPEC_RATE float
          - AHRF_TOT_PSYCH float
          - AHRF_PSYCH_RATE float
          - AHRF_TOT_RADI float
          - AHRF_RADI_RATE float
          - AHRF_MCR_BN_READM_RATE float
          - AHRF_TOT_SURG_SPECS float
          - AHRF_SURG_SPECS_RATE float
          - AHRF_TOT_THORACIC_SURG float
          - AHRF_THORACIC_SURG_RATE float
          - AHRF_TOT_MDS float
          - AHRF_MDS_RATE float
          - AHRF_TOT_UROL float
          - AHRF_UROL_RATE float
          - AHRF_IP_DAY_NH_HOSP_RATE float
          - AHRF_IP_DAY_ST_G_HOSP_RATE float
          - AHRF_IP_DAY_STNG_LT_RATE float
          - AHRF_TOT_MCR_BN_READM float
          - AHRF_TOT_MCR_IP_DAY_ST_G float
          - AHRF_TOT_MDCD_IP_DAY_ST_G float
          - AHRF_TOT_MDCR_FFS_ACT_COST float
          - AHRF_TOT_MDCR_FFS_STD_COST float
          - AHRF_TOT_CARDIOVAS_SPEC float
          - AHRF_CARDIOVAS_SPEC_RATE float
          - AHRF_TOT_COLON_SRG float
          - AHRF_COLON_SRG_RATE float
          - AHRF_TOT_DERMATOLOGY float
          - AHRF_DERMATOLOGY_RATE float
          - AHRF_TOT_ER_MED float
          - AHRF_ER_MED_RATE float
          - AHRF_TOT_GASTROENTEROLOGY float
          - AHRF_GASTROENTEROLOGY_RATE float
          - AHRF_TOT_GEN_INTERNAL_MED float
          - AHRF_GEN_INTERNAL_MED_RATE float
          - AHRF_TOT_GENRL_SURG float
          - AHRF_GENRL_SURG_RATE float
          - AHRF_TOT_MED_SPEC float
          - AHRF_MED_SPEC_RATE float
          - AHRF_TOT_NEUROLOGICAL_SURG float
          - AHRF_NEUROLOGICAL_SURG_RATE float
          - AHRF_TOT_N_ST_G_EXP float
          - AHRF_TOT_N_ST_G_EXP_1000 float
          - AHRF_TOT_N_ST_G_PAYRLL float
          - AHRF_TOT_N_ST_G_PAYRLL_1000 float
          - AHRF_OP_VST_LT_ER_OP_RATE float
          - AHRF_OP_VST_ST_G_ER_OP_RATE float
          - AHRF_OP_VST_ST_G_OTHR_RATE float
          - AHRF_TOT_OUTPAT_VST_STNGH float
          - AHRF_TOT_ST_COMM_HOSP_ADMS float
          - AHRF_TOT_ST_G_HOSP_ADMS float
          - AHRF_TOT_STNG_LT_HOSP_ADMS float
          - AHRF_SURG_OPRN_IP_RATE float
          - AHRF_SURG_OPRN_OP_RATE float
          - AMFAR_SSP float
          - AMFAR_SSP_RATE float
          - AMFAR_TOT_MEDSAFAC float
          - AMFAR_MEDSAFAC_RATE float
          - AMFAR_TOT_RWFAC float
          - AMFAR_RWFAC_RATE float
          - AMFAR_TOT_AMATFAC float
          - AMFAR_AMATFAC_RATE float
          - AMFAR_TOT_HCVTFAC float
          - AMFAR_HCVTFAC_RATE float
          - AMFAR_TOT_HIVHCVTFAC float
          - AMFAR_HIVHCVTFAC_RATE float
          - AMFAR_TOT_HIVTFAC float
          - AMFAR_HIVTFAC_RATE float
          - AMFAR_TOT_MEDAMATFAC float
          - AMFAR_MEDAMATFAC_RATE float
          - AMFAR_TOT_MEDHCVTFAC float
          - AMFAR_MEDHCVTFAC_RATE float
          - AMFAR_TOT_MEDHIVHCVTFAC float
          - AMFAR_MEDHIVHCVTFAC_RATE float
          - AMFAR_TOT_MEDHIVTFAC float
          - AMFAR_MEDHIVTFAC_RATE float
          - AMFAR_TOT_MEDMHFAC float
          - AMFAR_MEDMHFAC_RATE float
          - AMFAR_TOT_MHFAC float
          - AMFAR_MHFAC_RATE float
          - CAF_ADJ_COUNTY_1 float
          - CAF_ADJ_COUNTY_2 float
          - CAF_ADJ_COUNTY_3 float
          - CAF_ADJ_COUNTY_4 float
          - CAF_ADJ_COUNTY_5 float
          - CAF_ADJ_COUNTY_6 float
          - CAF_ADJ_COUNTY_7 float
          - CAF_ADJ_COUNTY_8 float
          - CAF_ADJ_COUNTY_9 float
          - CAF_ADJ_COUNTY_10 float
          - CAF_ADJ_COUNTY_11 float
          - CAF_ADJ_COUNTY_12 float
          - CAF_ADJ_COUNTY_13 float
          - CAF_ADJ_COUNTY_14 float
          - CCBP_ANNUAL_TOT_POP float
          - CCBP_LARGE_INDUSTRY float
          - CCBP_PCT_HEALTH_EMPLOYMENT float
          - CCBP_BWLSTORES_RATE float
          - CCBP_GAMBLING_RATE float
          - CCBP_FCRSC_RATE float
          - CCBP_CHS_RATE float
          - CCBP_CFS_RATE float
          - CCBP_SFS_RATE float
          - CCBP_EORS_RATE float
          - CCBP_SHELTERS_RATE float
          - CCBP_PHYS_RATE float
          - CCBP_LAB_RATE float
          - CCBP_RET_RATE float
          - CCBP_HOME_RATE float
          - CCBP_CHILD_RATE float
          - CCBP_SA_RATE float
          - CCBP_CS_RATE float
          - CCBP_FF_RATE float
          - CCBP_FSR_RATE float
          - CCBP_SOGS_RATE float
          - CDCA_HEART_DTH_RATE_ABOVE35 float
          - CDCA_STROKE_DTH_RATE_ABOVE35 float
          - CDCA_PREV_DTH_RATE_BELOW74 float
          - CDCA_PREV_DTH_HISP_RATE_BELOW74 float
          - CDCA_PREV_DTH_BLACK_RATE_BELOW74 float
          - CDCA_PREV_DTH_WHITE_RATE_BELOW74 float
          - CDCAP_HIV_RATE_ABOVE13 float
          - CDCAP_HIVDIAG_RATE_ABOVE13 float
          - CDCAP_HIVDIAG_M_RATE_ABOVE13 float
          - CDCAP_HIVDIAG_F_RATE_ABOVE13 float
          - CDCAP_HIVDIAG_WHT_RATE_ABOVE13 float
          - CDCAP_HIVDIAG_BLK_RATE_ABOVE13 float
          - CDCAP_HIVDIAG_HIS_RATE_ABOVE13 float
          - CDCAP_HIVDIAG_ASN_RATE_ABOVE13 float
          - CDCAP_HIVDIAG_MTR_RATE_ABOVE13 float
          - CDCAP_HIVDIAG_AIAN_RATE_ABOVE13 float
          - CDCAP_HIVDIAG_NHPI_RATE_ABOVE13 float
          - CDCAP_CHLAMYDIA_RATE float
          - CDCAP_GONORRHEA_RATE float
          - CDCAP_SYPHILIS_RATE float
          - CDCAP_TUBERCULOSIS_RATE float
          - CDCW_TOT_POPULATION float
          - CDCW_INJURY_DTH_RATE float
          - CDCW_TRANSPORT_DTH_RATE float
          - CDCW_SELFHARM_DTH_RATE float
          - CDCW_ASSAULT_DTH_RATE float
          - CDCW_MATERNAL_DTH_RATE float
          - CDCW_OPIOID_DTH_RATE float
          - CDCW_DRUG_DTH_RATE float
          - CEN_AREALAND_SQM_COUNTY float
          - CEN_POPDENSITY_COUNTY float
          - CHR_SEGREG_BLACK float
          - CHR_SEGREG_NON_WHITE float
          - CHR_FIREARM_DEATH_RATE float
          - CHR_TOT_MENTAL_PROV float
          - CHR_MENTAL_PROV_RATE float
          - CHR_TOT_POPULATION float
          - CHR_TEEN_BIRTH_RATE_15_19 float
          - CHR_PCT_LOW_BIRTH_WT float
          - CHR_AVG_LIFE_EXPEC float
          - CHR_CHILD_DEATH_RATE float
          - CHR_INFANT_DEATH_RATE float
          - CHR_PCT_ALCOHOL_DRIV_DEATH float
          - CHR_PREMAT_DEATH_RATE float
          - MP_MEDICARE_ELIGIBLES float
          - MP_MEDICARE_ADVTG_ENROLLED float
          - MP_PCT_ADVTG_PEN float
          - NCHS_URCS_2006 float
          - NCHS_URCS_2013 float
          - NEPHTN_ARSENIC_MEAN_CSW float
          - NEPHTN_ARSENIC_MEAN_POP float
          - NEPHTN_PCT_ARSENIC_MCL_NOTDETECT float
          - NEPHTN_PCT_ARSENIC_MCL_LESS10 float
          - NEPHTN_PCT_ARSENIC_MCL_GREATER10 float
          - NEPHTN_HEATIND_90 float
          - NEPHTN_HEATIND_95 float
          - NEPHTN_HEATIND_100 float
          - NEPHTN_HEATIND_105 float
          - NEPHTN_MAXDROUGHT float
          - NEPHTN_NUMDROUGHT float
          - NEPHTN_TEMPERATURE_90 float
          - NEPHTN_TEMPERATURE_95 float
          - NEPHTN_TEMPERATURE_100 float
          - NEPHTN_TEMPERATURE_105 float
          - NHC_AVG_LIC_STAFF float
          - NHC_AVG_REP_NURSE_STAFF float
          - NHC_AVG_ADJ_NURSE_STAFF float
          - NHC_TOT_FACS float
          - NHC_FACS_RATE float
          - CCD_FED_REVENUE_CNA float
          - CCD_LOCAL_REVENUE_LUNCH float
          - CCD_STATE_REVENUE_LUNCH float
          - CCD_TOT_EXPENDITURE float
          - CCD_TOT_FED_REVENUE float
          - CCD_TOT_LOCAL_REVENUE float
          - CCD_TOT_REVENUE float
          - CCD_TOT_STATE_REVENUE float
          - CCD_TOT_STUDENTS float
          - CDCP_ARTHRITIS_ADULT_A float
          - CDCP_ARTHRITIS_ADULT_C float
          - CDCP_ASTHMA_ADULT_A float
          - CDCP_ASTHMA_ADULT_C float
          - CDCP_BLOOD_MED_ADULT_A float
          - CDCP_BLOOD_MED_ADULT_C float
          - CDCP_CHOLES_ADULT_A float
          - CDCP_CHOLES_ADULT_C float
          - CDCP_CHOLES_SCR_ADULT_A float
          - CDCP_CHOLES_SCR_ADULT_C float
          - CDCP_DOCTOR_VISIT_ADULT_A float
          - CDCP_DOCTOR_VISIT_ADULT_C float
          - CDCP_KIDNEY_DISEASE_ADULT_A float
          - CDCP_KIDNEY_DISEASE_ADULT_C float
          - CDCP_NO_PHY_ACTV_ADULT_A float
          - CDCP_NO_PHY_ACTV_ADULT_C float
          - CDCP_PULMONARY_ADULT_A float
          - CDCP_PULMONARY_ADULT_C float
          - CRE_RATE_RISK0 float
          - CRE_RATE_RISK12 float
          - CRE_RATE_RISK3 float
          - CRE_TOT_POP_DENOM float
          - CRE_TOT_RISK0 float
          - CRE_TOT_RISK12 float
          - CRE_TOT_RISK3 float
          - EPAA_2NDMAX_CO_1HR float
          - EPAA_2NDMAX_CO_8HR float
          - EPAA_98PR_NO2_1HR float
          - EPAA_MEAN_NO2_1HR float
          - EPAA_2NDMAX_O3_1HR float
          - EPAA_4THMAX_O3_8HR float
          - EPAA_MAX_PB_3MON float
          - EPAA_2NDMAX_PM10_24HR float
          - EPAA_MEAN_WTD_PM10 float
          - EPAA_MEAN_WTD_PM25 float
          - EPAA_98PR_PM25_DAILY float
          - EPAA_99PR_SO2_1HR float
          - EPAA_2NDMAX_SO2_24HR float
          - EPAA_MEAN_SO2_1HR float
          - NOAAC_AVG_TEMP_APR float
          - NOAAC_AVG_TEMP_AUG float
          - NOAAC_AVG_TEMP_DEC float
          - NOAAC_AVG_TEMP_FEB float
          - NOAAC_AVG_TEMP_JAN float
          - NOAAC_AVG_TEMP_JUL float
          - NOAAC_AVG_TEMP_JUN float
          - NOAAC_AVG_TEMP_MAR float
          - NOAAC_AVG_TEMP_MAY float
          - NOAAC_AVG_TEMP_NOV float
          - NOAAC_AVG_TEMP_OCT float
          - NOAAC_AVG_TEMP_SEP float
          - NOAAC_MAX_TEMP_APR float
          - NOAAC_MAX_TEMP_AUG float
          - NOAAC_MAX_TEMP_DEC float
          - NOAAC_MAX_TEMP_FEB float
          - NOAAC_MAX_TEMP_JAN float
          - NOAAC_MAX_TEMP_JUL float
          - NOAAC_MAX_TEMP_JUN float
          - NOAAC_MAX_TEMP_MAR float
          - NOAAC_MAX_TEMP_MAY float
          - NOAAC_MAX_TEMP_NOV float
          - NOAAC_MAX_TEMP_OCT float
          - NOAAC_MAX_TEMP_SEP float
          - NOAAC_MIN_TEMP_APR float
          - NOAAC_MIN_TEMP_AUG float
          - NOAAC_MIN_TEMP_DEC float
          - NOAAC_MIN_TEMP_FEB float
          - NOAAC_MIN_TEMP_JAN float
          - NOAAC_MIN_TEMP_JUL float
          - NOAAC_MIN_TEMP_JUN float
          - NOAAC_MIN_TEMP_MAR float
          - NOAAC_MIN_TEMP_MAY float
          - NOAAC_MIN_TEMP_NOV float
          - NOAAC_MIN_TEMP_OCT float
          - NOAAC_MIN_TEMP_SEP float
          - NOAAC_PRECIPITATION_APR float
          - NOAAC_PRECIPITATION_AUG float
          - NOAAC_PRECIPITATION_DEC float
          - NOAAC_PRECIPITATION_FEB float
          - NOAAC_PRECIPITATION_JAN float
          - NOAAC_PRECIPITATION_JUL float
          - NOAAC_PRECIPITATION_JUN float
          - NOAAC_PRECIPITATION_MAR float
          - NOAAC_PRECIPITATION_MAY float
          - NOAAC_PRECIPITATION_NOV float
          - NOAAC_PRECIPITATION_OCT float
          - NOAAC_PRECIPITATION_SEP float
          - NOAAS_PROPERTY_DAMAGE float
          - NOAAS_TOT_DEATHS_DIRECT float
          - NOAAS_TOT_DEATHS_INDIRECT float
          - NOAAS_TOT_INJURIES_DIRECT float
          - NOAAS_TOT_INJURIES_INDIRECT float
          - NOAAS_TOT_STORMEVENT float
          - NOAAS_TOT_TORNADO float
          - NOAAS_TOT_WIND float
          - NOAAS_TOT_HAIL float
          - NOAAS_TOT_HURRICANE_STORM float
          - NOAAS_TOT_FLOOD float
          - NOAAS_TOT_WILDFIRE float
          - NOAAS_TOT_HEAT_EVENTS float
          - NOAAS_TOT_DROUGHT float
          - SAHIE_PCT_UNINSURED64 float
          - SAHIE_PCT_UNINSURED64_138_400FPL float
          - SAHIE_PCT_UNINSURED64_138FPL float
          - SAHIE_PCT_UNINSURED64_200FPL float
          - SAHIE_PCT_UNINSURED64_250FPL float
          - SAHIE_PCT_UNINSURED64_400FPL float
          - SAHIE_TOT_POP64 float
          - SAIPE_MEDIAN_HH_INCOME float
          - SAIPE_PCT_POV float
          - SAIPE_PCT_POV_0_17 float
          - SAIPE_PCT_POV_5_17 float
          - SAIPE_TOT_POV float
          - SAIPE_TOT_POV_0_17 float
          - SAIPE_TOT_POV_5_17 float
          - WUSTL_AVG_PM25 float
          - AHA_HHI_SHRTTRM_ACUTE_DSCHR_CBSA float
          - AHA_HHI_SHRTTRM_ACUTE_ADMSN_CBSA float
          - AHA_HHI_SHRTTRM_ACUTE_LOS_CBSA float
          - AHA_HHI_SHRTTRM_ACUTE_DSCHR_CTY float
          - AHA_HHI_SHRTTRM_ACUTE_ADMSN_CTY float
          - AHA_HHI_SHRTTRM_ACUTE_LOS_CTY float
          - HHC_PCT_HHA_NURSING float
          - HHC_PCT_HHA_PHYS_THERAPY float
          - HHC_PCT_HHA_OCC_THERAPY float
          - HHC_PCT_HHA_SPEECH float
          - HHC_PCT_HHA_MEDICAL float
          - HHC_PCT_HHA_AIDE float
          - HRSA_MUA_COUNTY float
          - LTC_AVG_PCT_PRESSURE_ULCER float
          - LTC_PCT_RESD_BLACK float
          - LTC_PCT_RESD_HISPANIC float
          - LTC_PCT_RESD_WHITE float
          - LTC_AVG_AGE float
          - LTC_PCT_MULTI_FAC float
          - LTC_PCT_FOR_PROFIT float
          - LTC_AVG_ACUITY_INDEX float
          - LTC_TOT_BEDS float
          - LTC_TOT_BEDS_RATE float
          - LTC_AVG_OBS_REHOSP_RATE float
          - LTC_AVG_OBS_SUCCESSFUL_DISC_RATE float
          - LTC_AVG_OBS_MEDIAN_LOS float
          - LTC_AVG_PCT_MEDICAID float
          - LTC_AVG_PCT_MEDICARE float
          - LTC_OCCUPANCY_RATE float
          - MGV_PCT_MEDICAID float
          - MGV_TOT_BEN_PART_A_B float
          - MGV_TOT_BEN_FFS float
          - MGV_PCT_BEN_FFS_WHITE float
          - MGV_PCT_BEN_FFS_BLACK float
          - MGV_PCT_BEN_FFS_HISPANIC float
          - MGV_PER_CAPITA_ACTUAL_IP float
          - MGV_PER_CAPITA_STD_IP float
          - MGV_PER_CAPITA_ACTUAL_OP float
          - MGV_PER_CAPITA_STD_OP float
          - MGV_PER_CAPITA_ACTUAL_EM float
          - MGV_PER_CAPITA_STD_EM float
          - MGV_PER_CAPITA_ACTUAL_PA float
          - MGV_PER_CAPITA_STD_PA float
          - MGV_PER_CAPITA_ACTUAL_HC float
          - MGV_PER_CAPITA_STD_HC float
          - MMD_OVERALL_PQI_M_RATE float
          - MMD_OVERALL_PQI_F_RATE float
          - MMD_OVERALL_PQI_WHITE_RATE float
          - MMD_OVERALL_PQI_BLACK_RATE float
          - MMD_OVERALL_PQI_OTHER_RATE float
          - MMD_OVERALL_PQI_ASIAN_RATE float
          - MMD_OVERALL_PQI_HISP_RATE float
          - MMD_OVERALL_PQI_AIAN_RATE float
          - MMD_ACUTE_PQI_M_RATE float
          - MMD_ACUTE_PQI_F_RATE float
          - MMD_ACUTE_PQI_WHITE_RATE float
          - MMD_ACUTE_PQI_BLACK_RATE float
          - MMD_ACUTE_PQI_OTHER_RATE float
          - MMD_ACUTE_PQI_ASIAN_RATE float
          - MMD_ACUTE_PQI_HISP_RATE float
          - MMD_ACUTE_PQI_AIAN_RATE float
          - MMD_CHRONIC_PQI_M_RATE float
          - MMD_CHRONIC_PQI_F_RATE float
          - MMD_CHRONIC_PQI_WHITE_RATE float
          - MMD_CHRONIC_PQI_BLACK_RATE float
          - MMD_CHRONIC_PQI_OTHER_RATE float
          - MMD_CHRONIC_PQI_ASIAN_RATE float
          - MMD_CHRONIC_PQI_HISP_RATE float
          - MMD_CHRONIC_PQI_AIAN_RATE float
          - MMD_ED_VISITS_M_RATE float
          - MMD_ED_VISITS_F_RATE float
          - MMD_ED_VISITS_WHITE_RATE float
          - MMD_ED_VISITS_BLACK_RATE float
          - MMD_ED_VISITS_OTHER_RATE float
          - MMD_ED_VISITS_ASIAN_RATE float
          - MMD_ED_VISITS_HISP_RATE float
          - MMD_ED_VISITS_AIAN_RATE float
          - MMD_ED_VISITS_MED_RATE float
          - MMD_ED_VISITS_DUAL_RATE float
          - MMD_READM_M_RATE float
          - MMD_READM_F_RATE float
          - MMD_READM_WHITE_RATE float
          - MMD_READM_BLACK_RATE float
          - MMD_READM_OTHER_RATE float
          - MMD_READM_ASIAN_RATE float
          - MMD_READM_HISP_RATE float
          - MMD_READM_AIAN_RATE float
          - MMD_ANXIETY_DISD float
          - MMD_BIPOLAR_DISD float
          - MMD_DEPR_DISD float
          - MMD_PERSONALITY_DISD float
          - MMD_OUD_IND float
          - MMD_THREE_OR_MORE_COND float
          - PC_PCT_MEDICARE_APPRVD_FULL_AMT float
          - PC_PCT_MCARE_MAY_ACPT_APPRVD_AMT float
          - POS_MEDIAN_DIST_ED float
          - POS_MEAN_DIST_ED float
          - POS_MIN_DIST_ED float
          - POS_MAX_DIST_ED float
          - POS_MEDIAN_DIST_MEDSURG_ICU float
          - POS_MEAN_DIST_MEDSURG_ICU float
          - POS_MIN_DIST_MEDSURG_ICU float
          - POS_MAX_DIST_MEDSURG_ICU float
          - POS_MEDIAN_DIST_TRAUMA float
          - POS_MEAN_DIST_TRAUMA float
          - POS_MIN_DIST_TRAUMA float
          - POS_MAX_DIST_TRAUMA float
          - POS_MEDIAN_DIST_PED_ICU float
          - POS_MEAN_DIST_PED_ICU float
          - POS_MIN_DIST_PED_ICU float
          - POS_MAX_DIST_PED_ICU float
          - POS_MEDIAN_DIST_OBSTETRICS float
          - POS_MEAN_DIST_OBSTETRICS float
          - POS_MIN_DIST_OBSTETRICS float
          - POS_MAX_DIST_OBSTETRICS float
          - POS_MEDIAN_DIST_CLINIC float
          - POS_MEAN_DIST_CLINIC float
          - POS_MIN_DIST_CLINIC float
          - POS_MAX_DIST_CLINIC float
          - POS_MEDIAN_DIST_ALC float
          - POS_MEAN_DIST_ALC float
          - POS_MIN_DIST_ALC float
          - POS_MAX_DIST_ALC float
          - POS_TOT_FQHC float
          - POS_FQHC_RATE float
          - POS_TOT_CMHC float
          - POS_CMHC_RATE float
          - POS_TOT_RHC float
          - POS_RHC_RATE float
          - POS_TOT_HHA float
          - POS_HHA_RATE float
          - POS_TOT_HOSPICE float
          - POS_HOSPICE_RATE float
          - POS_TOT_ASC float
          - POS_ASC_RATE float
          - POS_TOT_NF float
          - POS_NF_RATE float
          - POS_TOT_NF_BEDS float
          - POS_NF_BEDS_RATE float
          - POS_TOT_SNF float
          - POS_SNF_RATE float
          - POS_TOT_SNF_BEDS float
          - POS_SNF_BEDS_RATE float
          - POS_TOT_HOSP_OBSTETRIC float
          - POS_HOSP_OBSTETRIC_RATE float
          - POS_TOT_HOSP_PED_ICU float
          - POS_HOSP_PED_ICU_RATE float
          - POS_TOT_HOSP_BURN float
          - POS_HOSP_BURN_RATE float
          - POS_TOT_HOSP_MEDSURG_ICU float
          - POS_HOSP_MEDSURG_ICU_RATE float
          - POS_TOT_HOSP_REHAB float
          - POS_HOSP_REHAB_RATE float
          - POS_TOT_HOSP_ALC float
          - POS_HOSP_ALC_RATE float
          - POS_TOT_HOSP_PSYCH float
          - POS_HOSP_PSYCH_RATE float
          - POS_TOT_HOSP_AMBULANCE float
          - POS_HOSP_AMBULANCE_RATE float
          - POS_TOT_HOSP_CHEMO float
          - POS_HOSP_CHEMO_RATE float
          - POS_TOT_HOSP_ED float
          - POS_HOSP_ED_RATE float
          - POS_PCT_HOSP_FOR_PROFIT float
          - POS_PCT_HOSP_NON_PROFIT float
          - POS_PCT_HOSP_GOV float
          - CEN_AIAN_NH_IND float
      preprocess:
        kind: sql
        engine: spark
        query: |
          select
            to_date(YEAR, "yyyy") as event_time,
            *
          from input
      merge:
        kind: ledger
        primaryKey:
          - YEAR
          - COUNTYFIPS
```

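
Writing an explicit schema for hundreds of columns by hand is tedious, so the list above could be generated from the CSV itself. The following is only a minimal sketch, not part of kamu: the helpers `guess_type` and `schema_lines` and the `force_string` parameter are hypothetical names introduced for illustration. It emits entries in the same `- NAME type` form used under `schema:` in the manifest, guessing `float` when every sampled value parses as a number and `string` otherwise.

```python
import csv
import io


def guess_type(values):
    """Return 'float' if every non-empty sample parses as a float, else 'string'."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        # No data to judge by: fall back to the safest type.
        return "string"
    try:
        for v in non_empty:
            float(v)
    except ValueError:
        return "string"
    return "float"


def schema_lines(csv_text, force_string=(), sample_rows=100):
    """Build '- NAME type' schema entries from the header and a sample of rows.

    force_string (hypothetical parameter) lists columns that must stay strings
    even though they look numeric, e.g. identifiers such as YEAR or COUNTYFIPS.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Read at most sample_rows data rows for type guessing.
    samples = [row for _, row in zip(range(sample_rows), reader)]
    lines = []
    for i, name in enumerate(header):
        if name in force_string:
            col_type = "string"
        else:
            col_type = guess_type([row[i] for row in samples if i < len(row)])
        lines.append(f"- {name} {col_type}")
    return lines


if __name__ == "__main__":
    sample = (
        "YEAR,COUNTYFIPS,STATE,ACS_TOT_POP_WT\n"
        "2019,01001,Alabama,55380\n"
        "2019,01003,Alabama,\n"
    )
    for line in schema_lines(sample, force_string={"YEAR", "COUNTYFIPS"}):
        print(line)
```

The sample data in the `__main__` block is made up for the demonstration; in practice the input would be the CSV produced by the `xlsx2csv` step, and the guessed types should be reviewed before pasting them into the manifest.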
sergiimk commented 1 year ago

This may not be a Spark issue after all, as reading the CSV with schema inference using PySpark via the kamu notebook succeeds (screenshot not preserved).

sergiimk commented 5 months ago

We no longer use Spark during the ingest phase.