saurfang / sparksql-protobuf

Read SparkSQL parquet file as RDD[Protobuf]
http://spark-packages.org/package/saurfang/sparksql-protobuf
Apache License 2.0
93 stars, 36 forks

how to query on SPARKSQL #11

Open heenasalim opened 5 years ago

heenasalim commented 5 years ago

Hi Team,

I am trying the following SQL:

val work__store_level_vend_pack_loc_final_data =
  sparksession.read.format("csv")
    .option("header", "true")
    .option("delimiter", "|")
    .option("inferSchema", "true")
    .load("C:\\Users\\jabin\\Desktop\\project_files\\work__store_level_vend_pack_loc_final_data.txt")

// registerTempTable is deprecated since Spark 2.0; createOrReplaceTempView replaces it
work__store_level_vend_pack_loc_final_data.createOrReplaceTempView("work__store_level_vend_pack_loc_final_data_table")

val r1 = sparksession.sqlContext.sql(
  """SELECT shc_item_id, 'K' AS source_owner_cd, item_purchase_status_cd, vendor_package_id,
    |       vendor_package_purchase_status_cd, flow_type_cd AS vendor_package_flow_type_cd,
    |       vendor_carton_qty, vendor_stock_nbr, ksn_package_id, ksn_purchase_status_cd, import_ind,
    |       sears_divission_nbr, sears_item_nbr, sears_sku_nbr, scan_based_trading_ind,
    |       cross_merchandising_cd, retail_carton_vendor_package_id, vendor_package_owner_cd,
    |       can_carry_model_id, '' AS days_to_check_begin_day_qty, '' AS days_to_check_end_day_qty,
    |       dotcom_allocation_ind, retail_carton_internal_package_qty, allocation_replenishment_cd,
    |       shc_item_type_cd, idrp_order_method_cd, source_package_qty AS store_source_package_qty,
    |       order_duns_nbr
    |FROM work__store_level_vend_pack_loc_final_data_table
    |WHERE flow_type_cd = 'JIT' OR servicing_dc_nbr > '0'""".stripMargin) // .collect.foreach(println)

Now I want the distinct rows of r1 (distinct across all of its columns) using sparksession.sqlContext.sql(""). How can I do that?
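One possible approach, sketched here under assumptions rather than taken from this thread: register r1 as a temporary view and run SELECT DISTINCT * against it, or put DISTINCT directly after SELECT in the original query. The small in-memory DataFrame below is a hypothetical stand-in for r1 (only three of its columns, with made-up rows), and the view name r1_table and the object name DistinctRowsSketch are also invented for illustration. It assumes Spark 2.x or later on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object DistinctRowsSketch {
  // Returns (row count via SQL DISTINCT, row count via Dataset.distinct)
  def distinctCounts(): (Long, Long) = {
    val sparksession = SparkSession.builder()
      .appName("distinct-rows-sketch")
      .master("local[*]") // local mode, only for this illustration
      .getOrCreate()
    import sparksession.implicits._

    // Hypothetical stand-in for r1: two identical rows and one different row
    val r1 = Seq(
      (1, "K", "JIT"),
      (1, "K", "JIT"),
      (2, "K", "JIT")
    ).toDF("shc_item_id", "source_owner_cd", "vendor_package_flow_type_cd")

    // Expose r1 to SQL under an assumed view name
    r1.createOrReplaceTempView("r1_table")

    // DISTINCT applies to the whole row, i.e. to every selected column
    val distinctViaSql = sparksession.sqlContext.sql("SELECT DISTINCT * FROM r1_table")

    // Equivalent DataFrame API call, without going through SQL
    val distinctViaApi = r1.distinct()

    val counts = (distinctViaSql.count(), distinctViaApi.count())
    sparksession.stop()
    counts
  }

  def main(args: Array[String]): Unit = {
    val (viaSql, viaApi) = distinctCounts()
    println(s"distinct rows via SQL: $viaSql, via API: $viaApi")
  }
}
```

If only a subset of columns should decide uniqueness, r1.dropDuplicates with a list of column names is the usual alternative; SELECT DISTINCT always compares every selected column.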

heenasalim commented 5 years ago

gold_item_aprk_current_data.txt work__store_level_vend_pack_loc_final_data.txt