Closed: kevinguard closed this issue 6 years ago
It is possible that the diff string has some special format which breaks the parseDiff function; would you mind providing the diff here?
@otakustay Thank you for your prompt response. Please find the following as the diff string I am passing to your API:
--- a
+++ b
@@ -1,4 +1,4 @@
{
- "versionId": 1520467766093787,
+ "versionId": 1520273259522895,
"uuid": "4e7fb4cf-8d50-440a-9d75-1ecf356082aa",
"isDeleted": false,
@@ -131,5 +131,5 @@
"dependencies": [],
"category": "Hive",
- "hql": "use {{database}};\nset mapreduce.job.queuename={{queuename}};\n\nset hive.execution.engine=MR;\nSET hive.exec.compress.output=true;\nSET mapred.output.compression.type=BLOCK;\nSET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;\nset hive.auto.convert.join = true;\nset hive.auto.convert.join.noconditionaltask = true;\nset hive.auto.convert.join.noconditionaltask.size = 10000000;\nSET hive.exec.parallel=true;\nset hive.exec.dynamic.partition=true;\nset hive.exec.dynamic.partition.mode=nonstrict;\nset hive.exec.reducers.bytes.per.reducer=536870912;\nset hive.exec.reducers.max=500;\n\nset mapred.min.split.size=67108864;\n\nSET hive.exec.compress.intermediate=true;\nset hive.exec.max.dynamic.partitions.pernode=10000;\nset hive.exec.max.dynamic.partitions=10000;\n\n-- create dispatch tagged users table\nCREATE TABLE IF NOT EXISTS {{dispatch_rider_cancel_daily_table_name}} (\n job_client_uuid string,\n client_vvid string,\n supply_vvid string,\n supply_client_vvid string,\n job_uuid string,\n supply_uuid string,\n vehicle_view_id string,\n city_id string,\n client_uuid string,\n timestamp bigint,\n day_of_week bigint,\n hour_of_day bigint,\n dispatched_eta float,\n app_rating float,\n eyeball_eta_seconds bigint,\n eta_delta bigint,\n surge_multiplier float,\n is_forward_dispatched boolean,\n request_location_latitude float,\n request_location_longitude float,\n dropoff_location_latitude float,\n dropoff_location_longitude float,\n trip_distance_haversine float,\n days_since_signup bigint,\n rider_upfront_fare float,\n is_commute boolean,\n is_rider_canceled_dispatch boolean,\n is_fifo boolean\n)\nPARTITIONED BY (datestr string)\nROW FORMAT SERDE\n 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\nSTORED AS INPUTFORMAT\n 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'\nOUTPUTFORMAT\n 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\nTBLPROPERTIES ('dc_replication' = 'false')\n;\n\nWITH a 
AS\n(\n SELECT a.*,\n ACOS(COS(RADIANS(90-pickup.waypointLocation.latitude))*COS(RADIANS(90-dropoff.waypointLocation.latitude))+\n SIN(RADIANS(90-pickup.waypointLocation.latitude))*SIN(RADIANS(90-dropoff.waypointLocation.latitude))\n *COS(RADIANS(pickup.waypointLocation.longitude-dropoff.waypointLocation.longitude)))*3958.756 AS trip_distance_haversine,\n dropoff.waypointLocation.latitude AS dropoff_location_latitude,\n dropoff.waypointLocation.longitude AS dropoff_location_longitude\n FROM rawdata.kafka_hp_demand_job_assigned_nodedup AS a\n LATERAL VIEW EXPLODE(a.msg.waypoints) exploded AS pickup\n LATERAL VIEW EXPLODE(a.msg.waypoints) exploded2 AS dropoff\n WHERE a.datestr = '{{yesterday_ds}}'\n AND pickup.waypointtasktype = 'pickup'\n AND dropoff.waypointtasktype = 'dropoff'\n)\nINSERT OVERWRITE TABLE {{dispatch_rider_cancel_daily_table_name}} PARTITION(datestr)\nSELECT DISTINCT\n concat(a.msg.jobUUID, a.msg.clientUUID) AS job_client_uuid,\n concat(a.msg.clientUUID, '#', a.msg.vehicleViewId) AS client_vvid,\n concat(a.msg.supplyUUID, '#', a.msg.vehicleViewId) AS supply_vvid,\n concat(a.msg.supplyUUID, '#', a.msg.clientUUID, '#', a.msg.vehicleViewId) AS supply_client_vvid,\n a.msg.jobUUID AS job_uuid,\n a.msg.supplyUUID AS supply_uuid,\n a.msg.vehicleViewId AS vehicle_view_id,\n a.msg.region.id AS city_id,\n a.msg.clientUUID AS client_uuid,\n cast(a.msg.timestamp/1e3 as bigint) as timestamp,\n from_unixtime(cast(a.msg.timestamp/1e3 as bigint),'u') as day_of_week,\n from_unixtime(cast(a.msg.timestamp/1e3 as bigint),'H') as hour_of_day,\n a.msg.predictedETA AS dispatched_eta,\n dd.app_rating AS app_rating,\n a.msg.eyeballETASeconds AS eyeball_eta_seconds,\n a.msg.predictedETA - a.msg.eyeballETASeconds AS eta_delta,\n a.msg.surgeMultiplier AS surge_multiplier,\n a.msg.isForwardDispatched AS is_forward_dispatched,\n a.msg.requestLocation.latitude AS request_location_latitude,\n a.msg.requestLocation.longitude AS request_location_longitude,\n 
a.dropoff_location_latitude,\n a.dropoff_location_longitude,\n a.trip_distance_haversine,\n GREATEST(DATEDIFF(TO_DATE(FROM_UNIXTIME(cast(a.msg.timestamp/1e3 as bigint))), TO_DATE(dim_client.signup_timestamp)), 0) AS days_since_signup,\n a.msg.upfrontFareInfo.riderUpfrontFareAmount AS rider_upfront_fare,\n a.msg.isCommuterBenefits AS is_commute,\n CASE WHEN rc.msg.jobUUID IS NOT NULL THEN 1 ELSE 0 END AS is_rider_canceled_dispatch,\n a.msg.fifoDispatchType = 'fifo' as is_fifo,\n '{{yesterday_ds}}' AS datestr\nFROM a\n LEFT JOIN rawdata.kafka_hp_demand_job_canceled_nodedup AS rc\n ON a.msg.jobUUID = rc.msg.jobUUID\n AND rc.datestr = '{{yesterday_ds}}'\n LEFT JOIN rawdata.kafka_hp_demand_job_unassigned_nodedup AS du\n ON a.msg.jobUUID = du.msg.jobUUID\n AND du.datestr = '{{yesterday_ds}}'\n AND a.msg.supplyUUID = du.msg.supplyUUID\n INNER JOIN dwh.dim_driver AS dd\n ON a.msg.supplyUUID = dd.driver_uuid\n AND (NOT dd.is_uber_email)\n INNER JOIN dwh.dim_client AS dim_client\n ON a.msg.clientUUID = dim_client.user_uuid\n AND (NOT dim_client.is_uber_email)\nWHERE du.msg.jobUUID IS NULL\n;\n\nALTER TABLE {{dispatch_rider_cancel_daily_table_name}} DROP IF EXISTS PARTITION(datestr='{{macros.ds_add(ds,-40)}}')\n;",
+ "hql": "use {{database}};\nset mapreduce.job.queuename={{queuename}};\n\nset hive.execution.engine=spark;\nSET hive.exec.compress.output=true;\nSET mapred.output.compression.type=BLOCK;\nSET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;\nset hive.auto.convert.join = true;\nset hive.auto.convert.join.noconditionaltask = true;\nset hive.auto.convert.join.noconditionaltask.size = 10000000;\nSET hive.exec.parallel=true;\nset hive.exec.dynamic.partition=true;\nset hive.exec.dynamic.partition.mode=nonstrict;\nset hive.exec.reducers.bytes.per.reducer=536870912;\nset hive.exec.reducers.max=500;\n\nset mapred.min.split.size=67108864;\n\nSET hive.exec.compress.intermediate=true;\nset hive.exec.max.dynamic.partitions.pernode=10000;\nset hive.exec.max.dynamic.partitions=10000;\n\n-- create dispatch tagged users table\nCREATE TABLE IF NOT EXISTS {{dispatch_rider_cancel_daily_table_name}} (\n job_client_uuid string,\n client_vvid string,\n supply_vvid string,\n supply_client_vvid string,\n job_uuid string,\n supply_uuid string,\n vehicle_view_id string,\n city_id string,\n client_uuid string,\n timestamp bigint,\n day_of_week bigint,\n hour_of_day bigint,\n dispatched_eta float,\n app_rating float,\n eyeball_eta_seconds bigint,\n eta_delta bigint,\n surge_multiplier float,\n is_forward_dispatched boolean,\n request_location_latitude float,\n request_location_longitude float,\n dropoff_location_latitude float,\n dropoff_location_longitude float,\n trip_distance_haversine float,\n days_since_signup bigint,\n rider_upfront_fare float,\n is_commute boolean,\n is_rider_canceled_dispatch boolean,\n is_fifo boolean\n)\nPARTITIONED BY (datestr string)\nROW FORMAT SERDE\n 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\nSTORED AS INPUTFORMAT\n 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'\nOUTPUTFORMAT\n 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\nTBLPROPERTIES ('dc_replication' = 'false')\n;\n\nWITH a 
AS\n(\n SELECT a.*,\n ACOS(COS(RADIANS(90-pickup.waypointLocation.latitude))*COS(RADIANS(90-dropoff.waypointLocation.latitude))+\n SIN(RADIANS(90-pickup.waypointLocation.latitude))*SIN(RADIANS(90-dropoff.waypointLocation.latitude))\n *COS(RADIANS(pickup.waypointLocation.longitude-dropoff.waypointLocation.longitude)))*3958.756 AS trip_distance_haversine,\n dropoff.waypointLocation.latitude AS dropoff_location_latitude,\n dropoff.waypointLocation.longitude AS dropoff_location_longitude\n FROM rawdata.kafka_hp_demand_job_assigned_nodedup AS a\n LATERAL VIEW EXPLODE(a.msg.waypoints) exploded AS pickup\n LATERAL VIEW EXPLODE(a.msg.waypoints) exploded2 AS dropoff\n WHERE a.datestr = '{{yesterday_ds}}'\n AND pickup.waypointtasktype = 'pickup'\n AND dropoff.waypointtasktype = 'dropoff'\n)\nINSERT OVERWRITE TABLE {{dispatch_rider_cancel_daily_table_name}} PARTITION(datestr)\nSELECT DISTINCT\n concat(a.msg.jobUUID, a.msg.clientUUID) AS job_client_uuid,\n concat(a.msg.clientUUID, '#', a.msg.vehicleViewId) AS client_vvid,\n concat(a.msg.supplyUUID, '#', a.msg.vehicleViewId) AS supply_vvid,\n concat(a.msg.supplyUUID, '#', a.msg.clientUUID, '#', a.msg.vehicleViewId) AS supply_client_vvid,\n a.msg.jobUUID AS job_uuid,\n a.msg.supplyUUID AS supply_uuid,\n a.msg.vehicleViewId AS vehicle_view_id,\n a.msg.region.id AS city_id,\n a.msg.clientUUID AS client_uuid,\n cast(a.msg.timestamp/1e3 as bigint) as timestamp,\n from_unixtime(cast(a.msg.timestamp/1e3 as bigint),'u') as day_of_week,\n from_unixtime(cast(a.msg.timestamp/1e3 as bigint),'H') as hour_of_day,\n a.msg.predictedETA AS dispatched_eta,\n dd.app_rating AS app_rating,\n a.msg.eyeballETASeconds AS eyeball_eta_seconds,\n a.msg.predictedETA - a.msg.eyeballETASeconds AS eta_delta,\n a.msg.surgeMultiplier AS surge_multiplier,\n a.msg.isForwardDispatched AS is_forward_dispatched,\n a.msg.requestLocation.latitude AS request_location_latitude,\n a.msg.requestLocation.longitude AS request_location_longitude,\n 
a.dropoff_location_latitude,\n a.dropoff_location_longitude,\n a.trip_distance_haversine,\n GREATEST(DATEDIFF(TO_DATE(FROM_UNIXTIME(cast(a.msg.timestamp/1e3 as bigint))), TO_DATE(dim_client.signup_timestamp)), 0) AS days_since_signup,\n a.msg.upfrontFareInfo.riderUpfrontFareAmount AS rider_upfront_fare,\n a.msg.isCommuterBenefits AS is_commute,\n CASE WHEN rc.msg.jobUUID IS NOT NULL THEN 1 ELSE 0 END AS is_rider_canceled_dispatch,\n a.msg.fifoDispatchType = 'fifo' as is_fifo,\n '{{yesterday_ds}}' AS datestr\nFROM a\n LEFT JOIN rawdata.kafka_hp_demand_job_canceled_nodedup AS rc\n ON a.msg.jobUUID = rc.msg.jobUUID\n AND rc.datestr = '{{yesterday_ds}}'\n LEFT JOIN rawdata.kafka_hp_demand_job_unassigned_nodedup AS du\n ON a.msg.jobUUID = du.msg.jobUUID\n AND du.datestr = '{{yesterday_ds}}'\n AND a.msg.supplyUUID = du.msg.supplyUUID\n INNER JOIN dwh.dim_driver AS dd\n ON a.msg.supplyUUID = dd.driver_uuid\n AND (NOT dd.is_uber_email)\n INNER JOIN dwh.dim_client AS dim_client\n ON a.msg.clientUUID = dim_client.user_uuid\n AND (NOT dim_client.is_uber_email)\nWHERE du.msg.jobUUID IS NULL\n;\n\nALTER TABLE {{dispatch_rider_cancel_daily_table_name}} DROP IF EXISTS PARTITION(datestr='{{macros.ds_add(ds,-40)}}')\n;",
"hivePartitionSensorRetries": 1,
"x": 156.401123046875,
@@ -302,2 +302,2 @@
]
}
Yes, this is by design in our diff parser: we use gitdiff-parser, which only accepts the diff format produced by the git command line. The major difference is that git diff prepends 2 lines before normal diff blocks:
diff --git a/fileA b/fileB
index indexA..indexB mode
--- a
+++ b
@@ -1,4 +1,4 @@
...
These 2 lines include the filename, the git commit SHA, and the file mode; both the filename and the mode are useful to our component.
The simple solution is to mock this information; here is an example component that accepts the old and new text and displays the diff: https://gist.github.com/otakustay/7aac98708aff358b8e59c61e1f765452#file-diff-js-L25
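To illustrate the idea of mocking those header lines, here is a minimal sketch. The helper name `toGitDiff` and the placeholder index values are my own assumptions, not part of the library; gitdiff-parser only requires that the lines be present and well-formed.

```javascript
// Wrap a plain unified diff in the two git-style header lines that
// gitdiff-parser expects, before handing the result to parseDiff.
// The SHA values and mode below are mocked placeholders.
function toGitDiff(plainDiff, fileName = 'file') {
    const header = [
        `diff --git a/${fileName} b/${fileName}`,
        'index 1111111..2222222 100644',
    ];
    return header.join('\n') + '\n' + plainDiff;
}

const plainDiff = [
    '--- a',
    '+++ b',
    '@@ -1,2 +1,2 @@',
    ' unchanged line',
    '-old line',
    '+new line',
].join('\n');

console.log(toGitDiff(plainDiff));
// The wrapped string can then be parsed, e.g.:
// const [file] = parseDiff(toGitDiff(plainDiff));
```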
Awesome, @otakustay! Thank you for your prompt and informative response.
Having the following snippet, I get the error Uncaught (in promise) TypeError: e.split is not a function.
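For context on that error: split is a String method, so it usually means the parser received a non-string value. One common cause (an assumption here, since the snippet above is not shown) is passing an unresolved Promise, e.g. a fetch response, instead of its resolved text. A hypothetical guard sketch:

```javascript
// Hypothetical illustration: a diff parser ultimately calls text.split('\n'),
// so a non-string argument (object, Promise, etc.) raises exactly
// "e.split is not a function". Guarding makes the failure explicit.
function parseDiffInput(text) {
    if (typeof text !== 'string') {
        throw new TypeError('diff must be a string, got ' + typeof text);
    }
    return text.split('\n');
}

// Wrong: fetch(url).then(res => parseDiffInput(res))      // res is not a string
// Right: fetch(url).then(res => res.text()).then(parseDiffInput);
console.log(parseDiffInput('--- a\n+++ b').length);
```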