phillipcheng / log.analysis


attmexico spark workflow improvements #310

Closed: hanzac closed this issue 7 years ago

hanzac commented 7 years ago
  1. Saving as Parquet still has problems.
  2. CombineXMLInput: handle input across files.
  3. HuaweiXml2CsvCmd (under attmexico): please switch from XPath to SAX parsing to reduce memory consumption.
  4. Spark jobs need to be submitted directly to YARN instead of through Oozie, because we changed the shuffle service and Oozie can’t work with the Spark shuffle.
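For item 3, the usual reason SAX uses less memory than XPath is that XPath evaluation needs the whole document loaded as a DOM tree, while SAX fires callbacks as it streams through the file, so only the current record is held in memory. The sketch below shows that pattern with the JDK's built-in `javax.xml.parsers` SAX API. It is not the actual HuaweiXml2CsvCmd code: the flat `<record><field>…</field></record>` layout, the class name `SaxXmlToCsv`, and the `recordTag` parameter are hypothetical, since the issue does not describe the Huawei XML schema.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: stream XML with SAX and write one CSV line per flat record element. */
public class SaxXmlToCsv {

    /** Buffers only the fields of the current record, then writes them out when the record closes. */
    static final class RecordHandler extends DefaultHandler {
        private final String recordTag;
        private final Writer out;
        private final List<String> fields = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inRecord = false;

        RecordHandler(String recordTag, Writer out) {
            this.recordTag = recordTag;
            this.out = out;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals(recordTag)) {
                inRecord = true;
                fields.clear();
            }
            text.setLength(0); // drop whitespace seen before this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver one text node in several chunks, so accumulate.
            if (inRecord) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (qName.equals(recordTag)) {
                try {
                    out.write(String.join(",", fields));
                    out.write('\n');
                } catch (IOException e) {
                    throw new SAXException(e);
                }
                inRecord = false;
            } else if (inRecord) {
                fields.add(escape(text.toString().trim())); // a child of the record closed
            }
            text.setLength(0);
        }
    }

    /** Minimal CSV quoting: wrap values containing comma, quote or newline, and double embedded quotes. */
    static String escape(String v) {
        if (v.contains(",") || v.contains("\"") || v.contains("\n")) {
            return "\"" + v.replace("\"", "\"\"") + "\"";
        }
        return v;
    }

    public static void convert(InputStream in, String recordTag, Writer out) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new RecordHandler(recordTag, out));
        out.flush();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><record><a>1</a><b>x,y</b></record><record><a>2</a><b>z</b></record></root>";
        StringWriter sw = new StringWriter();
        convert(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), "record", sw);
        System.out.print(sw); // prints: 1,"x,y" then 2,z on separate lines
    }
}
```

Because output goes to a `Writer` rather than being collected in memory, a real job could pass a file or HDFS output stream wrapper and keep memory use roughly constant regardless of input size. Nested elements inside a field are not handled by this sketch.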
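For item 4, submitting to YARN without Oozie typically means invoking `spark-submit` with `--master yarn` yourself. The sketch below only assembles that command line using standard `spark-submit` flags (`--master`, `--deploy-mode`, `--conf`, `--class`); the class name `YarnSubmit`, the main class, jar path, and any `--conf` entries are hypothetical placeholders. Which shuffle settings to pass depends on how the shuffle service was changed, which the issue does not say.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: build a spark-submit command that targets YARN directly. */
public class YarnSubmit {

    static List<String> buildCommand(String mainClass, String appJar,
                                     Map<String, String> conf, List<String> appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        cmd.add("--master");
        cmd.add("yarn");
        cmd.add("--deploy-mode");
        cmd.add("cluster");
        // Extra Spark properties, e.g. shuffle-service settings; pass a LinkedHashMap for stable order.
        for (Map.Entry<String, String> e : conf.entrySet()) {
            cmd.add("--conf");
            cmd.add(e.getKey() + "=" + e.getValue());
        }
        cmd.add("--class");
        cmd.add(mainClass);
        cmd.add(appJar); // application jar must come after all spark-submit options
        cmd.addAll(appArgs);
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.shuffle.service.enabled", "true"); // assumption: external shuffle service in use
        List<String> cmd = buildCommand("com.example.HypotheticalMain", "/path/to/app.jar",
                conf, List.of("input", "output"));
        System.out.println(String.join(" ", cmd));
        // To launch it (needs spark-submit on PATH and HADOOP_CONF_DIR or YARN_CONF_DIR set):
        // Process p = new ProcessBuilder(cmd).inheritIO().start();
        // int exitCode = p.waitFor();
    }
}
```

Spark also ships a programmatic alternative, `org.apache.spark.launcher.SparkLauncher` in the `spark-launcher` module, which avoids building the command string by hand.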