jakubbartel / keboola-xls2csv-processor

Keboola Component for converting xls/xlsx files to csv files.
MIT License

Support multiple sheets at once and name files based on sheets #6

Open Vfisa opened 6 years ago

Vfisa commented 6 years ago

Could you please support these new features so that we can digest automated reports which have 40+ sheets with the same structure? We have built a slightly more advanced version of this in Python: https://bitbucket.org/chanleoc/kbc_multi_xlsx_to_csv/src/master/src/

The XLS file has 40+ sheets with the same structure and each one is named based on the ID of the location.

This is the final config we then use in Keboola:

{
  "parameters": {
    "bucket": "client-bucket",
    "key": "pnl.xlsx",
    "includeSubfolders": false,
    "newFilesOnly": false
  },
  "processors": {
    "after": [
      {
        "definition": {
          "component": "leochan.processor-break-up-xlsx-sheets"
        }
      },
      {
        "definition": {
          "component": "keboola.processor-move-files"
        },
        "parameters": {
          "direction": "tables",
          "addCsvSuffix": false,
          "folder": "pnl"
        }
      },
      {
        "definition": {
          "component": "keboola.processor-create-manifest"
        },
        "parameters": {
          "delimiter": ",",
          "enclosure": "\"",
          "incremental": false,
          "primary_key": [],
          "columns": [
            "fact",
            "P1",
            "P2",
            "P3",
            "P4",
            "P5",
            "P6",
            "P7",
            "P8",
            "P9",
            "P10",
            "P11",
            "P12",
            "total",
            "percentage"
          ]
        }
      },
      {
        "definition": {
          "component": "keboola.processor-skip-lines"
        },
        "parameters": {
          "lines": 9,
          "direction_from": "top"
        }
      },
      {
        "definition": {
          "component": "keboola.processor-skip-lines"
        },
        "parameters": {
          "lines": 5,
          "direction_from": "bottom"
        }
      },
      {
        "definition": {
          "component": "keboola.processor-add-filename-column"
        },
        "parameters": {
          "column_name": "s3_filename"
        }
      }
    ]
  }
}

The end result is one merged table with a filename column ("s3_filename" in the config above) whose value matches the sheet name. This way we do not have to change anything when we add a new location; it is simply picked up.
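For illustration, here is a minimal Python sketch of what the skip-lines, add-filename-column and merge steps do to the per-sheet CSV files. It is not the code of any Keboola processor: the function names `trim_and_tag` and `merge_sheets` are hypothetical, and the sketch counts CSV records rather than raw lines, which differs only when a quoted field spans several lines.

```python
import csv
import io


def trim_and_tag(csv_text, filename, skip_top=9, skip_bottom=5):
    """Drop leading/trailing records of one sheet's CSV and append the file name to each row.

    Hypothetical helper: the defaults mirror the two skip-lines steps in the config.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    end = len(rows) - skip_bottom
    # If the sheet is shorter than the skipped header plus footer, nothing remains.
    body = rows[skip_top:end] if end > skip_top else []
    return [row + [filename] for row in body]


def merge_sheets(sheets, skip_top=9, skip_bottom=5):
    """Merge a {filename: csv_text} mapping into one list of rows.

    Files are processed in sorted name order so the result is deterministic.
    """
    merged = []
    for filename in sorted(sheets):
        merged.extend(trim_and_tag(sheets[filename], filename, skip_top, skip_bottom))
    return merged
```

Because every row carries the name of the file it came from, a new location only adds rows with a new value in that column; nothing in the configuration has to change.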

After speaking with Peter, we were asked to send you this request to avoid having two XLS-to-CSV processors in Keboola. For now we are keeping ours hidden and kindly ask whether you could implement the missing features. Many thanks! Fisa (Keboola Canada)

Vfisa commented 6 years ago

PING

jakubbartel commented 5 years ago

@Vfisa Hi, my apologies for not responding, I did not get any notification :( Do you still want this to be implemented?

Vfisa commented 5 years ago

Hi,

yes, it would be great to have the same functionality as in this processor:

https://bitbucket.org/chanleoc/kbc_multi_xlsx_to_csv/src/9eb9678fe168429757fb29dc77f3900b597a13f9?at=master

(the ability to split XLS per tab and get all tabs at once).
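A rough Python sketch of the requested behaviour, one CSV per tab named after the tab, might look like the following. It assumes the workbook has already been read into a mapping of sheet name to rows (reading the XLS/XLSX file itself is left out), and the names `sheet_filename` and `write_sheets` are hypothetical, not taken from either processor.

```python
import csv
import re
from pathlib import Path


def sheet_filename(sheet_name):
    """Turn a sheet name into a CSV file name, replacing unsafe characters with '_'."""
    safe = re.sub(r"[^\w.-]+", "_", sheet_name.strip())
    return f"{safe or 'sheet'}.csv"


def write_sheets(sheets, out_dir):
    """Write every sheet in a {sheet_name: rows} mapping to its own CSV file.

    Returns the written paths in the order the sheets were given.
    Note: two sheet names that sanitize to the same file name would overwrite
    each other; this sketch does not handle that case.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in sheets.items():
        path = out / sheet_filename(name)
        with path.open("w", newline="", encoding="utf-8") as fh:
            csv.writer(fh).writerows(rows)
        written.append(path)
    return written
```

With sheets named after location IDs, the output folder then contains one file per location, for example 101.csv and 102.csv.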


Martin Fiser


jakubbartel commented 5 years ago

Great, I'm going to handle it in the following days/weeks. I will let you know when it's ready for testing.