harshankur / officeParser

A Node.js library to parse text out of any office file. Currently supports docx, pptx, xlsx and odt, odp, ods.
MIT License
123 stars 17 forks

Can't parse OpenDocument text file #2

Closed SoftwareIISGubbio closed 1 year ago

SoftwareIISGubbio commented 4 years ago

I am trying to read an odt file (docx works without problems), using the first example on the home page:

const officeParser = require('officeparser');
officeParser.parseOpenOffice("example.odt", function(data, err){
  if (err) return console.log(err);
  console.log(data)
});

I obtain this error:

[Error: ENOENT: no such file or directory, open 'officeDist/Configurations2/progressbar/'] {
  errno: -2,
  code: 'ENOENT',
  syscall: 'open',
  path: 'officeDist/Configurations2/progressbar/'
}
[Error: ENOENT: no such file or directory, open 'officeDist/Configurations2/progressbar/'] {
  errno: -2,
  code: 'ENOENT',
  syscall: 'open',
  path: 'officeDist/Configurations2/progressbar/'
}

(the same error, printed twice).

It seems that empty folders (such as "progressbar") were not created during decompression.

In "officeDist/Configurations2" there is only one directory: "images" (empty).

If I decompress the same odt file using unzip, I see these directories:

accelerator/
floater/
images/
menubar/
popupmenu/
progressbar/
statusbar/
toolbar/
toolpanel/

I ran the same test on macOS and Linux, with the same result.

Thank you, Edoardo
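
The comparison described above (directories listed by unzip versus directories actually present after extraction) can be automated. The following is a minimal sketch, not part of officeparser: the helper name `missingDirectories` is hypothetical, and a temporary directory containing only "images" stands in for officeDist/Configurations2.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Directory names reported by `unzip` in the comment above.
const expected = [
  'accelerator', 'floater', 'images', 'menubar', 'popupmenu',
  'progressbar', 'statusbar', 'toolbar', 'toolpanel',
];

// Hypothetical helper: returns the names from `names` that do not
// exist as directories directly under `root`.
function missingDirectories(root, names) {
  const present = new Set(
    fs.readdirSync(root, { withFileTypes: true })
      .filter((entry) => entry.isDirectory())
      .map((entry) => entry.name)
  );
  return names.filter((name) => !present.has(name));
}

// Stand-in for officeDist/Configurations2 (assumption: only "images" exists).
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'Configurations2-'));
fs.mkdirSync(path.join(root, 'images'));

const missing = missingDirectories(root, expected);
console.log(missing); // every expected name except "images"
```

To run it against a real extraction, pass the actual directory path as `root` while the extracted files still exist.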

SoftwareIISGubbio commented 4 years ago

I hope this can help

$ npm list
finder@1.0.0 /tmp/finder
└─┬ officeparser@2.2.2
  ├─┬ decompress@4.2.1
  │ ├─┬ decompress-tar@4.1.1
  │ │ ├── file-type@5.2.0
  │ │ ├── is-stream@1.1.0
  │ │ └─┬ tar-stream@1.6.2
  │ │   ├─┬ bl@1.2.2
  │ │   │ ├── readable-stream@2.3.7 deduped
  │ │   │ └── safe-buffer@5.2.1
  │ │   ├─┬ buffer-alloc@1.2.0
  │ │   │ ├── buffer-alloc-unsafe@1.1.0
  │ │   │ └── buffer-fill@1.0.0
  │ │   ├─┬ end-of-stream@1.4.4
  │ │   │ └── once@1.4.0 deduped
  │ │   ├── fs-constants@1.0.0
  │ │   ├─┬ readable-stream@2.3.7
  │ │   │ ├── core-util-is@1.0.2
  │ │   │ ├── inherits@2.0.4 deduped
  │ │   │ ├── isarray@1.0.0
  │ │   │ ├── process-nextick-args@2.0.1
  │ │   │ ├── safe-buffer@5.1.2
  │ │   │ ├─┬ string_decoder@1.1.1
  │ │   │ │ └── safe-buffer@5.1.2
  │ │   │ └── util-deprecate@1.0.2
  │ │   ├── to-buffer@1.1.1
  │ │   └── xtend@4.0.2
  │ ├─┬ decompress-tarbz2@4.1.1
  │ │ ├── decompress-tar@4.1.1 deduped
  │ │ ├── file-type@6.2.0
  │ │ ├── is-stream@1.1.0 deduped
  │ │ ├─┬ seek-bzip@1.0.6
  │ │ │ └── commander@2.20.3
  │ │ └─┬ unbzip2-stream@1.4.3
  │ │   ├─┬ buffer@5.6.0
  │ │   │ ├── base64-js@1.3.1
  │ │   │ └── ieee754@1.1.13
  │ │   └── through@2.3.8
  │ ├─┬ decompress-targz@4.1.1
  │ │ ├── decompress-tar@4.1.1 deduped
  │ │ ├── file-type@5.2.0 deduped
  │ │ └── is-stream@1.1.0 deduped
  │ ├─┬ decompress-unzip@4.0.1
  │ │ ├── file-type@3.9.0
  │ │ ├─┬ get-stream@2.3.1
  │ │ │ ├── object-assign@4.1.1
  │ │ │ └─┬ pinkie-promise@2.0.1
  │ │ │   └── pinkie@2.0.4
  │ │ ├── pify@2.3.0 deduped
  │ │ └─┬ yauzl@2.10.0
  │ │   ├── buffer-crc32@0.2.13
  │ │   └─┬ fd-slicer@1.1.0
  │ │     └── pend@1.2.0
  │ ├── graceful-fs@4.2.4
  │ ├─┬ make-dir@1.3.0
  │ │ └── pify@3.0.0
  │ ├── pify@2.3.0
  │ └─┬ strip-dirs@2.1.0
  │   └── is-natural-number@4.0.1
  ├─┬ rimraf@2.7.1
  │ └─┬ glob@7.1.6
  │   ├── fs.realpath@1.0.0
  │   ├─┬ inflight@1.0.6
  │   │ ├── once@1.4.0 deduped
  │   │ └── wrappy@1.0.2
  │   ├── inherits@2.0.4
  │   ├─┬ minimatch@3.0.4
  │   │ └─┬ brace-expansion@1.1.11
  │   │   ├── balanced-match@1.0.0
  │   │   └── concat-map@0.0.1
  │   ├─┬ once@1.4.0
  │   │ └── wrappy@1.0.2 deduped
  │   └── path-is-absolute@1.0.1
  └─┬ xml2js@0.4.23
    ├── sax@1.2.4
    └── xmlbuilder@11.0.1

eerFun commented 3 years ago

I have the same problem when I try to read any LibreOffice file (e.g. .odt, .odp, and .ods) on Linux Mint 13.2:

[Error: EISDIR: illegal operation on a directory, open 'officeDist/Configurations2/accelerator/']: { 
  code: 'EISDIR',
  errno: -21,
  message: 'EISDIR: illegal operation on a directory, open 'officeDist/Configurations2/accelerator/'',
  path: 'officeDist/Configurations2/accelerator/',
  stack: 'Error: EISDIR: illegal operation on a directory, open 'officeDist/Configurations2/accelerator/'',
  syscall: 'open' 
}

Also, there is only one directory images, which is empty, in officeDist/Configurations2.
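
Both reported errors come from an `open` call on a path that ends in "/". One reading consistent with that (an assumption, not confirmed in this thread) is that a directory entry from the archive is being written as if it were a regular file. The sketch below reproduces that class of failure with Node's `fs` module alone; `writeAsFile` is a hypothetical helper, and the exact error code may differ by platform.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: try to create `target` as a regular file and
// return the resulting error (or null if the write succeeded).
function writeAsFile(target) {
  try {
    fs.writeFileSync(target, '');
    return null;
  } catch (err) {
    return err;
  }
}

// A temporary directory stands in for officeDist/Configurations2.
const base = fs.mkdtempSync(path.join(os.tmpdir(), 'officeDist-'));

// A path with a trailing slash, like the directory entries in the reports.
const target = path.join(base, 'progressbar') + '/';

const error = writeAsFile(target);
console.log(error && error.code, error && error.syscall);
```

If the printed code matches the one in your own error output, the failing step is probably the write of a directory-style path rather than the later read of content.xml.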

harshankur commented 3 years ago

> Also, there is only one directory images, which is empty, in officeDist/Configurations2.

officeDist is filled during the processing and rimraf cleans it up after it is done. You will need to pause the processing or enter debug mode to find out how officeDist is filling up.

I will check these out and update the repository.

If you get a bugfix, please update the repo and send me a pull request. It would be beneficial for a lot of people!

eerFun commented 3 years ago

> officeDist is filled during the processing and rimraf cleans it up after it is done. You will need to pause the processing or enter debug mode to find out how officeDist is filling up.

It seems the problem is in the decompress module, but I have not been able to find out where.

harshankur commented 2 years ago

Could you please share your sample odt file for me to recreate this issue? @eerFun

eerFun commented 2 years ago

Yes, of course. Here you are. text.odt

zhdan88vadim commented 2 years ago

Hi, I found a quick workaround. In my case, adding a filter to the decompress call makes it work. Further down in the code I can see that only one file, content.xml, is actually used. I checked it on an .odt file.

/node_modules/officeparser/officeParser.js

decompress(filename, decompressLocation, {
    filter: function (x) {
        return x.path === 'content.xml';
    }
}).

    if (validateFileExtension(filename, ["odt", "odp", "ods"])) {
        try {
            decompress(filename, decompressLocation, {
                filter: function (x) {
                     return x.path === 'content.xml';
                }
            }).then(files => {
                myTextOpenOffice = [];
                if (fs.existsSync(decompressLocation + "/content.xml")) {
                    fs.readFile(decompressLocation + '/content.xml', 'utf8', function (err, data) {
                        if (err) {
                            if (outputToConsoleWhenRequired) console.log(err);
                            return callback(undefined, err);
                        }
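
To make the effect of the filter concrete, here is a small sketch that applies the same predicate to a plain array instead of a real archive. The entry objects are illustrative (only `path` is used), and `skipDirectoryEntries` is a hypothetical, more general alternative that does not appear in the workaround above.

```javascript
// Predicate from the workaround above: keep only content.xml.
const onlyContentXml = (file) => file.path === 'content.xml';

// Hypothetical alternative: drop every entry whose path ends in "/",
// i.e. entries that look like directories.
const skipDirectoryEntries = (file) => !file.path.endsWith('/');

// Illustrative entries in the style of an .odt archive listing.
const entries = [
  { path: 'mimetype' },
  { path: 'content.xml' },
  { path: 'styles.xml' },
  { path: 'Configurations2/progressbar/' },
  { path: 'Configurations2/accelerator/' },
];

const strict = entries.filter(onlyContentXml).map((f) => f.path);
const general = entries.filter(skipDirectoryEntries).map((f) => f.path);

console.log(strict);  // [ 'content.xml' ]
console.log(general); // [ 'mimetype', 'content.xml', 'styles.xml' ]
```

Either predicate keeps the paths that end in "/" out of the extraction step. The stricter one also skips files the parser never reads, which matches the observation that only content.xml is used.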

xiconet commented 1 year ago

Same kind of issue here, on Linux Mint 21. @zhdan88vadim Could you please specify whether you added or replaced code, and exactly where?

harshankur commented 1 year ago

@SoftwareIISGubbio @eerFun @zhdan88vadim @xiconet I am extremely sorry for not taking care of this issue for a long time. I have pushed a new version 3.0.0 with a lot of code improvements and improved error handling. Please update here if the issue persists.