kiwix / libkiwix

Common code base for all Kiwix ports
https://download.kiwix.org/release/libkiwix/
GNU General Public License v3.0
116 stars 56 forks source link

`OPDSParser.parse` is always returning true #1099

Open BPerlakiH opened 1 month ago

BPerlakiH commented 1 month ago

Updated:

On the Apple reader side, we would like to be rest assured, that when invalid data is passed into the OPDSParser.parse(data:) function, we can get a false value back, so we can handle it gracefully. We already had a unit test on the reader side, but so far it was disabled.

Currently the libkiwix function never returns false: https://github.com/kiwix/libkiwix/blob/ece40966f1b568ac6ce06c52d146f9f408435810/src/manager.cpp#L167-L179

Whereas on compiled libkiwix header files, I see this information, which is misleading:

  /**
   * Load a library content stored in a OPDS stream.
   *
   * @param content The content of the OPDS stream.
   * @param readOnly Set if the library path could be overwritten later with
   *                 updated content.
   * @param libraryPath The library path (used to resolve relative path)
   * @return True if the content has been properly parsed.
   */
  bool readOpds(const std::string& content, const std::string& urlHost);

So we either need to return False in case of invalid content, or if that's not possible, update the header file to reflect this special case.

kelson42 commented 1 month ago

@BPerlakiH Which exception is thrown? And what exactly is wrong in the OPDS stream to trigger this exception?

veloman-yunkan commented 1 month ago

I think that the bug description is not formulated correctly. I believe that it should rather state that Manager::readOpds() is not fool-proof against malformed input with an example of such input provided. The fix should then come with one or more unit-tests covering such scenarios.

mgautierfr commented 1 month ago

From the screenshot in https://github.com/kiwix/kiwix-apple/issues/851, it seems this is a allocation error (not enough memory ? to much memory allocated ?) inside icu73 (called by kiwix::removeAccents). This may be us doing strange things when trying to remove accents somehow.

But we need to fix this issue first. This is not related to this method always returning true (even if we have to fix this anyway).

mgautierfr commented 1 month ago

I have open the issue https://github.com/kiwix/libkiwix/issues/1101 about the memory allocation error in kiwix::removeAccents.

I keep this issue open about the method always returning true.

kelson42 commented 1 month ago

@BPerlakiH Can you please open the corresponding issue lin this repo to clearly state what does not work?

BPerlakiH commented 1 month ago

Sorry, I also think that the description was misleading. I did update it, to reflect a more narrow scope: the actual return value of this specific libkiwix function, and how it is described in the compiled header file.

kelson42 commented 4 weeks ago

@veloman-yunkan @mgautierfr We need your feedback here. In a way or the other the libkiwix user would benefit to know if the parsing went well or not. The boolean returns seems appropriate for that and this is what the documentation implies. Documentation and testset around this function should probably be updated.

veloman-yunkan commented 3 weeks ago

Currently the libkiwix function never returns false:

https://github.com/kiwix/libkiwix/blob/ece40966f1b568ac6ce06c52d146f9f408435810/src/manager.cpp#L167-L179

The code proves the opposite - Manager::readOpds() returns false in case of malformed (syntactically incorrect) XML. Yet, it does return true if the XML content does not represent valid or reasonable OPDS data, but before fixing anything here we have to define how it should behave in those cases. For example, what should it return when

  1. the XML is a KML file rather than OPDS;
  2. the XML is an OPDS response from a new API endpoint added in a more recent version of kiwix-server;
  3. the XML is an OPDS response generated for example by a malicious or buggy kiwix-server that contains e.g. a string where a number is expected, or contains duplicate entries with the same book ID but different attributes (title, category, etc).
  4. when certain attributes in the OPDS are obviously wrong (e.g. a date string reading "2020-15-48T34:71:95") or unreasonable (a date pointing to 100 years in the future).
  5. and so on.