yobix-ai / extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Apache License 2.0

ISSUE#3: Implemented Tika Metadata #11

Closed (s4zuk3 closed 2 weeks ago)

s4zuk3 commented 1 month ago

Hello! This is my first time contributing to a public repository, and it’s also my first time using Rust. I hope you find the changes satisfactory. I found your repository very interesting, as I use Tika a lot, but I don't like having to depend on Java to use it.

I need the metadata that Tika provides, so I made the necessary changes to implement it in this repository, trying to modify as little as possible. From what I understood of the code, the metadata was already being delivered; it just wasn't fully captured and passed through the binding. So, theoretically, the performance should remain the same.

Thanks!

nmammeri commented 1 month ago

Thanks, Francisco, for putting in the effort to make this pull request. It's a big feature and will require upgrading to 0.2. I was planning to do this myself, but it's great to see that you jumped on it.

I'll need some time to look into this.

Is it ok if I make changes straight to your branch?

s4zuk3 commented 1 month ago

Sure, please make any necessary changes! Thank you very much!

s4zuk3 commented 1 month ago

@nmammeri Hey! Is there anything missing, or can I help with something to get this PR merged?

nmammeri commented 1 month ago

Sorry, I didn't have much time to look into it. Thanks again for your work.

Please let me know if you can make those changes. Many thanks.

s4zuk3 commented 2 weeks ago

I will create a new updated PR.