internetarchive / Zeno

State-of-the-art web crawler 🔱
GNU Affero General Public License v3.0

Add PDF outlinks extraction #11

Open CorentinB opened 2 years ago

Qu-Ack commented 1 month ago

I am trying to work on this issue and have successfully set up the codebase locally. I just need a little help understanding the codebase: where would I add this feature?

CorentinB commented 1 month ago

Hi @Qu-Ack! I'm so glad you want to contribute! Sadly, we are in the middle of a big rewrite right now: many branches, each fixing bugs, have not been merged together yet, and the latest commit on main has some bugs in it. We are working very hard to fix all of that in the next 2 weeks. If you are still interested in helping in 2 weeks, comment here again and we will see what we can do!

Qu-Ack commented 1 month ago

Thank you for responding. I will reach out to you in 2 weeks. If you need any help with the rewrite, or want to pass the boring stuff on to someone else, I am happy to help.

CorentinB commented 1 month ago

Hi @Qu-Ack, did you have an idea of which lib you would use? I'm looking around and can't find a good one.