smalot / pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
GNU Lesser General Public License v3.0
2.42k stars, 538 forks

Discussions about how to organize further maintenance of this library #286

Open k00ni opened 4 years ago

k00ni commented 4 years ago

Based on the latest commit in master (over a year old) as well as 16 pending pull requests, I assume @smalot is not maintaining this library anymore. That's fine; he will have his reasons.

In this issue I would like to discuss where the community around this library should continue the work. Some have already developed their own forks, for instance:

Found them using https://github.com/smalot/pdfparser/network

Any ideas?


EDIT: Removed fork from @lausek (ref).

lausek commented 4 years ago

My fork does not contain any changes of value. Sorry.

amooij commented 4 years ago

We actively use this library for internal projects and we've fixed several issues, but we don't have the resources to become the lead maintainer. We are more than willing to share all our work.

On Wed, Apr 15, 2020 at 1:46 PM lausek wrote:

My fork does not contain any changes of value. Sorry.

k00ni commented 4 years ago

That sounds great! I can support with ~1 hour per week to help with organizing issues, tests and pull requests.

@smalot it would be nice to hear what you plan / think.

NoxxieNl commented 4 years ago

I moved to a paid service, setasign.com. It does everything I need as a reader and soooo much more. Maybe that is an option?

k00ni commented 4 years ago

Thank you for the tip, but I don't think that this is an option for everyone. For me, the current state of pdfparser works decently, but switching could be an option, sure.

NoxxieNl commented 4 years ago

You could go for the free version, which supports up to PDF v1.4 (everything above is in the paid service). FPDI is the package; for reading it works great, and it also integrates with a lot of writer packages.

It is the better alternative for my taste, and those packages are better maintained. I'm not saying you should move away, but a PDF parser is a lot of work... :)

Just my 2 cents...

smalot commented 4 years ago

Hi all, indeed, I'm not as available as before. I'll try to validate some PRs or give feedback if needed. Thanks

j0k3r commented 4 years ago

@smalot maybe you can add one or two maintainers so they will be able to help you.

smalot commented 4 years ago

Hi @j0k3r, I noticed you work at 20 Minutes. Would you be interested in working on this project?

k00ni commented 4 years ago

Don't forget the post from @amooij

Quote:

We actively use this library for internal projects and we've fixed several issues, but we don't have the resources to become the lead maintainer. We are more than willing to share all our work.

smalot commented 4 years ago

Quote:

That sounds great! I can support with ~1 hour per week to help with organizing issues, tests and pull requests.

@smalot it would be nice to hear what you plan / think.

Hi @k00ni, I just sent you an invite to work on this project, if you are still interested.

smalot commented 4 years ago

Hi @amooij, I just sent you an invite too, if you have time to spend on this library.

k00ni commented 4 years ago

@smalot, thank you for the invite.

I saw that you invited further people besides me. How do you want to organize this repository with you and 3+ further maintainers/helpers? Any preferences?

j0k3r commented 4 years ago

Maybe you should first define at least one approval review to merge a PR.

It means at least 1 person has checked the code, and if a maintainer submits a PR, at least 2 maintainers have seen the code.

smalot commented 4 years ago

This library is widely used and followed. I don't know if TDD is a good solution, but in the past I saw PRs with a bad approach, using strpos instead of preg_match, which could break some syntax patterns or cause other issues. I tried to cover it with some useful unit tests. So I would appreciate it if we keep maintaining quality as the first goal.
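Editorial illustration of the strpos versus preg_match concern: a minimal sketch, not code taken from pdfparser. The helper names `hasOperatorNaive` and `hasOperatorStrict` and the two sample content-stream lines are hypothetical. It assumes the goal is to detect the PDF text-showing operator `Tj` at the end of a line; a plain substring search also fires when `Tj` merely appears inside a name, while an anchored pattern does not.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers for illustration only; not part of pdfparser's API.

/** Naive check: reports a hit whenever the substring occurs anywhere in the line. */
function hasOperatorNaive(string $line, string $operator): bool
{
    return strpos($line, $operator) !== false;
}

/**
 * Stricter check: the operator must be the final token of the line and must be
 * preceded by the start of the line, whitespace, or a closing delimiter.
 */
function hasOperatorStrict(string $line, string $operator): bool
{
    $pattern = '/(?:^|[\s)\]>])' . preg_quote($operator, '/') . '\s*$/';

    return preg_match($pattern, $line) === 1;
}

$showText = '(Hello) Tj';    // "Tj" really is the operator here
$fontLine = '/TjFont 12 Tf'; // "Tj" is only part of a font name

var_dump(hasOperatorNaive($showText, 'Tj'));  // bool(true)
var_dump(hasOperatorNaive($fontLine, 'Tj'));  // bool(true), a false positive
var_dump(hasOperatorStrict($showText, 'Tj')); // bool(true)
var_dump(hasOperatorStrict($fontLine, 'Tj')); // bool(false)
```

The anchored pattern is still only a sketch: it does not tokenize the stream, so a real parser would need to handle operators in the middle of a line and string literals that span lines.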

@j0k3r made a good suggestion.

Currently you seem to be more aware of the real issues and have more time to spend on this library to keep it alive. In my company we validate PRs with :+1: or :-1: to indicate whether more work is required or whether we can merge. Being 3 or 4 people can create a useful debate about any evolution.

What do you think about it?

k00ni commented 4 years ago

I support @j0k3r's suggestions, but it shouldn't matter who made a PR. For me at least the approval (+ review) of 1 maintainer is sufficient.

We should aim for basic but long-term oriented "rules", because every one of us most likely has a job and a life too. So I would suggest we start with:

Besides these, I would suggest that we try to set up tooling to automate as much as possible. Hound checks PRs and helps with code reviews. @smalot: Can you add support for https://coveralls.io so that we have an overview of the test coverage? Here is a good tutorial: https://kizu514.com/blog/setting-up-coveralls-io-with-travis-ci-and-phpunit/ I am sure there are other tools which can make our lives easier.

My 2 cents, what do you think?

smalot commented 4 years ago

Currently, unit tests are written using atoum (http://atoum.org/), which has been sufficient until now. You can run the unit tests and code coverage using this command line:

./vendor/bin/atoum -d src/Smalot/PdfParser/Tests/

I tried to increase code coverage, but I agree that it's not a goal in itself. That's why I've included some PDF samples to test under real conditions. The latest reports are available here: https://travis-ci.org/github/smalot/pdfparser

k00ni commented 4 years ago

Is a switch to PHPUnit an option for you @smalot?

j0k3r commented 4 years ago

I'm 👍 to move to PHPUnit. Also, adding https://scrutinizer-ci.com/ could be a good thing.

smalot commented 4 years ago

I've just reactivated my scrutinizer account: https://scrutinizer-ci.com/g/smalot/pdfparser

Don't hesitate to ask if you need specific settings for this tool.

smalot commented 4 years ago

No problem with using PHPUnit instead of atoum, but I'm not sure I will have time to work on it. The main advantage I found in atoum when I chose it is that it is really strict about typing. I suppose PHPUnit is a good tool too, but I'm not aware of all its capabilities.

smalot commented 4 years ago

It would be interesting to move Tests into a specific namespace and folder declared only in "autoload-dev". I'll let you create a dedicated issue to track the work.

smalot commented 4 years ago

I granted you "Developer Access" on Scrutinizer. I hope it helps.

rubenvanerk commented 4 years ago

Any plans on closing some issues? There are issues that were last updated over 6 years ago and questions for which an answer is no longer needed. I'm interested in helping with some issues, but there are a lot and it's not clear which issues need attention and which do not.

k00ni commented 4 years ago

Hi @rubenvanerk, I am still in the process of getting a bird's-eye view. Therefore I am focusing on recent issues and PRs and trying to get in touch with people to solve things. As you can see, not all of them responded.

The most important PRs for me are my own (#299, #300) for now, so we have a stable foundation in the long run. I also prefer pull requests like #297, where the author is cooperative and helpful.

On the issue side: bug reports and issues about missing functionality should be handled with priority. Questions and everything else are secondary to me, due to lack of time.

If you want to help, it would be cool if you could check issues and comment on them with a status update. For instance, "an answer is no longer needed, because ..." or something like that. @j0k3r or I can later decide how we handle them.

What do you think?

rubenvanerk commented 4 years ago

I think I'm just going to work through the issues from most recent to least recent. I assume you and @j0k3r are monitoring the issues from time to time? Or should I tag you when I think an issue can be closed?

j0k3r commented 4 years ago

We monitor them :)

k00ni commented 4 years ago

@j0k3r and the others, what features are you planning or working on? Maybe we can coordinate up front and help each other?

Ref: #306

For me: I am good for now and don't plan anything in particular. But I am still available for issues and PRs.

j0k3r commented 4 years ago

I don’t want to work on anything in particular. I just wanted to provide enough tools to ease maintenance of the lib (phpunit, cs-fixer, phpstan, etc).

k00ni commented 4 years ago

I want to make a proposal: If one of the collaborators assigns himself to a PR, he will take care of the merge.

Also, a PR should be kept open for a while, so that our community has a chance to comment. Even after it is ready to be merged, a window of 2-3 days (initially 1+ week) should be used to allow further comments. Closing it right away prevents that possibility.

Hope that is fine with you guys.

k00ni commented 3 years ago

@amooij I saw that you received collaborator rights but saw no activity from you in the last couple of months, which is totally fine. I propose we limit collaborator rights to active people to keep it simple and organized. What do you think?

CC @smalot

k00ni commented 2 years ago

I just want to let you know that I won't have much spare time to spend here in the near future. If I can, I will help out here and there, but for the most part I am not available.

siims-biz commented 1 year ago

The last comment is from last year. Did the state of information change? @smalot @amooij @k00ni

k00ni commented 1 year ago

In general the state is basically the same: @smalot is not available and we maintain this library in our spare time. Luckily we get pull requests from time to time, so there is some development and optimization of the library. I try to help out here and there.

Did you mean something in particular?

jzohrab commented 1 year ago

Hi all, I’m interested in this library for my PHP project, and could spend some cycles here. I haven’t even looked at the code, but this project shows up high in Google results :-) My initial primary interest would be to add some integration-level tests, basically parsing some real PDF files, if that’s not already there (tests appear to be present, using files in samples/bugs). Then closing old issues, because there are some that are very old. For existing maintainers, any suggestions on getting up to speed to become part of the team? Cheers! Jz

k00ni commented 1 year ago

Hello @jzohrab, it is good to see interest in helping out here! Let me get you up to speed by introducing you to the current situation. Afterwards I will explain our processes and what we are aiming for.

Overview

Let me first give you a short overview of how things are going at the moment. Currently @j0k3r and I are working as maintainers in our spare time; @smalot (owner of this repository) is not available. He gave us enough rights to maintain most of the repository, like merging code, labeling issues etc. Most of the contributions come from the community on a regular basis: We just released 2.6.0 with further fixes and new functionality (meta data). There is no project plan for this library; things get added or changed if something breaks or someone wants a certain functionality. All in all, for the future I would aim to make the parser feature complete, so it supports the latest PDF specifications, and to fix all the stupid bugs, of which there are many currently (many encoding/decoding issues lately).

Contributions

We introduced some standards a while ago, like coding style checks (using PHP-CS-Fixer) and static code analysis (PHPStan). These and other precautions were intended to aid stable development and avoid some cliffs. All checks must be green for any new pull request. I value clear code, meaningful comments on code, references to specifications and good test coverage. Usually we receive complex system(?) tests using a PDF file, but a handful of unit tests can also be sufficient sometimes. It depends on the type of contribution. Because this library is developed in no particular direction, it is crucial that at least each addition/change is documented in some way and references prior/related work.

Where to start?

I suggest you start by looking at the latest merged pull requests to get a feel for how we handled them. @GreyWyvern did a good job in #606, for instance. Then it's up to you. In my opinion the following areas should be addressed:

Feel free to ask any questions. I try to comment/help out in your issues/pull requests whenever I have the time.

k00ni commented 1 year ago

Big shout out to @GreyWyvern: Thank you very much for all the work you have been doing in the last weeks! I am glad that we have a community here that contributes on a regular basis and is constructive and helpful. :+1:

I wanted to write you via your homepage first but your contact form raised a deprecation warning and stopped working ...?

k00ni commented 9 months ago

@jzohrab How is it going? Are you still interested in helping out here?

jzohrab commented 9 months ago

Hey there @k00ni - thanks for the note. I actually ended up moving my whole project over to Python :-) I'm no longer a reliable worker bee here. Cheers and best wishes, thanks for your contributions to the open-source world. :wave:

k00ni commented 9 months ago

@jzohrab I wish you all the best.