psolin / cleanco

Company Name Processor written in Python
MIT License
318 stars 94 forks source link

version 2.0 #47

Closed petri closed 4 years ago

petri commented 4 years ago

I also created a new branch from current master, so if we want to make a final release based on old codebase, or if someone wants to contribute to it still, that can be done in the 1.4 branch.

petri commented 4 years ago

@psolin I'm happy to fork & take over full maintenance of this package if you no longer wish to spend any time on it?

psolin commented 4 years ago

It’s been a day since you created this pull request. I’ll take a look and merge it.

psolin commented 4 years ago

Take a look and let me know if this works.

petri commented 4 years ago

I don't see why SQLite or such would be needed for anything, it's a huge overkill. The terms are loaded & processed when application starts and are then kept in memory as regular Python objects. There's just I think three hundred short unique text terms, and then the country names - this is nothing.

psolin commented 4 years ago

I agree. I ended up removing that and just edited the terms in the list.

petri commented 4 years ago

Regarding README, I did not update it yet since I did not add any deprecation warnings of the old way either. I was thinking of suggesting a deprecation in a future minor PR.

If there's something you'd like changed in the PR, please add some review notes in the PR. I can then either make the changes or first explain why it's done this way. I guess that's the whole idea of collaborative PRs. As reviewer & package author, you can then choose to accept or reject the changes. There's no need for the reviewer to spend his time to actually fix anything in a PR - rather, that's the job of the person submitting the PR.

Besides changing README & adding some nice termdata improvements, It seems to me you've also changed the imports in:

cleanco/classify.py cleanco/clean.py cleanco/cleanco.py

Is there a reason for this? Those changes break the package & tests. The setup.py file seems to be changed also. Is that intentional?

psolin commented 4 years ago

You can revert those import lines. I was having issues running the package, but I think that I needed to do it in a virtualenv instead of how I was doing it. Everything did check out from my testing. I'm not sure what happened with setup.py, but that can be reverted as well.

Check the readme edits that I made and let me know if that write up makes sense at all.

petri commented 4 years ago

Ok given your ok for the PR content, I just first applied your termdata changes to master and then committed my original 2.0 PR changes on top of it. So this PR can be just closed & forgotten.

If we want to deprecate the old class-based approach, it's easy to do now whenever you want.

petri commented 4 years ago

The 2.0 branch can be deleted, unless we want to save your README changes from there first.