Open hugolpz opened 3 years ago
Hello @sffc. I noticed you made a Python change (https://github.com/google/corpuscrawler/commit/10adaecf4ed5a7d0557c8e692c186023746eb001) and are active on this project, so allow me to cc you on this minor issue.
The project is currently structured as a pip module, and it should stay one. That said, I would support reorganizing the utilities and crawlers into separate directories, though more along the lines of:
corpuscrawler
├─ README.md
├─ LICENSE
├─ LICENSE.md
├─ CONTRIBUTING.md
├─ corpuscrawler
└─ Lib
   └─ corpuscrawler
      ├─ util
      |  └─ *.py: utilities
      └─ crawlers
         └─ crawl_{iso}.py: crawlers
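To make the split concrete, here is a minimal sketch of how the moves could be planned. It assumes that crawlers are exactly the files matching `crawl_*.py` and that every other Python module in the package counts as a utility. The function name `plan_moves` and the sample filenames are hypothetical, not part of the project. It only computes target paths and does not move files or rewrite imports, which would also need updating.

```python
import fnmatch
from pathlib import PurePosixPath

# Target package directory taken from the layout proposed above.
TARGET = PurePosixPath("Lib/corpuscrawler")


def plan_moves(filenames):
    """Map each flat module filename to its path in the proposed layout.

    Assumption: crawl_*.py files are crawlers; all other .py files are
    utilities. __init__.py and non-Python files are left where they are.
    """
    moves = {}
    for name in sorted(filenames):
        if not name.endswith(".py") or name == "__init__.py":
            continue
        is_crawler = fnmatch.fnmatchcase(name, "crawl_*.py")
        subdir = "crawlers" if is_crawler else "util"
        moves[name] = str(TARGET / subdir / name)
    return moves


if __name__ == "__main__":
    # Hypothetical sample of a flat package directory listing.
    sample = ["__init__.py", "crawl_fr.py", "crawl_de.py", "util.py"]
    for src, dst in plan_moves(sample).items():
        print(f"{src} -> {dst}")
```

Printing the plan first, rather than moving files directly, makes it easy to review the result before touching 1000+ files.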
This would add clarity, yes. The project currently lacks clear onboarding documentation and pointers. A clean structure that separates the few utility modules from the 1000+ crawler files would improve both clarity and onboarding.
Related to #80. A suggestion: mainly, move the core code up so that it is more visible, and keep the crawlers in their own folder.
to
Would such changes disrupt any complementary toolchain?