carlosparadis opened this issue 8 years ago
On 17/04/2016 at 01:42, Carlos Andrade wrote:
> I am trying to find a tool that can download mailing lists that are not on Gmane but are available as mbox archives; for instance, the Apache Software Foundation provides these for virtually all of its projects. Although I am finding dozens of parsers for an mbox folder, I have yet to find a script that downloads over HTTP into an mbox folder, as codeface/R/ml/download.r does for Gmane through nntp-pull.
> @wolfgangmauerer, do you have any ideas on this? I will post here if I find something in the meantime.
This problem cannot really be solved in general: there are many, many web frontends that expose mailing list archives, and a tool that downloads from all of them would have to provide a scraper for every website. The transport protocol (HTTP) would be the only thing they share.
As for Storm, the project seems to be using mod_mbox, an Apache HTTP Server module that provides a web frontend based on mbox files. One option would be to use a web scraping framework to obtain a list of messages, and then use mod_mbox's ability to generate raw files for single messages. The better alternative in this case, I guess, is simply to ask the maintainers whether they can provide the mbox files directly.
Best regards, Wolfgang
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/siemens/codeface/issues/47
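Since mod_mbox serves its archives from mbox files, one way to end up with a local mbox folder is to fetch whole monthly archives rather than individual messages. The sketch below assumes the server exposes each month as a raw file at <list URL>/<YYYYMM>.mbox; that URL layout is an assumption to verify against the actual server, and the function names are hypothetical, not part of codeface or mod_mbox. It uses only the Python standard library.

```python
import urllib.error
import urllib.request
from pathlib import Path


def monthly_mbox_urls(list_base_url, start, end):
    """Yield (file name, URL) pairs for each month from start to end, inclusive.

    start and end are (year, month) tuples. The "<YYYYMM>.mbox" naming is an
    assumption about the mod_mbox installation, not something guaranteed.
    """
    year, month = start
    while (year, month) <= end:
        name = f"{year:04d}{month:02d}.mbox"
        yield name, f"{list_base_url.rstrip('/')}/{name}"
        # Advance one calendar month, rolling over into the next year.
        month += 1
        if month > 12:
            year, month = year + 1, 1


def download_monthly_mboxes(list_base_url, start, end, out_dir):
    """Save each monthly archive into out_dir, one mbox file per month."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, url in monthly_mbox_urls(list_base_url, start, end):
        target = out / name
        if target.exists():
            continue  # already downloaded on an earlier run
        try:
            with urllib.request.urlopen(url) as resp:
                data = resp.read()
        except urllib.error.HTTPError as err:
            # Months without any traffic may simply be missing on the server.
            print(f"skipping {url}: HTTP {err.code}")
            continue
        target.write_bytes(data)
```

A call such as download_monthly_mboxes("http://mail-archives.apache.org/mod_mbox/storm-user", (2016, 1), (2016, 4), "storm-user-mbox") would then fill one folder with one file per month; the list path there is a placeholder. Skipping files that already exist keeps repeated runs cheap, and the resulting folder can be handed to any ordinary mbox parser.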
Interesting. I think the only quick workaround available for Apache, then, is the MetricsGrimoire tool (https://github.com/MetricsGrimoire/MailingListStats), which requests a URL. Sadly, it pre-fills a database instead of dumping the messages into a folder.
Thanks for clarifying!
On 17/04/2016 at 14:01, Carlos Andrade wrote:
> Interesting. I think the only quick workaround available for Apache, then, is the MetricsGrimoire tool (https://github.com/MetricsGrimoire/MailingListStats), which requests a URL. Sadly, it pre-fills a database instead of dumping the messages into a folder.
Judging from a cursory glance, MetricsGrimoire supports a very simple web scraper that can download all files linked from one main page; I would be astonished if this sufficed for Storm. You could use wget -r for the same purpose.
> Thanks for clarifying!
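For comparison, the "download every file linked from one main page" approach described above can be sketched with the Python standard library alone. This is not MetricsGrimoire's code; the function names and the .mbox suffix filter are illustrative assumptions, and, as Wolfgang points out, such a simple scraper may well not be enough for an archive like Storm's, whose index pages may link to browsing views rather than to raw files.

```python
import urllib.request
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href targets of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def mbox_links(index_html, index_url, suffix=".mbox"):
    """Return absolute, de-duplicated URLs of links that end in suffix."""
    parser = LinkCollector()
    parser.feed(index_html)
    parser.close()
    links = []
    for href in parser.hrefs:
        absolute = urljoin(index_url, href)  # resolve relative links
        if absolute.endswith(suffix) and absolute not in links:
            links.append(absolute)
    return links


def fetch_linked_mboxes(index_url, out_dir, suffix=".mbox"):
    """Download every suffix-matching file linked from index_url into out_dir."""
    with urllib.request.urlopen(index_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in mbox_links(html, index_url, suffix):
        # Name each local file after the last path segment of its URL.
        target = out / url.rstrip("/").rsplit("/", 1)[-1]
        with urllib.request.urlopen(url) as resp:
            target.write_bytes(resp.read())
```

Unlike wget -r, this follows links only one level deep and keeps only the files whose URLs end in the given suffix, which is usually what an mbox folder needs; anything more involved (pagination, per-message raw views) would need a scraper tailored to the specific frontend.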