internetarchive / heritrix3

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
https://heritrix.readthedocs.io/

How to cite? #463

Closed: Querela closed this issue 1 year ago

Querela commented 2 years ago

Hi,

I'm writing a paper and would like to cite Heritrix. What or whom do I need to reference? I found the following:

Suggestion: add a section with BibTeX code to either the README or the wiki for future users (maybe here).

ato commented 2 years ago

Feel free to send a pull request updating the README or documentation as you feel would be appropriate.

I'm no academic and don't know what the normal practice is, but citing that paper seems reasonable to me, and judging by Google Scholar it does seem to be what most people reference.

cgr71ii commented 2 years ago

I agree that it'd be useful to have a reference somewhere on the GitHub page. The paper you mentioned is a bit old (I haven't read it, but I'm sure it doesn't reflect the current state of Heritrix), but it seems to be the official one, and papers from 2021 and 2022 cite it. Have you found a more recent reference?

Anyway, here is the BibTeX entry (I'm not sure whether you were asking for it or for a different reference, as I mentioned above):

@inproceedings{mohr2004introduction,
  title={Introduction to {H}eritrix},
  author={Mohr, Gordon and Stack, Michael and Ranitovic, Igor and Avery, Dan and Kimpton, Michele},
  booktitle={4th International Web Archiving Workshop},
  pages={109--115},
  year={2004},
  organization={Citeseer}
}