nicholasjhorton / lost-jse-issues

Repository for the retrieval and curation of the lost Journal of Statistics Education (now JSDSE) volumes 8 and 9
MIT License

address wayback machine headers in html files #5

Closed nicholasjhorton closed 3 years ago

nicholasjhorton commented 3 years ago

At present, each downloaded HTML file begins with a header, of varying length, that was inserted by the Wayback Machine.

Let's use Jim Albert's paper from volume 8, issue 1 as an example: https://github.com/nicholasjhorton/lost-jse-issues/blob/main/papers/albert_8_1.html

<!-- saved from url=(0104)https://web.archive.org/web/20130307064027/http://www.amstat.org/publications/jse/secure/v8n1/albert.cfm -->
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><script src="./albert_8_1_files/analytics.js" type="text/javascript"></script>
<script type="text/javascript">window.addEventListener('DOMContentLoaded',function(){var v=archive_analytics.values;v.service='wb';v.server_name='wwwb-app12.us.archive.org';v.server_ms=179;archive_analytics.send_pageview({});});</script>
<script type="text/javascript" src="./albert_8_1_files/bundle-playback.js" charset="utf-8"></script>
<script type="text/javascript" src="./albert_8_1_files/wombat.js" charset="utf-8"></script>
<script type="text/javascript">
  __wm.init("https://web.archive.org/web");
  __wm.wombat("http://www.amstat.org/publications/jse/secure/v8n1/albert.cfm","20130307064027","https://web.archive.org/","web","/_static/",
          "1362638427");
</script>
<link rel="stylesheet" type="text/css" href="./albert_8_1_files/banner-styles.css">
<link rel="stylesheet" type="text/css" href="./albert_8_1_files/iconochive.css">
<!-- End Wayback Rewrite JS Include -->
<title>Journal of Statistics Education, V8N1: Albert</title></head>
<body bgcolor="#ffffff" data-new-gr-c-s-check-loaded="14.1036.0" data-gr-ext-installed=""><!-- BEGIN WAYBACK TOOLBAR INSERT -->
<style type="text/css">
body {
  margin-top:0 !important;
  padding-top:0 !important;
  /*min-width:800px !important;*/
}
</style>
<script>__wm.rw(0);</script>
<div id="wm-ipp-base" lang="en" style="display: block; direction: ltr;">
</div><div id="wm-ipp-print">The Wayback Machine - https://web.archive.org/web/20130307064027/http://www.amstat.org/publications/jse/secure/v8n1/albert.cfm</div>
<div id="donato" style="position:relative;width:100%;">
  <div id="donato-base">
    <iframe id="donato-if" src="./albert_8_1_files/donate.html" scrolling="no" frameborder="0" style="width:100%; height:100%">
    </iframe>
  </div>
</div><script type="text/javascript">
__wm.bt(650,27,25,2,"web","http://www.amstat.org/publications/jse/secure/v8n1/albert.cfm","20130307064027",1996,"/_static/",["/_static/css/banner-styles.css?v=omkqRugM","/_static/css/iconochive.css?v=qtvMKcIJ"], "False");
  __wm.rw(1);
</script>
<!-- END WAYBACK TOOLBAR INSERT -->

<h1>Using a Sample Survey Project to Assess the Teaching of Statistical Inference</h1>

I think that it would be preferable to remove most of these lines.

The first line is worth keeping, since it documents where the file came from if someone wants to go spelunking:

<!-- saved from url=(0104)https://web.archive.org/web/20130307064027/http://www.amstat.org/publications/jse/secure/v8n1/albert.cfm -->
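
As a rough illustration of why that comment is useful, the sketch below pulls the snapshot timestamp and the original URL back out of it. It assumes the usual Wayback Machine URL layout (a 14-digit `YYYYMMDDhhmmss` timestamp followed by the original address); the function name `parse_saved_from` is hypothetical and not part of this repository.

```python
import re
from datetime import datetime

# Assumed layout: https://web.archive.org/web/<14-digit timestamp>/<original URL>
WAYBACK_URL = re.compile(r"https://web\.archive\.org/web/(\d{14})/(\S+)")


def parse_saved_from(comment: str):
    """Return (capture time, original URL) from a 'saved from url' comment.

    Returns None when the comment holds no Wayback Machine URL.
    """
    match = WAYBACK_URL.search(comment)
    if match is None:
        return None
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured, match.group(2)


comment = (
    "<!-- saved from url=(0104)https://web.archive.org/web/20130307064027/"
    "http://www.amstat.org/publications/jse/secure/v8n1/albert.cfm -->"
)
print(parse_saved_from(comment))
```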

Buried later in there is a line:

<title>Journal of Statistics Education, V8N1: Albert</title></head>

We want to keep this.

Lastly, we want to keep the title (formatted as h1) of the paper.

<h1>Using a Sample Survey Project to Assess the Teaching of Statistical Inference</h1>
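
The cleanup described above could be scripted roughly as follows. This is only a sketch under stated assumptions, not necessarily how the files were actually edited: it assumes every file contains the `<!-- END WAYBACK TOOLBAR INSERT -->` marker seen in the example, it keeps the original `<body ...>` tag unchanged, and it rebuilds a minimal `<html><head>` wrapper so the result stays well-formed. The function name `strip_wayback_header` is hypothetical.

```python
import re

SAVED_FROM = re.compile(r"<!-- saved from url=.*?-->", re.DOTALL)
TITLE = re.compile(r"<title>.*?</title>", re.DOTALL | re.IGNORECASE)
BODY_OPEN = re.compile(r"<body\b[^>]*>", re.IGNORECASE)
TOOLBAR_END = "<!-- END WAYBACK TOOLBAR INSERT -->"


def strip_wayback_header(html: str) -> str:
    """Drop the Wayback Machine header, keeping the provenance comment,
    the <title>, and everything after the toolbar insert."""
    end = html.find(TOOLBAR_END)
    if end == -1:
        return html  # marker not found: leave the file untouched

    header = html[:end]
    rest = html[end + len(TOOLBAR_END):].lstrip("\n")

    saved = SAVED_FROM.search(header)
    title = TITLE.search(header)
    body = BODY_OPEN.search(header)

    parts = []
    if saved:
        parts.append(saved.group(0))  # where the file came from
    # Assumption: a minimal head holding only the title is acceptable.
    parts.append("<html><head>" + (title.group(0) if title else "") + "</head>")
    parts.append(body.group(0) if body else "<body>")
    parts.append(rest)  # starts at the paper's <h1>
    return "\n".join(parts)
```

Applied to a file such as `papers/albert_8_1.html`, this would leave the "saved from" comment, the title, and the paper itself starting at its `<h1>`, while removing the analytics scripts, stylesheets, and toolbar markup in between.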

sili22 commented 3 years ago

Finished editing the Wayback Machine headers in the HTML files for volume 8.

nicholasjhorton commented 3 years ago

Thanks! This looks great.